public final class RandomForestRegressionModel extends PredictionModel<Vector,RandomForestRegressionModel> implements scala.Serializable
Random Forest model for regression. It supports both continuous and categorical features.

param: _trees Decision trees in the ensemble.
param: numFeatures Number of features used by this model.

Modifier and Type | Method and Description
---|---
RandomForestRegressionModel | copy(ParamMap extra): Creates a copy of this instance with the same UID and some extra params.
Vector | featureImportances(): Estimate of the importance of each feature.
static RandomForestRegressionModel | fromOld(RandomForestModel oldModel, RandomForestRegressor parent, scala.collection.immutable.Map<java.lang.Object,java.lang.Object> categoricalFeatures): (private[ml]) Convert a model from the old API.
int | numFeatures()
protected double | predict(Vector features): Predict label for the given features.
java.lang.String | toString()
protected DataFrame | transformImpl(DataFrame dataset)
org.apache.spark.ml.tree.DecisionTreeModel[] | trees()
double[] | treeWeights()
java.lang.String | uid(): An immutable unique ID for the object and its derivatives.
StructType | validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType): Validates and transforms the input schema with the provided param map.
Methods inherited from class PredictionModel: featuresDataType, setFeaturesCol, setPredictionCol, transform, transformSchema

Methods inherited from class Transformer: transform, transform, transform

Methods inherited from class PipelineStage: transformSchema

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn, validateParams

Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public static RandomForestRegressionModel fromOld(RandomForestModel oldModel, RandomForestRegressor parent, scala.collection.immutable.Map<java.lang.Object,java.lang.Object> categoricalFeatures)
public java.lang.String uid()
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
public int numFeatures()
public org.apache.spark.ml.tree.DecisionTreeModel[] trees()
public double[] treeWeights()
protected DataFrame transformImpl(DataFrame dataset)
Overrides: transformImpl in class PredictionModel<Vector,RandomForestRegressionModel>
protected double predict(Vector features)
Predict label for the given features. This internal method is used to implement transform() and output predictionCol.
Specified by: predict in class PredictionModel<Vector,RandomForestRegressionModel>
Parameters:
features - (undocumented)

public RandomForestRegressionModel copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params.
Specified by: copy in interface Params
Specified by: copy in class Model<RandomForestRegressionModel>
Parameters:
extra - (undocumented)
See Also: defaultCopy()
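For intuition about what predict() computes: a random forest regressor combines the per-tree predictions with a weighted average over treeWeights() (for random forests the weights are typically uniform, reducing this to a plain mean). This is a minimal plain-Java sketch of that combination step, not Spark's internal implementation, using hypothetical per-tree values:

```java
public class EnsembleAverage {
    // Combine per-tree regression predictions into one ensemble prediction
    // using a weighted average. With uniform weights this is a plain mean.
    static double predict(double[] treePredictions, double[] treeWeights) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int i = 0; i < treePredictions.length; i++) {
            weightedSum += treePredictions[i] * treeWeights[i];
            totalWeight += treeWeights[i];
        }
        return weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // Hypothetical predictions from a three-tree ensemble, uniform weights.
        double[] preds = {1.0, 2.0, 3.0};
        double[] weights = {1.0, 1.0, 1.0};
        System.out.println(predict(preds, weights)); // 2.0
    }
}
```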
public java.lang.String toString()
toString
in interface Identifiable
toString
in class java.lang.Object
public Vector featureImportances()
Estimate of the importance of each feature.

This generalizes the idea of "Gini" importance to other losses, following the explanation of Gini importance from "Random Forests" documentation by Leo Breiman and Adele Cutler, and following the implementation from scikit-learn.

This feature importance is calculated as follows:
- Average over trees:
  - importance(feature j) = sum (over nodes which split on feature j) of the gain, where the gain is scaled by the number of instances passing through the node
  - Normalize the importances for the tree based on the total number of training instances used to build the tree.
- Normalize the feature importance vector to sum to 1.
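The steps above can be sketched outside Spark in plain Java. The per-tree gain sums and instance counts below are hypothetical inputs; Spark computes them from the fitted tree internals:

```java
import java.util.Arrays;

public class ImportanceSketch {
    // perTreeGains[t][j]: for tree t, the sum over nodes splitting on feature j
    // of (gain * number of instances passing through the node).
    // Each tree's vector is normalized by its training instance count, the
    // trees are summed (a constant averaging factor cancels in the final
    // normalization), and the result is normalized to sum to 1.
    static double[] featureImportances(double[][] perTreeGains, double[] treeInstanceCounts) {
        int numFeatures = perTreeGains[0].length;
        double[] importances = new double[numFeatures];
        for (int t = 0; t < perTreeGains.length; t++) {
            for (int j = 0; j < numFeatures; j++) {
                // Normalize each tree's importances by its instance count.
                importances[j] += perTreeGains[t][j] / treeInstanceCounts[t];
            }
        }
        double total = Arrays.stream(importances).sum();
        for (int j = 0; j < numFeatures; j++) {
            importances[j] /= total; // final vector sums to 1
        }
        return importances;
    }

    public static void main(String[] args) {
        // Two trees, two features, hypothetical scaled gains and counts.
        double[][] gains = {{4.0, 1.0}, {2.0, 2.0}};
        double[] counts = {10.0, 10.0};
        System.out.println(Arrays.toString(featureImportances(gains, counts)));
    }
}
```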
public StructType validateAndTransformSchema(StructType schema, boolean fitting, DataType featuresDataType)
Validates and transforms the input schema with the provided param map.
Parameters:
schema - input schema
fitting - whether this is in fitting
featuresDataType - SQL DataType for FeaturesType. E.g., VectorUDT for vector features.