List of supported feature subset sampling strategies.
Java-friendly API for org.apache.spark.mllib.tree.RandomForest$#trainClassifier
Method to train a random forest model for binary or multiclass classification.
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
Number of classes for classification.
Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
Number of trees in the random forest.
Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "sqrt".
Criterion used for information gain calculation. Supported values: "gini" (recommended) or "entropy".
Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
Maximum number of bins used for splitting features. (suggested value: 100)
Random seed for bootstrapping and choosing feature subsets.
Returns a random forest model that can be used for prediction.
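The "auto" rule for the feature subset strategy described above can be sketched in plain Java. This is an illustration only, not the library's code: the class and method names are hypothetical, and the rounding used to turn "sqrt", "log2", and "onethird" into a feature count (round up, keep at least one feature) is an assumption that the text above does not state.

```java
final class FeatureSubsetSketch {

    /** Resolves "auto" for classification: "all" for a single tree, "sqrt" for a forest. */
    static String resolveForClassification(String strategy, int numTrees) {
        if (!strategy.equals("auto")) {
            return strategy; // explicit choices pass through unchanged
        }
        return numTrees == 1 ? "all" : "sqrt";
    }

    /** ASSUMPTION: counts are rounded up and never drop below one feature. */
    static int featuresPerNode(String resolved, int numFeatures) {
        switch (resolved) {
            case "all":
                return numFeatures;
            case "sqrt":
                return Math.max(1, (int) Math.ceil(Math.sqrt(numFeatures)));
            case "log2":
                return Math.max(1, (int) Math.ceil(Math.log(numFeatures) / Math.log(2)));
            case "onethird":
                return Math.max(1, (int) Math.ceil(numFeatures / 3.0));
            default:
                throw new IllegalArgumentException("Unsupported strategy: " + resolved);
        }
    }

    public static void main(String[] args) {
        String resolved = resolveForClassification("auto", 10);
        System.out.println(resolved + ": " + featuresPerNode(resolved, 100) + " features per node");
    }
}
```

Keeping the resolution step separate from the count makes it easy to see that "auto" only ever selects one of the other named strategies.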
Method to train a random forest model for binary or multiclass classification.
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels should take values {0, 1, ..., numClasses-1}.
Parameters for training each tree in the forest.
Number of trees in the random forest.
Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "sqrt".
Random seed for bootstrapping and choosing feature subsets.
Returns a random forest model that can be used for prediction.
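The label constraint stated for the classification training dataset (labels in {0, 1, ..., numClasses-1}) can be checked before training. The sketch below is a hypothetical helper, not part of the library; it treats labels as double values, since LabeledPoint stores the label as a double, and works on a plain array rather than an RDD to stay self-contained.

```java
final class LabelCheckSketch {

    /** True when the label is a whole number in the range 0 .. numClasses-1. */
    static boolean isValidLabel(double label, int numClasses) {
        return label >= 0
                && label < numClasses
                && label == Math.floor(label);
    }

    /** Throws on the first label that falls outside the allowed set. */
    static void requireValidLabels(double[] labels, int numClasses) {
        for (double label : labels) {
            if (!isValidLabel(label, numClasses)) {
                throw new IllegalArgumentException(
                        "Label " + label + " is not in {0, ..., " + (numClasses - 1) + "}");
            }
        }
    }

    public static void main(String[] args) {
        requireValidLabels(new double[] {0.0, 1.0, 2.0}, 3);
        System.out.println("all labels valid");
    }
}
```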
Java-friendly API for org.apache.spark.mllib.tree.RandomForest$#trainRegressor
Method to train a random forest model for regression.
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels are real numbers.
Map storing arity of categorical features. E.g., an entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.
Number of trees in the random forest.
Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "onethird".
Criterion used for information gain calculation. Supported values: "variance".
Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (suggested value: 4)
Maximum number of bins used for splitting features. (suggested value: 100)
Random seed for bootstrapping and choosing feature subsets.
Returns a random forest model that can be used for prediction.
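The maximum depth examples above (depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes) extend to larger depths. Assuming every split is binary, as those examples imply, a tree of a given maximum depth has at most 2^depth leaves and 2^depth - 1 internal nodes. The sketch below computes these upper bounds; the class and method names are hypothetical, and a trained tree may well be smaller.

```java
final class TreeSizeSketch {

    /** Upper bound on leaf nodes for a binary tree of the given maximum depth. */
    static long maxLeaves(int maxDepth) {
        checkDepth(maxDepth);
        return 1L << maxDepth;
    }

    /** Upper bound on internal (splitting) nodes for the same tree. */
    static long maxInternalNodes(int maxDepth) {
        checkDepth(maxDepth);
        return (1L << maxDepth) - 1;
    }

    // Keeps the shift within the range of a signed 64-bit value.
    private static void checkDepth(int maxDepth) {
        if (maxDepth < 0 || maxDepth > 62) {
            throw new IllegalArgumentException("maxDepth must be in [0, 62]: " + maxDepth);
        }
    }

    public static void main(String[] args) {
        int depth = 4; // the suggested value above
        System.out.println("depth " + depth + ": up to " + maxLeaves(depth)
                + " leaves and " + maxInternalNodes(depth) + " internal nodes");
    }
}
```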
Method to train a random forest model for regression.
Training dataset: RDD of org.apache.spark.mllib.regression.LabeledPoint. Labels are real numbers.
Parameters for training each tree in the forest.
Number of trees in the random forest.
Number of features to consider for splits at each node. Supported: "auto", "all", "sqrt", "log2", "onethird". If "auto" is set, this parameter is set based on numTrees: if numTrees == 1, set to "all"; if numTrees > 1 (forest) set to "onethird".
Random seed for bootstrapping and choosing feature subsets.
Returns a random forest model that can be used for prediction.
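The role of the random seed in choosing feature subsets can be illustrated with a small, self-contained sketch. It is not the sampler the library uses internally; it only shows the general idea that a fixed seed makes the chosen subset reproducible. The class name, method name, and the shuffle-based selection are all assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class SeededSubsetSketch {

    /** Picks subsetSize distinct feature indices from 0 .. numFeatures-1 using the seed. */
    static List<Integer> sampleFeatures(int numFeatures, int subsetSize, long seed) {
        if (subsetSize < 0 || subsetSize > numFeatures) {
            throw new IllegalArgumentException("subsetSize must be in [0, numFeatures]");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) {
            indices.add(i);
        }
        // Shuffling with a seeded generator gives the same order for the same seed.
        Collections.shuffle(indices, new Random(seed));
        return new ArrayList<>(indices.subList(0, subsetSize));
    }

    public static void main(String[] args) {
        List<Integer> first = sampleFeatures(9, 3, 42L);
        List<Integer> second = sampleFeatures(9, 3, 42L);
        System.out.println("same seed, same subset: " + first.equals(second));
    }
}
```

Fixing the seed in this way is what makes two training runs on the same data comparable; leaving it to vary gives a different forest each time.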