Object
  org.apache.spark.mllib.clustering.KMeans
public class KMeans
K-means clustering with support for multiple parallel runs and a k-means++-like initialization mode (the k-means|| algorithm by Bahmani et al.). When multiple concurrent runs are requested, they are executed together, sharing joint passes over the data for efficiency.
This is an iterative algorithm that will make multiple passes over the data, so any RDDs given to it should be cached by the user.
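Each iteration is one full pass over the data: every point is assigned to its nearest center, and each center is then replaced by the mean of its assigned points. Because every pass re-reads all points, an uncached RDD would be recomputed on each iteration. The following is a minimal single-machine sketch of one such Lloyd iteration in plain Java. It assumes points are `double[]` arrays of equal length. The class `LloydStep` is hypothetical and is not Spark's implementation, which performs the same kind of pass in a distributed fashion over the RDD.

```java
/** Hypothetical single-machine sketch of one Lloyd iteration (not Spark's code). */
final class LloydStep {
    /** Squared Euclidean distance between two points of equal dimension. */
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to p. */
    static int nearest(double[] p, double[][] centers) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = sqDist(p, centers[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /** One pass over the data: assign every point, then average each cluster. */
    static double[][] step(double[][] points, double[][] centers) {
        int k = centers.length;
        int dim = centers[0].length;
        double[][] sums = new double[k][dim];
        long[] counts = new long[k];
        for (double[] p : points) {
            int c = nearest(p, centers);
            counts[c]++;
            for (int j = 0; j < dim; j++) {
                sums[c][j] += p[j];
            }
        }
        double[][] next = new double[k][];
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                next[c] = centers[c].clone(); // assumption: an empty cluster keeps its old center
            } else {
                next[c] = new double[dim];
                for (int j = 0; j < dim; j++) {
                    next[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return next;
    }
}
```

For example, the points (0,0), (0,2), (10,0), (10,2) with starting centers (1,1) and (9,1) yield updated centers (0,1) and (10,1) after one step.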
Constructor Summary

| Constructor | Description |
|---|---|
| KMeans() | Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1, initializationMode: "k-means\|\|", initializationSteps: 5, epsilon: 1e-4, seed: random}. |

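
The defaults above, together with the setters that each return `KMeans`, follow a fluent-builder style. The sketch below is a hypothetical plain-Java parameter holder, `KMeansParams`, that mirrors those documented defaults so the chaining can be seen in isolation. It is not the Spark class, and the argument validation in `setK` and `setInitializationMode` is an assumption of this sketch rather than documented behavior.

```java
import java.util.Random;

/** Hypothetical parameter holder mirroring the documented KMeans() defaults. */
final class KMeansParams {
    static final String RANDOM = "random";
    static final String K_MEANS_PARALLEL = "k-means||";

    private int k = 2;
    private int maxIterations = 20;
    private int runs = 1;
    private String initializationMode = K_MEANS_PARALLEL;
    private int initializationSteps = 5;
    private double epsilon = 1e-4;
    private long seed = new Random().nextLong(); // "seed: random"

    // Every setter returns this, so calls can be chained.
    KMeansParams setK(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive"); // sketch-only check
        }
        this.k = k;
        return this;
    }

    KMeansParams setMaxIterations(int maxIterations) { this.maxIterations = maxIterations; return this; }
    KMeansParams setRuns(int runs) { this.runs = runs; return this; }
    KMeansParams setInitializationSteps(int steps) { this.initializationSteps = steps; return this; }
    KMeansParams setEpsilon(double epsilon) { this.epsilon = epsilon; return this; }
    KMeansParams setSeed(long seed) { this.seed = seed; return this; }

    KMeansParams setInitializationMode(String mode) {
        // Sketch-only check: accept just the two documented mode names.
        if (!RANDOM.equals(mode) && !K_MEANS_PARALLEL.equals(mode)) {
            throw new IllegalArgumentException("unknown initialization mode: " + mode);
        }
        this.initializationMode = mode;
        return this;
    }

    int getK() { return k; }
    int getMaxIterations() { return maxIterations; }
    int getRuns() { return runs; }
    String getInitializationMode() { return initializationMode; }
    int getInitializationSteps() { return initializationSteps; }
    double getEpsilon() { return epsilon; }
    long getSeed() { return seed; }
}
```

On the real class the same chained style reads `new KMeans().setK(3).setSeed(42L)`, because each documented setter returns the `KMeans` instance.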
Method Summary

| Modifier and Type | Method | Description |
|---|---|---|
| double | getEpsilon() | The distance threshold within which we consider centers to have converged. |
| String | getInitializationMode() | The initialization algorithm. |
| int | getInitializationSteps() | Number of steps for the k-means\|\| initialization mode. |
| int | getK() | Number of clusters to create (k). |
| int | getMaxIterations() | Maximum number of iterations to run. |
| int | getRuns() | :: Experimental :: Number of runs of the algorithm to execute in parallel. |
| long | getSeed() | The random seed for cluster initialization. |
| static String | K_MEANS_PARALLEL() | Name of the k-means\|\| initialization mode ("k-means\|\|"). |
| static String | RANDOM() | Name of the random initialization mode ("random"). |
| KMeansModel | run(RDD<Vector> data) | Train a k-means model on the given set of points; data should be cached for high performance, because this is an iterative algorithm. |
| KMeans | setEpsilon(double epsilon) | Set the distance threshold within which we consider centers to have converged. |
| KMeans | setInitializationMode(String initializationMode) | Set the initialization algorithm. |
| KMeans | setInitializationSteps(int initializationSteps) | Set the number of steps for the k-means\|\| initialization mode. |
| KMeans | setK(int k) | Set the number of clusters to create (k). |
| KMeans | setMaxIterations(int maxIterations) | Set the maximum number of iterations to run. |
| KMeans | setRuns(int runs) | :: Experimental :: Set the number of runs of the algorithm to execute in parallel. |
| KMeans | setSeed(long seed) | Set the random seed for cluster initialization. |
| static KMeansModel | train(RDD<Vector> data, int k, int maxIterations) | Trains a k-means model using the specified parameters and default values for the unspecified ones. |
| static KMeansModel | train(RDD<Vector> data, int k, int maxIterations, int runs) | Trains a k-means model using the specified parameters and default values for the unspecified ones. |
| static KMeansModel | train(RDD<Vector> data, int k, int maxIterations, int runs, String initializationMode) | Trains a k-means model using the given set of parameters. |
| static KMeansModel | train(RDD<Vector> data, int k, int maxIterations, int runs, String initializationMode, long seed) | Trains a k-means model using the given set of parameters. |

Methods inherited from class Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Methods inherited from interface org.apache.spark.Logging |
---|
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning |

Constructor Detail
public KMeans()
Method Detail
public static String RANDOM()
public static String K_MEANS_PARALLEL()
public static KMeansModel train(RDD<Vector> data, int k, int maxIterations, int runs, String initializationMode, long seed)
Trains a k-means model using the given set of parameters.
Parameters:
data - training points stored as RDD[Vector]
k - number of clusters
maxIterations - maximum number of iterations
runs - number of parallel runs, defaults to 1. The best model is returned.
initializationMode - initialization mode, either "random" or "k-means||" (default).
seed - random seed value for cluster initialization
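The description says only that the best of the parallel runs is returned. A common criterion for "best", and the one this sketch assumes, is the k-means cost: the sum of squared distances from each point to its nearest center, where lower is better. The hypothetical `BestRun` class below scores each run's centers with that cost on a single machine and returns the index of the winner.

```java
/** Hypothetical sketch: choose the lowest-cost center set among several runs. */
final class BestRun {
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** k-means cost: sum over points of the squared distance to the nearest center. */
    static double cost(double[][] points, double[][] centers) {
        double total = 0.0;
        for (double[] p : points) {
            double best = Double.POSITIVE_INFINITY;
            for (double[] c : centers) {
                best = Math.min(best, sqDist(p, c));
            }
            total += best;
        }
        return total;
    }

    /** Index of the run whose centers give the lowest cost; ties keep the earlier run. */
    static int bestRun(double[][] points, double[][][] runs) {
        int best = 0;
        double bestCost = cost(points, runs[0]);
        for (int r = 1; r < runs.length; r++) {
            double c = cost(points, runs[r]);
            if (c < bestCost) {
                bestCost = c;
                best = r;
            }
        }
        return best;
    }
}
```

For the 1-D points 0, 1, 10, 11, centers {0.5, 10.5} cost 1.0 while centers {0, 11} cost 2.0, so the first run would be kept.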
public static KMeansModel train(RDD<Vector> data, int k, int maxIterations, int runs, String initializationMode)
Trains a k-means model using the given set of parameters.
Parameters:
data - training points stored as RDD[Vector]
k - number of clusters
maxIterations - maximum number of iterations
runs - number of parallel runs, defaults to 1. The best model is returned.
initializationMode - initialization mode, either "random" or "k-means||" (default).
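The two initialization modes differ in how the first k centers are chosen. In the hedged single-machine sketch below, `randomInit` draws k distinct points uniformly (it samples without replacement, an assumption of this sketch), and `kMeansPlusPlus` implements sequential k-means++ seeding, where each new center is drawn with probability proportional to its squared distance to the nearest center already chosen. k-means|| is a parallel variant of this idea, so the second method illustrates the principle rather than Spark's k-means|| code. The class and method names are hypothetical, and the sketch assumes 1 <= k <= points.length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of random vs. k-means++ seeding (not Spark's code). */
final class InitSketch {
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** k distinct points chosen uniformly at random (shuffle, then take the first k). */
    static List<double[]> randomInit(double[][] points, int k, long seed) {
        List<double[]> pool = new ArrayList<>(Arrays.asList(points));
        Collections.shuffle(pool, new Random(seed));
        return new ArrayList<>(pool.subList(0, k));
    }

    /** Sequential k-means++: new centers drawn with probability proportional to D(x)^2. */
    static List<double[]> kMeansPlusPlus(double[][] points, int k, long seed) {
        Random rnd = new Random(seed);
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rnd.nextInt(points.length)]); // first center: uniform
        double[] d2 = new double[points.length];
        while (centers.size() < k) {
            double total = 0.0;
            for (int i = 0; i < points.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (double[] c : centers) {
                    best = Math.min(best, sqDist(points[i], c));
                }
                d2[i] = best;
                total += best;
            }
            if (total == 0.0) {
                break; // every point already coincides with a chosen center
            }
            // Walk the cumulative D^2 weights until the random target is passed.
            double target = rnd.nextDouble() * total;
            int pick = -1;
            for (int i = 0; i < points.length; i++) {
                if (d2[i] > 0.0) {
                    pick = i; // fallback: last positive-weight point (guards rounding)
                    target -= d2[i];
                    if (target < 0.0) {
                        break;
                    }
                }
            }
            centers.add(points[pick]);
        }
        return centers;
    }
}
```

Because a point that coincides with an existing center has weight zero, k-means++ never picks it again, which tends to spread the initial centers across well-separated groups.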
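public static KMeansModel train(RDD<Vector> data, int k, int maxIterations)
Trains a k-means model using the specified parameters and default values for the unspecified ones.
Parameters:
data - training points stored as RDD[Vector]
k - number of clusters
maxIterations - maximum number of iterations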
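public static KMeansModel train(RDD<Vector> data, int k, int maxIterations, int runs)
Trains a k-means model using the specified parameters and default values for the unspecified ones.
Parameters:
data - training points stored as RDD[Vector]
k - number of clusters
maxIterations - maximum number of iterations
runs - number of parallel runs. The best model is returned.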
public int getK()
public KMeans setK(int k)
public int getMaxIterations()
public KMeans setMaxIterations(int maxIterations)
public String getInitializationMode()
public KMeans setInitializationMode(String initializationMode)
Parameters:
initializationMode - the initialization algorithm, either "random" or "k-means||"
public int getRuns()
public KMeans setRuns(int runs)
Parameters:
runs - :: Experimental :: number of runs of the algorithm to execute in parallel
public int getInitializationSteps()
public KMeans setInitializationSteps(int initializationSteps)
Parameters:
initializationSteps - number of steps for the k-means|| initialization mode
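In k-means|| as described by Bahmani et al., each initialization step is an oversampling round: every point is sampled independently with probability proportional to its squared distance to the current candidate set, so far-away points are likely to join it. After the rounds, the (usually larger than k) candidate set is weighted and reclustered down to k centers. The hedged sketch below covers only the oversampling rounds, on a single machine. The class name `KMeansParallelSketch` and the oversampling factor of 2k expected picks per round are assumptions for illustration, not Spark's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical single-machine sketch of the k-means|| oversampling rounds. */
final class KMeansParallelSketch {
    static double sqDist(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Candidate centers after `steps` oversampling rounds (not yet reduced to k). */
    static List<double[]> candidates(double[][] points, int k, int steps, long seed) {
        Random rnd = new Random(seed);
        double oversampling = 2.0 * k; // assumption: about 2k expected picks per round
        List<double[]> centers = new ArrayList<>();
        centers.add(points[rnd.nextInt(points.length)]); // start from one uniform point
        double[] cost = new double[points.length];
        for (int step = 0; step < steps; step++) {
            // phi: total squared distance from the data to the current candidates.
            double phi = 0.0;
            for (int i = 0; i < points.length; i++) {
                double best = Double.POSITIVE_INFINITY;
                for (double[] c : centers) {
                    best = Math.min(best, sqDist(points[i], c));
                }
                cost[i] = best;
                phi += best;
            }
            if (phi == 0.0) {
                break; // every point already coincides with a candidate
            }
            // Sample each point independently; a point already chosen has cost 0.
            List<double[]> chosen = new ArrayList<>();
            for (int i = 0; i < points.length; i++) {
                if (rnd.nextDouble() < oversampling * cost[i] / phi) {
                    chosen.add(points[i]);
                }
            }
            centers.addAll(chosen);
        }
        return centers;
    }
}
```

More steps grow the candidate set and improve its coverage of the data, at the cost of one extra pass per step.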
public double getEpsilon()
public KMeans setEpsilon(double epsilon)
Parameters:
epsilon - distance threshold within which centers are considered to have converged
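A minimal sketch of how epsilon and maxIterations can jointly bound the iteration, assuming "converged" means that no center moved farther than epsilon between two consecutive iterations. Squared distances are compared against epsilon squared to avoid square roots. The class name `Convergence` and the `step` callback are hypothetical; any center-update rule, such as a Lloyd iteration, can be plugged in.

```java
import java.util.function.UnaryOperator;

/** Hypothetical sketch of an epsilon / maxIterations stopping rule. */
final class Convergence {
    /** True when no center moved farther than epsilon (squared distances avoid a sqrt). */
    static boolean converged(double[][] oldCenters, double[][] newCenters, double epsilon) {
        for (int c = 0; c < oldCenters.length; c++) {
            double moved = 0.0;
            for (int j = 0; j < oldCenters[c].length; j++) {
                double d = oldCenters[c][j] - newCenters[c][j];
                moved += d * d;
            }
            if (moved > epsilon * epsilon) {
                return false;
            }
        }
        return true;
    }

    /** Applies step until convergence or maxIterations; returns the iterations performed. */
    static int iterate(double[][] initial, UnaryOperator<double[][]> step, int maxIterations, double epsilon) {
        double[][] centers = initial;
        int iteration = 0;
        boolean done = false;
        while (iteration < maxIterations && !done) {
            double[][] next = step.apply(centers);
            done = converged(centers, next, epsilon);
            centers = next;
            iteration++;
        }
        return iteration;
    }
}
```

With a step that halves a single 1-D center starting at 1.0 and epsilon = 1e-4, the loop stops after 14 iterations, the first at which the move (2^-14, about 6.1e-5) is within epsilon; with maxIterations = 5 it stops after 5.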
public long getSeed()
public KMeans setSeed(long seed)
public KMeansModel run(RDD<Vector> data)
Train a k-means model on the given set of points; data should be cached for high performance, because this is an iterative algorithm.
Parameters:
data - training points stored as RDD[Vector]