public class StreamingKMeans extends Object implements Logging
StreamingKMeans provides methods for configuring a streaming k-means analysis, training the model on streaming, and using the model to make predictions on streaming data. See KMeansModel for details on algorithm and update rules.
Use a builder pattern to construct a streaming k-means analysis in an application, like:
val model = new StreamingKMeans()
.setDecayFactor(0.5)
.setK(3)
.setRandomCenters(5, 100.0)
.trainOn(DStream)
Constructor and Description |
---|
StreamingKMeans() |
StreamingKMeans(int k,
double decayFactor,
String timeUnit) |
Modifier and Type | Method and Description |
---|---|
static String |
BATCHES() |
double |
decayFactor() |
int |
k() |
StreamingKMeansModel |
latestModel()
Return the latest model.
|
static String |
POINTS() |
DStream<Object> |
predictOn(DStream<Vector> data)
Use the clustering model to make predictions on batches of data from a DStream.
|
<K> DStream<scala.Tuple2<K,Object>> |
predictOnValues(DStream<scala.Tuple2<K,Vector>> data,
scala.reflect.ClassTag<K> evidence$1)
Use the model to make predictions on the values of a DStream and carry over its keys.
|
StreamingKMeans |
setDecayFactor(double a)
Set the decay factor directly (for forgetful algorithms).
|
StreamingKMeans |
setHalfLife(double halfLife,
String timeUnit)
Set the half life and time unit ("batches" or "points") for forgetful algorithms.
|
StreamingKMeans |
setInitialCenters(Vector[] centers,
double[] weights)
Specify initial centers directly.
|
StreamingKMeans |
setK(int k)
Set the number of clusters.
|
StreamingKMeans |
setRandomCenters(int dim,
double weight,
long seed)
Initialize random centers, requiring only the number of dimensions.
|
String |
timeUnit() |
void |
trainOn(DStream<Vector> data)
Update the clustering model by training on batches of data from a DStream.
|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public StreamingKMeans(int k, double decayFactor, String timeUnit)
public StreamingKMeans()
public static final String BATCHES()
public static final String POINTS()
public int k()
public double decayFactor()
public String timeUnit()
public StreamingKMeans setK(int k)
public StreamingKMeans setDecayFactor(double a)
public StreamingKMeans setHalfLife(double halfLife, String timeUnit)
public StreamingKMeans setInitialCenters(Vector[] centers, double[] weights)
public StreamingKMeans setRandomCenters(int dim, double weight, long seed)
dim
- Number of dimensionsweight
- Weight for each centerseed
- Random seedpublic StreamingKMeansModel latestModel()
public void trainOn(DStream<Vector> data)
data
- DStream containing vector datapublic DStream<Object> predictOn(DStream<Vector> data)
data
- DStream containing vector datapublic <K> DStream<scala.Tuple2<K,Object>> predictOnValues(DStream<scala.Tuple2<K,Vector>> data, scala.reflect.ClassTag<K> evidence$1)
data
- DStream containing (key, feature vector) pairs