Class StreamingKMeansModel
- All Implemented Interfaces:
- Serializable,- org.apache.spark.internal.Logging,- PMMLExportable,- Saveable
The update algorithm uses the "mini-batch" KMeans rule, generalized to incorporate forgetfulness (i.e. decay). The update rule (for each cluster) is:
$$ \begin{align} c_{t+1} &= [(c_t * n_t * a) + (x_t * m_t)] / [n_t + m_t] \\ n_{t+1} &= n_t * a + m_t \end{align} $$
Where c_t is the previously estimated centroid for that cluster, n_t is the number of points assigned to it thus far, x_t is the centroid estimated on the current batch, and m_t is the number of points assigned to that centroid in the current batch.
The decay factor 'a' scales the contribution of the clusters as estimated thus far, by applying a as a discount weighting on the current point when evaluating new incoming data. If a=1, all batches are weighted equally. If a=0, new centroids are determined entirely by recent data. Lower values correspond to more forgetting.
Decay can optionally be specified by a half life and associated time unit. The time unit can either be a batch of data or a single data point. Considering data arrived at time t, the half life h is defined such that at time t + h the discount applied to the data from t is 0.5. The definition remains the same whether the time unit is given as batches or points.
- See Also:
- 
Nested Class SummaryNested classes/interfaces inherited from class org.apache.spark.mllib.clustering.KMeansModelKMeansModel.Cluster$, KMeansModel.SaveLoadV1_0$, KMeansModel.SaveLoadV2_0$Nested classes/interfaces inherited from interface org.apache.spark.internal.Loggingorg.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Constructor SummaryConstructors
- 
Method SummaryMethods inherited from class org.apache.spark.mllib.clustering.KMeansModelcomputeCost, distanceMeasure, k, load, predict, predict, predict, save, trainingCostMethods inherited from class java.lang.Objectequals, getClass, hashCode, notify, notifyAll, toString, wait, wait, waitMethods inherited from interface org.apache.spark.internal.LogginginitializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
- 
Constructor Details- 
StreamingKMeansModel
 
- 
- 
Method Details- 
clusterCenters- Overrides:
- clusterCentersin class- KMeansModel
 
- 
clusterWeightspublic double[] clusterWeights()
- 
updatePerform a k-means update on a batch of data.- Parameters:
- data- (undocumented)
- decayFactor- (undocumented)
- timeUnit- (undocumented)
- Returns:
- (undocumented)
 
 
-