public class MultivariateOnlineSummarizer extends Object implements MultivariateStatisticalSummary, scala.Serializable
Implements MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for samples in sparse or dense vector
format in an online fashion.
Two MultivariateOnlineSummarizer instances can be merged to obtain a statistical summary of the corresponding joint dataset.
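To illustrate why two summaries can be merged without revisiting the data, the sketch below combines per-column partial results using the standard pairwise update for mean and sum of squared deviations. It is not taken from the Spark source: the class `PartialSummary` and its fields `n`, `mean`, and `m2` are hypothetical, and unlike merge() on this class it returns a new object rather than modifying the receiver.

```java
/** Hypothetical single-column summary: count, mean, and sum of squared deviations. */
final class PartialSummary {
    final long n;
    final double mean;
    final double m2; // sum of (x - mean)^2 over the values summarized so far

    PartialSummary(long n, double mean, double m2) {
        this.n = n;
        this.mean = mean;
        this.m2 = m2;
    }

    /** Builds a summary from raw values with a plain two-pass computation. */
    static PartialSummary of(double... values) {
        if (values.length == 0) {
            return new PartialSummary(0, 0.0, 0.0);
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double m2 = 0.0;
        for (double v : values) {
            m2 += (v - mean) * (v - mean);
        }
        return new PartialSummary(values.length, mean, m2);
    }

    /** Combines two summaries as if their underlying datasets had been concatenated. */
    PartialSummary merge(PartialSummary other) {
        if (n == 0) {
            return other;
        }
        if (other.n == 0) {
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        double mergedMean = mean + delta * other.n / total;
        double mergedM2 = m2 + other.m2 + delta * delta * n * other.n / total;
        return new PartialSummary(total, mergedMean, mergedM2);
    }

    /** Unbiased sample variance; defined as 0 when fewer than two values were seen. */
    double sampleVariance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

For example, merging `PartialSummary.of(1.0, 2.0, 3.0)` with `PartialSummary.of(4.0, 5.0)` yields the same count, mean, and sample variance as summarizing all five values at once.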
A numerically stable algorithm is implemented to compute the sample mean and variance (reference: variance-wiki).
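One well-known numerically stable approach is Welford's single-pass update, sketched below for a single column. This is an illustration of the general technique and is not claimed to be the exact code used by this class; the class name `WelfordColumn` and its members are hypothetical.

```java
/** Hypothetical single-column accumulator using Welford's online update. */
final class WelfordColumn {
    private long n;      // number of values seen
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    /** Folds one value into the running mean and sum of squared deviations. */
    void add(double x) {
        n++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / n;
        m2 += delta * (x - mean);  // uses both the old and the new mean
    }

    long count() {
        return n;
    }

    double mean() {
        return mean;
    }

    /** Unbiased sample variance; defined as 0 when fewer than two values were seen. */
    double sampleVariance() {
        return n > 1 ? m2 / (n - 1) : 0.0;
    }
}
```

Because the update works with deviations from the running mean rather than with raw sums of squares, it avoids the cancellation that affects the naive "mean of squares minus square of mean" formula when values are large relative to their spread.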
Zero elements (including explicit zero values) are skipped when calling add(),
giving a time complexity of O(nnz) instead of O(n) for each column.
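The sketch below shows one way skipped zeros can still be accounted for: only nonzero entries update the per-column state, and the zeros are folded in when a statistic is read, using the total sample count. It is a minimal illustration under assumptions, not the Spark implementation; the class `SparseColumnStats`, its parallel-array input format, and all member names are hypothetical.

```java
/** Hypothetical summarizer that touches only nonzero entries on each add. */
final class SparseColumnStats {
    private long total;            // number of samples added, zeros included
    private final long[] nnz;      // nonzero count per column
    private final double[] nzMean; // mean of the nonzero values per column
    private final double[] nzM2;   // squared deviations of nonzero values per column

    SparseColumnStats(int dims) {
        nnz = new long[dims];
        nzMean = new double[dims];
        nzM2 = new double[dims];
    }

    /** Adds one sparse sample given as parallel arrays of column indices and values. */
    void add(int[] indices, double[] values) {
        total++;
        for (int k = 0; k < indices.length; k++) {
            double v = values[k];
            if (v == 0.0) {
                continue; // explicitly stored zeros are skipped as well
            }
            int j = indices[k];
            nnz[j]++;
            double delta = v - nzMean[j];
            nzMean[j] += delta / nnz[j];
            nzM2[j] += delta * (v - nzMean[j]);
        }
    }

    long numNonzeros(int j) {
        return nnz[j];
    }

    /** Mean over all samples: the nonzero mean scaled by the fraction of nonzeros. */
    double mean(int j) {
        return total == 0 ? 0.0 : nzMean[j] * nnz[j] / total;
    }

    /** Sample variance over all samples, merging in the skipped zeros as a group. */
    double sampleVariance(int j) {
        if (total < 2) {
            return 0.0;
        }
        long zeros = total - nnz[j];
        double m2 = nzM2[j] + nzMean[j] * nzMean[j] * nnz[j] * zeros / total;
        return m2 / (total - 1);
    }
}
```

Each call to `add` costs time proportional to the number of stored entries in the sample, while `mean` and `sampleVariance` remain exact with respect to the full dataset, zeros included.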
Constructor and Description |
---|
MultivariateOnlineSummarizer() |
Modifier and Type | Method and Description |
---|---|
MultivariateOnlineSummarizer | add(Vector sample) - Add a new sample to this summarizer, and update the statistical summary. |
long | count() - Sample size. |
Vector | max() - Maximum value of each column. |
Vector | mean() - Sample mean vector. |
MultivariateOnlineSummarizer | merge(MultivariateOnlineSummarizer other) - Merge another MultivariateOnlineSummarizer, and update the statistical summary. |
Vector | min() - Minimum value of each column. |
Vector | normL1() - L1 norm of each column. |
Vector | normL2() - Euclidean magnitude of each column. |
Vector | numNonzeros() - Number of nonzero elements (including explicitly presented zero values) in each column. |
Vector | variance() - Sample variance vector. |
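public MultivariateOnlineSummarizer add(Vector sample)
Add a new sample to this summarizer, and update the statistical summary.
Parameters: sample - The sample in dense/sparse vector format to be added into this summarizer.
public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
Merge another MultivariateOnlineSummarizer, and update the statistical summary. (The merge is performed in place, so this object will be modified.)
Parameters: other - The other MultivariateOnlineSummarizer to be merged.
public Vector mean()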
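Description copied from interface: MultivariateStatisticalSummary
Sample mean vector.
Specified by: mean in interface MultivariateStatisticalSummary
public Vector variance()
Description copied from interface: MultivariateStatisticalSummary
Sample variance vector.
Specified by: variance in interface MultivariateStatisticalSummary
public long count()
Description copied from interface: MultivariateStatisticalSummary
Sample size.
Specified by: count in interface MultivariateStatisticalSummary
public Vector numNonzeros()
Description copied from interface: MultivariateStatisticalSummary
Number of nonzero elements (including explicitly presented zero values) in each column.
Specified by: numNonzeros in interface MultivariateStatisticalSummary
public Vector max()
Description copied from interface: MultivariateStatisticalSummary
Maximum value of each column.
Specified by: max in interface MultivariateStatisticalSummary
public Vector min()
Description copied from interface: MultivariateStatisticalSummary
Minimum value of each column.
Specified by: min in interface MultivariateStatisticalSummary
public Vector normL2()
Description copied from interface: MultivariateStatisticalSummary
Euclidean magnitude of each column.
Specified by: normL2 in interface MultivariateStatisticalSummary
public Vector normL1()
Description copied from interface: MultivariateStatisticalSummary
L1 norm of each column.
Specified by: normL1 in interface MultivariateStatisticalSummary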
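As a small illustration of what "of each column" means for normL1() and normL2(), the sketch below computes both norms down the columns of a dense sample matrix, where each row is one sample. It assumes the norms are taken over all samples for a given column; the class `ColumnNorms` and the `double[][]` input are hypothetical stand-ins and do not use the Spark Vector type.

```java
/** Hypothetical helper computing column-wise norms over rectangular dense samples. */
final class ColumnNorms {
    /** Sum of absolute values in each column. */
    static double[] normL1(double[][] samples) {
        int cols = samples.length == 0 ? 0 : samples[0].length;
        double[] out = new double[cols];
        for (double[] row : samples) {
            for (int j = 0; j < cols; j++) {
                out[j] += Math.abs(row[j]);
            }
        }
        return out;
    }

    /** Square root of the sum of squares in each column. */
    static double[] normL2(double[][] samples) {
        int cols = samples.length == 0 ? 0 : samples[0].length;
        double[] out = new double[cols];
        for (double[] row : samples) {
            for (int j = 0; j < cols; j++) {
                out[j] += row[j] * row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            out[j] = Math.sqrt(out[j]);
        }
        return out;
    }
}
```

Both quantities are plain running sums over the samples, so they can be maintained incrementally and added together when two summaries are merged (for the L2 norm, by adding the sums of squares before taking the square root).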