org.apache.spark.mllib.feature
:: Experimental :: Inverse document frequency (IDF). The standard formulation is used: idf = log((m + 1) / (d(t) + 1)), where m is the total number of documents and d(t) is the number of documents that contain term t.
Computes the inverse document frequency.
a JavaRDD of term frequency vectors
an RDD of term frequency vectors
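The formula can be checked by hand on a tiny corpus. The following is a minimal sketch in plain Java, not the Spark implementation: the class `IdfSketch` and the helper `idf` are hypothetical names introduced for illustration, term frequency vectors are assumed to be dense `double[]` arrays of equal length, a term is assumed to be contained in a document when its frequency is greater than zero, and `log` is assumed to be the natural logarithm.

```java
import java.util.Arrays;
import java.util.List;

public class IdfSketch {
    // Hypothetical helper (not part of Spark): applies
    // idf = log((m + 1) / (d(t) + 1)) to each term index t.
    static double[] idf(List<double[]> termFrequencies, int numTerms) {
        long m = termFrequencies.size();      // total number of documents
        long[] docFreq = new long[numTerms];  // d(t): documents containing term t
        for (double[] tf : termFrequencies) {
            for (int t = 0; t < numTerms; t++) {
                if (tf[t] > 0.0) {            // assumption: "contains" means frequency > 0
                    docFreq[t]++;
                }
            }
        }
        double[] result = new double[numTerms];
        for (int t = 0; t < numTerms; t++) {
            // Natural logarithm is assumed here.
            result[t] = Math.log((m + 1.0) / (docFreq[t] + 1.0));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three documents over a vocabulary of three terms.
        List<double[]> docs = Arrays.asList(
            new double[] {1.0, 0.0, 2.0},
            new double[] {3.0, 0.0, 0.0},
            new double[] {1.0, 0.0, 0.0});
        System.out.println(Arrays.toString(idf(docs, 3)));
    }
}
```

With m = 3, term 0 appears in every document and gets log(4/4) = 0, term 1 appears in none and gets log(4/1), about 1.386, and term 2 appears in one document and gets log(4/2), about 0.693. The +1 in the numerator and denominator keeps the value finite for terms that occur in no document.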