pyspark.sql.functions.hll_sketch_estimate
pyspark.sql.functions.hll_sketch_estimate(col)
Returns the estimated number of unique values given the binary representation of a DataSketches HllSketch, such as one produced by hll_sketch_agg.
New in version 3.5.0.
Examples
>>> from pyspark.sql.functions import hll_sketch_agg, hll_sketch_estimate
>>> df = spark.createDataFrame([1,2,2,3], "INT")
>>> df = df.agg(hll_sketch_estimate(hll_sketch_agg("value")).alias("distinct_cnt"))
>>> df.show()
+------------+
|distinct_cnt|
+------------+
|           3|
+------------+