pyspark.sql.functions.covar_samp

pyspark.sql.functions.covar_samp(col1, col2)

Returns a new Column for the sample covariance of col1 and col2.

New in version 2.0.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col1 : Column or column name

The first column used in the covariance calculation.

col2 : Column or column name

The second column used in the covariance calculation.

Returns
Column

The sample covariance of the values in the two columns.

Examples

>>> from pyspark.sql import functions as sf
>>> a = [1] * 10
>>> b = [1] * 10
>>> df = spark.createDataFrame(zip(a, b), ["a", "b"])
>>> df.agg(sf.covar_samp("a", df.b)).show()
+----------------+
|covar_samp(a, b)|
+----------------+
|             0.0|
+----------------+
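
The constant columns above necessarily give a covariance of 0.0. As a rough illustration of what the aggregate computes on varying data, the following pure-Python sketch applies the textbook sample-covariance formula: the sum of the products of deviations from each mean, divided by n - 1. The helper name `sample_covariance` is hypothetical and is not part of PySpark. Returning `None` for fewer than two pairs is a choice made for this sketch, not a statement about Spark's behavior.

```python
def sample_covariance(xs, ys):
    """Hypothetical helper (not a PySpark API): textbook sample covariance."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    if n < 2:
        # The n - 1 denominator is undefined here; this sketch returns None.
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, divided by n - 1 (sample, not population).
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# b is exactly 2 * a, so the two columns vary together.
a = [1, 2, 3, 4]
b = [2, 4, 6, 8]
print(sample_covariance(a, b))

# Constant inputs, as in the example above, have zero covariance.
print(sample_covariance([1] * 10, [1] * 10))
```

For `a` and `b` here the deviation products sum to 10, so the result is 10 / 3. Under the same formula, `df.agg(sf.covar_samp("a", "b"))` on a DataFrame holding these values should agree up to floating-point rounding.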