pyspark.pandas.Series.corr
Series.corr(other: pyspark.pandas.series.Series, method: str = 'pearson') → float
Compute correlation with other Series, excluding missing values.
- Parameters
- other : Series
- method : {'pearson', 'spearman'}
pearson : standard correlation coefficient
spearman : Spearman rank correlation
- Returns
- correlation : float
Notes
There are behavior differences between pandas-on-Spark and pandas:
- The method argument only accepts 'pearson' and 'spearman'.
- The data should not contain NaNs; otherwise pandas-on-Spark will return an error.
pandas-on-Spark does not support the following argument(s):
- min_periods
Examples
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'s1': [.2, .0, .6, .2],
...                    's2': [.3, .6, .0, .1]})
>>> s1 = df.s1
>>> s2 = df.s2
>>> s1.corr(s2, method='pearson')
-0.851064...
>>> s1.corr(s2, method='spearman')
-0.948683...
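To make the difference between the two method values concrete, the sketch below recomputes both coefficients for the same data in plain Python. It is an illustration only, not how pandas-on-Spark computes the result; the helper names pearson, average_ranks and spearman are hypothetical, and the Spearman helper assumes tied values share the average of their ranks.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


s1 = [.2, .0, .6, .2]
s2 = [.3, .6, .0, .1]
print(round(pearson(s1, s2), 6))   # -0.851064
print(round(spearman(s1, s2), 6))  # -0.948683
```

Spearman is Pearson applied to ranks, so it measures how monotonic the relationship is rather than how linear it is.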