pyspark.sql.functions.collect_list
pyspark.sql.functions.collect_list(col: ColumnOrName) → pyspark.sql.column.Column
Aggregate function: returns a list of objects with duplicates.
New in version 1.6.0.
Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- col: Column or str
target column to compute on.
- Returns
Column
an array column holding the collected values, duplicates included.
Notes
The function is non-deterministic because the order of the collected results depends on the order of the rows, which may be non-deterministic after a shuffle.
Examples
>>> from pyspark.sql.functions import collect_list
>>> df2 = spark.createDataFrame([(2,), (5,), (5,)], ('age',))
>>> df2.agg(collect_list('age')).collect()
[Row(collect_list(age)=[2, 5, 5])]