pyspark.sql.functions.array_size

pyspark.sql.functions.array_size(col)

Array function: returns the total number of elements in the array. The function returns null for null input.

New in version 3.5.0.

Parameters
col : Column or str

The name of the column or an expression that represents the array.

Returns
Column

A new column that contains the size of each array.

Examples

Example 1: Basic usage with integer array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([2, 1, 3],), (None,)], ['data'])
>>> df.select(sf.array_size(df.data)).show()
+----------------+
|array_size(data)|
+----------------+
|               3|
|            NULL|
+----------------+

Example 2: Usage with string array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 'banana', 'cherry'],)], ['data'])
>>> df.select(sf.array_size(df.data)).show()
+----------------+
|array_size(data)|
+----------------+
|               3|
+----------------+

Example 3: Usage with mixed type array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 1, 'cherry'],)], ['data'])
>>> df.select(sf.array_size(df.data)).show()
+----------------+
|array_size(data)|
+----------------+
|               3|
+----------------+

Example 4: Usage with array of arrays

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([[2, 1], [3, 4]],)], ['data'])
>>> df.select(sf.array_size(df.data)).show()
+----------------+
|array_size(data)|
+----------------+
|               2|
+----------------+

Example 5: Usage with empty array

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, IntegerType, StructType, StructField
>>> schema = StructType([
...   StructField("data", ArrayType(IntegerType()), True)
... ])
>>> df = spark.createDataFrame([([],)], schema=schema)
>>> df.select(sf.array_size(df.data)).show()
+----------------+
|array_size(data)|
+----------------+
|               0|
+----------------+