pyspark.sql.functions.spark_partition_id()
A column for partition ID.
New in version 1.6.0.
Changed in version 3.4.0: Supports Spark Connect.
Returns
Column
    the partition ID the record belongs to.
Notes
This is non-deterministic because it depends on data partitioning and task scheduling.
Examples
>>> df = spark.range(2)
>>> df.repartition(1).select(spark_partition_id().alias("pid")).collect()
[Row(pid=0), Row(pid=0)]