RDD.repartitionAndSortWithinPartitions
Repartition the RDD according to the given partitioner and, within each resulting partition, sort records by their keys.
New in version 1.2.0.
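Parameters
numPartitions : int, optional
    the number of partitions in the new RDD
partitionFunc : function, optional, default portable_hash
    a function to compute the partition index from a key
ascending : bool, optional, default True
    whether to sort the keys in ascending (True) or descending (False) order
keyfunc : function, optional, default identity mapping
    a function to compute the sort key from each record's key
Returns
RDD
    a new RDD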
See also
RDD.repartition()
RDD.partitionBy()
RDD.sortBy()
RDD.sortByKey()
Examples
>>> rdd = sc.parallelize([(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)])
>>> rdd2 = rdd.repartitionAndSortWithinPartitions(2, lambda x: x % 2, True)
>>> rdd2.glom().collect()
[[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
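To make the partition-then-sort behavior concrete, the following is a minimal plain-Python sketch that models the result without a Spark context. The helper name repartition_and_sort_model is hypothetical and is not part of PySpark. The sketch assumes each record is a (key, value) pair, that the target partition is the partition function's result modulo the partition count, and that Python's built-in hash can stand in for the default partitioner. It does not reflect how Spark performs the shuffle internally.

```python
def repartition_and_sort_model(records, num_partitions, partition_func=hash,
                               ascending=True, keyfunc=lambda k: k):
    """Hypothetical stand-in that models the documented result in plain Python."""
    # Step 1: route each (key, value) record to a partition chosen from its key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        index = partition_func(key) % num_partitions
        partitions[index].append((key, value))

    # Step 2: sort each partition independently by keyfunc(key).
    # Assumption: ties keep their arrival order here only because Python's
    # sort is stable; this is not a statement about Spark's tie ordering.
    for part in partitions:
        part.sort(key=lambda kv: keyfunc(kv[0]), reverse=not ascending)
    return partitions


data = [(0, 5), (3, 8), (2, 6), (0, 8), (3, 8), (1, 3)]
result = repartition_and_sort_model(data, 2, lambda x: x % 2, True)
print(result)  # [[(0, 5), (0, 8), (2, 6)], [(1, 3), (3, 8), (3, 8)]]
```

With the same input as the example above, even keys land in partition 0 and odd keys in partition 1, and each partition is ordered by key on its own. No ordering holds across partitions.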