pyspark.RDD.randomSplit
- RDD.randomSplit(weights, seed=None)
Randomly splits this RDD with the provided weights.
New in version 1.3.0.
- Parameters
- weights : list
weights for the splits; they will be normalized if they do not sum to 1
- seed : int, optional
random seed
- Returns
- list
split RDDs in a list
Examples
>>> rdd = sc.parallelize(range(500), 1)
>>> rdd1, rdd2 = rdd.randomSplit([2, 3], 17)
>>> len(rdd1.collect() + rdd2.collect())
500
>>> 150 < rdd1.count() < 250
True
>>> 250 < rdd2.count() < 350
True
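The weights need not sum to 1; they are normalized first. A minimal sketch of a common 75/25 train/test split, assuming the same SparkContext sc as above (the names data, train, and test are illustrative):

>>> data = sc.parallelize(range(1000))
>>> train, test = data.randomSplit([3, 1], seed=42)  # [3, 1] is normalized to [0.75, 0.25]
>>> train.count() + test.count()  # every element lands in exactly one split
1000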
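Because the split is driven by the seed, calling randomSplit again with the same seed on the same RDD should reproduce the same split. A sketch, reusing the assumed data RDD from the previous example:

>>> a1, a2 = data.randomSplit([0.5, 0.5], seed=17)
>>> b1, b2 = data.randomSplit([0.5, 0.5], seed=17)
>>> sorted(a1.collect()) == sorted(b1.collect())  # same seed, same first split
True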