public class CoGroupedRDD<K> extends RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>
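An RDD that cogroups its parents: for each key that appears in the parent RDDs, the resulting RDD contains a tuple of that key and an array with one Iterable per parent, holding that parent's values for the key.
Note: This is an internal API. We recommend that users call RDD.cogroup(...) (available on RDDs of key-value pairs through PairRDDFunctions) instead of instantiating this class directly.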
Parameters:
rdds - parent RDDs.
part - partitioner used to partition the shuffle output.
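The element type `Tuple2<K, Iterable<Object>[]>` is easiest to read with a concrete case. The sketch below is a Spark-free, in-memory model of cogroup semantics only, not Spark's implementation: the class `CoGroupSketch` and its helpers `kv` and `cogroup` are hypothetical, each "parent" is just a list of key-value pairs, and the result maps every key to an array with one value list per parent, in parent order. A key missing from some parent still gets an (empty) slot for that parent.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, Spark-free model of cogroup semantics (not Spark's code):
 * for each key seen in any parent, the result holds one list of values per parent,
 * indexed in parent order, mirroring the Iterable<Object>[] in CoGroupedRDD's element type.
 */
public class CoGroupSketch {

    /** Hypothetical convenience factory for a key-value pair. */
    static <K> Map.Entry<K, Object> kv(K key, Object value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    @SuppressWarnings("unchecked")
    static <K> Map<K, List<Object>[]> cogroup(List<List<Map.Entry<K, Object>>> parents) {
        int n = parents.size();
        // LinkedHashMap keeps first-seen key order; Spark makes no such ordering promise.
        Map<K, List<Object>[]> result = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            for (Map.Entry<K, Object> pair : parents.get(i)) {
                // The first time a key appears, give it an empty bucket for every parent.
                List<Object>[] buckets = result.computeIfAbsent(pair.getKey(), k -> {
                    List<Object>[] b = new List[n];
                    for (int j = 0; j < n; j++) {
                        b[j] = new ArrayList<>();
                    }
                    return b;
                });
                buckets[i].add(pair.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> left = Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        List<Map.Entry<String, Object>> right = Arrays.asList(kv("a", "x"), kv("c", "y"));
        Map<String, List<Object>[]> grouped = cogroup(Arrays.asList(left, right));
        for (Map.Entry<String, List<Object>[]> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
        // a -> [[1, 3], [x]]
        // b -> [[2], []]
        // c -> [[], [y]]
    }
}
```

In real code the same result shape comes from calling cogroup on a pair RDD; the in-memory map here only stands in for the per-key grouping that CoGroupedRDD performs within each output partition.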
Constructor and Description |
---|
CoGroupedRDD(scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds, Partitioner part) |
Modifier and Type | Method and Description |
---|---|
void | clearDependencies() - Clears the dependencies of this RDD. |
scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>> | compute(Partition s, TaskContext context) - :: DeveloperApi :: Implemented by subclasses to compute a given partition. |
scala.collection.Seq<Dependency<?>> | getDependencies() - Implemented by subclasses to return how this RDD depends on parent RDDs. |
Partition[] | getPartitions() - Implemented by subclasses to return the set of partitions in this RDD. |
scala.Some<Partitioner> | partitioner() - Optionally overridden by subclasses to specify how they are partitioned. |
scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> | rdds() |
CoGroupedRDD<K> | setSerializer(Serializer serializer) - Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer). |
Methods inherited from class RDD: aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, filter, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, max, min, name, numericRDDToDoubleRDDFunctions, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeReduce, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from interface Logging: initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public CoGroupedRDD(scala.collection.Seq<RDD<? extends scala.Product2<K,?>>> rdds, Partitioner part)
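public CoGroupedRDD<K> setSerializer(Serializer serializer)
Set a serializer for this RDD's shuffle, or null to use the default (spark.serializer).
Returns: this CoGroupedRDD, so the call can be chained.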
public scala.collection.Seq<Dependency<?>> getDependencies()
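Description copied from class: RDD
Implemented by subclasses to return how this RDD depends on parent RDDs.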
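To our understanding of the Spark source, CoGroupedRDD returns one dependency per parent: a parent whose partitioner already equals `part` gets a narrow OneToOneDependency, and every other parent gets a ShuffleDependency. The Spark-free sketch below models only that decision; `DependencySketch`, `Kind`, `HashPart` and `Parent` are hypothetical stand-ins, and `HashPart` assumes a hash partitioner is equal to another exactly when their partition counts match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical model of choosing a dependency kind per parent (not Spark's code). */
public class DependencySketch {

    enum Kind { ONE_TO_ONE, SHUFFLE }

    /** Hypothetical stand-in for a hash partitioner: equal when partition counts match. */
    static final class HashPart {
        final int numPartitions;
        HashPart(int numPartitions) { this.numPartitions = numPartitions; }
        @Override public boolean equals(Object o) {
            return o instanceof HashPart && ((HashPart) o).numPartitions == numPartitions;
        }
        @Override public int hashCode() { return Integer.hashCode(numPartitions); }
    }

    /** Hypothetical parent: only its (optional) partitioner matters for this decision. */
    static final class Parent {
        final Optional<HashPart> partitioner;
        Parent(HashPart partitioner) { this.partitioner = Optional.ofNullable(partitioner); }
    }

    /** Returns one dependency kind per parent, in parent order. */
    static List<Kind> dependencyKinds(List<Parent> parents, HashPart part) {
        List<Kind> kinds = new ArrayList<>();
        for (Parent p : parents) {
            // Already partitioned the same way: output partition i reads only parent
            // partition i. Otherwise the parent's records must be shuffled by `part`.
            kinds.add(p.partitioner.equals(Optional.of(part)) ? Kind.ONE_TO_ONE : Kind.SHUFFLE);
        }
        return kinds;
    }

    public static void main(String[] args) {
        HashPart part = new HashPart(4);
        List<Parent> parents = Arrays.asList(
            new Parent(new HashPart(4)), // same partitioner as the output
            new Parent(null),            // no partitioner
            new Parent(new HashPart(8))  // different partition count
        );
        System.out.println(dependencyKinds(parents, part)); // [ONE_TO_ONE, SHUFFLE, SHUFFLE]
    }
}
```

The practical consequence is that pre-partitioning a parent with the same partitioner used for the cogroup lets that parent skip the shuffle.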
public Partition[] getPartitions()
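Description copied from class: RDD
Implemented by subclasses to return the set of partitions in this RDD.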
public scala.Some<Partitioner> partitioner()
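Description copied from class: RDD
Optionally overridden by subclasses to specify how they are partitioned.
Overrides:
partitioner in class RDD<scala.Tuple2<K,scala.collection.Iterable<Object>[]>>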
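The return type scala.Some<Partitioner> indicates that this RDD always reports a partitioner, which is the `part` passed to the constructor. Routing every parent's records through one partitioner is what sends all values for a key, from every parent, to the same output partition. The sketch below is a Spark-free illustration assuming a hash-based scheme modeled on Spark's HashPartitioner (non-negative modulo of the key's hashCode, with null keys going to partition 0); `HashRoutingSketch` is a hypothetical name, not a Spark class.

```java
/** Hypothetical hash-based key router, modeled on HashPartitioner (not Spark's code). */
public class HashRoutingSketch {
    private final int numPartitions;

    public HashRoutingSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    public int numPartitions() {
        return numPartitions;
    }

    /** Maps a key to a partition index in [0, numPartitions). */
    public int getPartition(Object key) {
        if (key == null) {
            return 0;
        }
        // Java's % can be negative for negative hash codes, so shift it into range.
        int raw = key.hashCode() % numPartitions;
        return raw < 0 ? raw + numPartitions : raw;
    }

    public static void main(String[] args) {
        HashRoutingSketch p = new HashRoutingSketch(4);
        // The same key maps to the same partition whichever parent it came from.
        System.out.println(p.getPartition("a")); // 1  ("a".hashCode() == 97, 97 % 4 == 1)
        System.out.println(p.getPartition(-7));  // 1  (-7 % 4 == -3, shifted to 1)
    }
}
```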
public scala.collection.Iterator<scala.Tuple2<K,scala.collection.Iterable<Object>[]>> compute(Partition s, TaskContext context)
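Description copied from class: RDD
:: DeveloperApi :: Implemented by subclasses to compute a given partition.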
public void clearDependencies()
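Description copied from class: RDD
Clears the dependencies of this RDD. This method must ensure that all references to the original parent RDDs are removed so that the parent RDDs can be garbage collected. See UnionRDD for an example.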