public abstract class VertexRDD<VD> extends RDD<scala.Tuple2<Object,VD>>
Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently. All operations except reindex preserve the index. To construct a VertexRDD, use the VertexRDD object.

Additionally, stores routing information to enable joining the vertex attributes with an EdgeRDD.
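The idea of two vertex sets sharing one index can be illustrated with a small plain-Java model. This is only a sketch and not Spark code: the class name SharedIndexSketch, the sorted-array layout, and the use of double attributes are assumptions made for illustration, and the actual internal layout of a VertexRDD is not described here.

```java
import java.util.Arrays;
import java.util.function.DoubleBinaryOperator;

/** Plain-Java model (not Spark code) of vertex sets that share one index. */
final class SharedIndexSketch {
    /** Sorted, duplicate-free vertex ids; position i is the slot of ids[i]. */
    private final long[] ids;

    SharedIndexSketch(long[] sortedUniqueIds) {
        this.ids = sortedUniqueIds.clone();
    }

    /** Slot of a vertex id in the shared index, or a negative number if absent. */
    int slotOf(long vertexId) {
        return Arrays.binarySearch(ids, vertexId);
    }

    /**
     * Joins two attribute arrays laid out against this index. Slot i refers to
     * the same vertex in both arrays, so the join is a positional zip with no
     * per-vertex lookup.
     */
    double[] zipJoin(double[] left, double[] right, DoubleBinaryOperator f) {
        if (left.length != ids.length || right.length != ids.length) {
            throw new IllegalArgumentException("attribute arrays must match the index");
        }
        double[] out = new double[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = f.applyAsDouble(left[i], right[i]);
        }
        return out;
    }
}
```

In this model, an operation that keeps the id array untouched corresponds to "preserving the index", while rebuilding the id array corresponds to reindex.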
Constructor and Description |
---|
VertexRDD(SparkContext sc, scala.collection.Seq<Dependency<?>> deps) |
Modifier and Type | Method and Description |
---|---|
abstract <VD2> VertexRDD<VD2> | aggregateUsingIndex(RDD<scala.Tuple2<Object,VD2>> messages, scala.Function2<VD2,VD2,VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12): Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this. |
static <VD> VertexRDD<VD> | apply(RDD<scala.Tuple2<Object,VD>> vertices, scala.reflect.ClassTag<VD> evidence$14): Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs. |
static <VD> VertexRDD<VD> | apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15): Constructs a VertexRDD from an RDD of vertex-attribute pairs. |
static <VD> VertexRDD<VD> | apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD,VD,VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16): Constructs a VertexRDD from an RDD of vertex-attribute pairs. |
scala.collection.Iterator<scala.Tuple2<Object,VD>> | compute(Partition part, TaskContext context): Provides the RDD[(VertexId, VD)] equivalent output. |
static RDD<RoutingTablePartition> | createRoutingTables(EdgeRDD<?> edges, Partitioner vertexPartitioner) |
abstract VertexRDD<VD> | diff(VertexRDD<VD> other): Hides vertices that are the same between this and other; for vertices that are different, keeps the values from other. |
VertexRDD<VD> | filter(scala.Function1<scala.Tuple2<Object,VD>,Object> pred): Restricts the vertex set to the set of vertices satisfying the given predicate. |
static <VD> VertexRDD<VD> | fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17): Constructs a VertexRDD containing all vertices referred to in edges. |
abstract <U,VD2> VertexRDD<VD2> | innerJoin(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag<U> evidence$10, scala.reflect.ClassTag<VD2> evidence$11): Inner joins this VertexRDD with an RDD containing vertex attribute pairs. |
abstract <U,VD2> VertexRDD<VD2> | innerZipJoin(VertexRDD<U> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag<U> evidence$8, scala.reflect.ClassTag<VD2> evidence$9): Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index. |
abstract <VD2,VD3> VertexRDD<VD3> | leftJoin(RDD<scala.Tuple2<Object,VD2>> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7): Left joins this VertexRDD with an RDD containing vertex attribute pairs. |
abstract <VD2,VD3> VertexRDD<VD3> | leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5): Left joins this RDD with another VertexRDD with the same index. |
abstract <VD2> VertexRDD<VD2> | mapValues(scala.Function1<VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$2): Maps each vertex attribute, preserving the index. |
abstract <VD2> VertexRDD<VD2> | mapValues(scala.Function2<Object,VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$3): Maps each vertex attribute, additionally supplying the vertex ID. |
abstract <VD2> VertexRDD<VD2> | mapVertexPartitions(scala.Function1<ShippableVertexPartition<VD>,ShippableVertexPartition<VD2>> f, scala.reflect.ClassTag<VD2> evidence$1): Applies a function to each VertexPartition of this RDD and returns a new VertexRDD. |
abstract RDD<ShippableVertexPartition<VD>> | partitionsRDD() |
abstract VertexRDD<VD> | reindex(): Constructs a new VertexRDD that is indexed by only the visible vertices. |
abstract VertexRDD<VD> | reverseRoutingTables(): Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD. |
abstract RDD<scala.Tuple2<Object,VertexAttributeBlock<VD>>> | shipVertexAttributes(boolean shipSrc, boolean shipDst): Generates an RDD of vertex attributes suitable for shipping to the edge partitions. |
abstract RDD<scala.Tuple2<Object,long[]>> | shipVertexIds(): Generates an RDD of vertex IDs suitable for shipping to the edge partitions. |
abstract VertexRDD<VD> | withEdges(EdgeRDD<?> edges): Prepares this VertexRDD for efficient joins with the given EdgeRDD. |
abstract <VD2> VertexRDD<VD2> | withPartitionsRDD(RDD<ShippableVertexPartition<VD2>> partitionsRDD, scala.reflect.ClassTag<VD2> evidence$13): Replaces the vertex partitions while preserving all other properties of the VertexRDD. |
abstract VertexRDD<VD> | withTargetStorageLevel(StorageLevel targetStorageLevel): Changes the target storage level while preserving all other properties of the VertexRDD. |
Methods inherited from class RDD:
aggregate, cache, cartesian, checkpoint, checkpointData, coalesce, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, creationSite, dependencies, distinct, distinct, doCheckpoint, elementClassTag, filterWith, first, flatMap, flatMapWith, fold, foreach, foreachPartition, foreachWith, getCheckpointFile, getCreationSite, getNarrowAncestors, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, iterator, keyBy, map, mapPartitions, mapPartitionsWithContext, mapPartitionsWithIndex, mapPartitionsWithSplit, mapWith, markCheckpointed, max, min, name, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, reduce, repartition, retag, retag, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toArray, toDebugString, toJavaRDD, toLocalIterator, top, toString, union, unpersist, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
Methods inherited from interface Logging:
initializeIfNecessary, initializeLogging, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public VertexRDD(SparkContext sc, scala.collection.Seq<Dependency<?>> deps)
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices, scala.reflect.ClassTag<VD> evidence$14)
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs. Duplicate entries are removed arbitrarily.
Parameters:
vertices - the collection of vertex-attribute pairs

public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15)
Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are removed arbitrarily. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.
Parameters:
vertices - the collection of vertex-attribute pairs
edges - the EdgeRDD that these vertices may be joined with
defaultVal - the vertex attribute to use when creating missing vertices

public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD,VD,VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16)
Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are merged using mergeFunc. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.
Parameters:
vertices - the collection of vertex-attribute pairs
edges - the EdgeRDD that these vertices may be joined with
defaultVal - the vertex attribute to use when creating missing vertices
mergeFunc - the commutative, associative duplicate vertex attribute merge function

public static <VD> VertexRDD<VD> fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17)
Constructs a VertexRDD containing all vertices referred to in edges. The vertices will be created with the attribute defaultVal. The resulting VertexRDD will be joinable with edges.
Parameters:
edges - the EdgeRDD referring to the vertices to create
numPartitions - the desired number of partitions for the resulting VertexRDD
defaultVal - the vertex attribute to use when creating missing vertices

public static RDD<RoutingTablePartition> createRoutingTables(EdgeRDD<?> edges, Partitioner vertexPartitioner)
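The construction rules of the apply overload that takes edges, defaultVal, and mergeFunc (described above) can be modeled with ordinary Java collections. The sketch below is not Spark code: the class VertexConstructionSketch, the build method, and the representation of edges as {srcId, dstId} rows are hypothetical names and choices made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

/** Plain-Java model (not Spark code) of building a vertex set from pairs and edges. */
final class VertexConstructionSketch {
    /**
     * @param vertices   (id, attribute) pairs, possibly with repeated ids
     * @param edges      each row is {srcId, dstId} (an assumed representation)
     * @param defaultVal attribute for vertices that appear only in edges
     * @param mergeFunc  commutative, associative merge for duplicate ids
     */
    static <VD> Map<Long, VD> build(List<Map.Entry<Long, VD>> vertices,
                                    long[][] edges,
                                    VD defaultVal,
                                    BinaryOperator<VD> mergeFunc) {
        Map<Long, VD> result = new TreeMap<>();
        // Step 1: collapse duplicate vertex entries with mergeFunc.
        for (Map.Entry<Long, VD> v : vertices) {
            result.merge(v.getKey(), v.getValue(), mergeFunc);
        }
        // Step 2: create any vertex referred to by an edge but missing so far.
        for (long[] edge : edges) {
            result.putIfAbsent(edge[0], defaultVal);
            result.putIfAbsent(edge[1], defaultVal);
        }
        return result;
    }
}
```

Because the order in which duplicates are encountered is not fixed in a distributed setting, the model only gives a stable result when mergeFunc is commutative and associative, which matches the requirement stated for the mergeFunc parameter.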
public abstract RDD<ShippableVertexPartition<VD>> partitionsRDD()
public scala.collection.Iterator<scala.Tuple2<Object,VD>> compute(Partition part, TaskContext context)
Provides the RDD[(VertexId, VD)] equivalent output.

public abstract VertexRDD<VD> reindex()
Constructs a new VertexRDD that is indexed by only the visible vertices.
public abstract <VD2> VertexRDD<VD2> mapVertexPartitions(scala.Function1<ShippableVertexPartition<VD>,ShippableVertexPartition<VD2>> f, scala.reflect.ClassTag<VD2> evidence$1)
Applies a function to each VertexPartition of this RDD and returns a new VertexRDD.

public VertexRDD<VD> filter(scala.Function1<scala.Tuple2<Object,VD>,Object> pred)
Restricts the vertex set to the set of vertices satisfying the given predicate. It is declared and defined here to allow refining the return type from RDD[(VertexId, VD)] to VertexRDD[VD].
public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function1<VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$2)
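Maps each vertex attribute, preserving the index.
Parameters:
f - the function applied to each value in the RDD
Returns: a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD

public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function2<Object,VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$3)
Maps each vertex attribute, additionally supplying the vertex ID.
Parameters:
f - the function applied to each ID-value pair in the RDD
Returns: a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD. The resulting VertexRDD retains the same index.

public abstract VertexRDD<VD> diff(VertexRDD<VD> other)
Hides vertices that are the same between this and other; for vertices that are different, keeps the values from other.

public abstract <VD2,VD3> VertexRDD<VD3> leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5)
Left joins this RDD with another VertexRDD with the same index. The resulting vertex set contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None.
Parameters:
other - the other VertexRDD with which to join.
f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
Returns: a VertexRDD containing the results of f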
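The left-join behavior described for leftZipJoin (every vertex of this is kept, and f receives None when other has no entry) can be modeled in plain Java. This is a sketch, not Spark code: LeftJoinSketch and its JoinFunction interface are hypothetical stand-ins, and java.util.Optional plays the role of scala.Option.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Plain-Java model (not Spark code) of left-join semantics on vertex attributes. */
final class LeftJoinSketch {
    /** Hypothetical stand-in for scala.Function3<Object,VD,Option<VD2>,VD3>. */
    interface JoinFunction<VD, VD2, VD3> {
        VD3 apply(long vertexId, VD attr, Optional<VD2> otherAttr);
    }

    static <VD, VD2, VD3> Map<Long, VD3> leftJoin(Map<Long, VD> self,
                                                  Map<Long, VD2> other,
                                                  JoinFunction<VD, VD2, VD3> f) {
        Map<Long, VD3> result = new TreeMap<>();
        // Iterate over every vertex of self so that none is dropped.
        for (Map.Entry<Long, VD> e : self.entrySet()) {
            // An absent entry in other becomes Optional.empty(), the analogue of None.
            Optional<VD2> match = Optional.ofNullable(other.get(e.getKey()));
            result.put(e.getKey(), f.apply(e.getKey(), e.getValue(), match));
        }
        return result;
    }
}
```

The model uses map lookups for clarity; the point of the zip variant in the real API is that a shared index makes such lookups unnecessary.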
public abstract <VD2,VD3> VertexRDD<VD3> leftJoin(RDD<scala.Tuple2<Object,VD2>> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7)
Left joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index, then the efficient leftZipJoin implementation is used. The resulting VertexRDD contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None. If there are duplicates, the vertex is picked arbitrarily.
Parameters:
other - the other VertexRDD with which to join
f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
Returns: a VertexRDD containing all the vertices in this VertexRDD with the attributes emitted by f.

public abstract <U,VD2> VertexRDD<VD2> innerZipJoin(VertexRDD<U> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag<U> evidence$8, scala.reflect.ClassTag<VD2> evidence$9)
Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index. See innerJoin for the behavior of the join.

public abstract <U,VD2> VertexRDD<VD2> innerJoin(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag<U> evidence$10, scala.reflect.ClassTag<VD2> evidence$11)
Inner joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index, then the efficient innerZipJoin implementation is used.
Parameters:
other - an RDD containing vertices to join. If there are multiple entries for the same vertex, one is picked arbitrarily. Use aggregateUsingIndex to merge multiple entries.
f - the join function applied to corresponding values of this and other
Returns: a VertexRDD co-indexed with this, containing only vertices that appear in both this and other, with values supplied by f
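The advice above, merging multiple entries per vertex before an inner join, can be modeled in plain Java. This is a sketch and not Spark code: InnerJoinSketch and its methods are hypothetical, and dropping entries whose id is not present in this is an assumption of the model.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

/** Plain-Java model (not Spark code) of merge-then-inner-join on vertex attributes. */
final class InnerJoinSketch {
    /** Hypothetical stand-in for scala.Function3<Object,VD,U,VD2>. */
    interface JoinFunction<VD, U, VD2> {
        VD2 apply(long vertexId, VD attr, U otherAttr);
    }

    /** Merges entries that target the same vertex; ids unknown to self are dropped (assumption). */
    static <VD, U> Map<Long, U> aggregateUsingIndex(Map<Long, VD> self,
                                                     List<Map.Entry<Long, U>> messages,
                                                     BinaryOperator<U> reduceFunc) {
        Map<Long, U> result = new TreeMap<>();
        for (Map.Entry<Long, U> m : messages) {
            if (self.containsKey(m.getKey())) {
                result.merge(m.getKey(), m.getValue(), reduceFunc);
            }
        }
        return result;
    }

    /** Keeps only vertices present on both sides, with values supplied by f. */
    static <VD, U, VD2> Map<Long, VD2> innerJoin(Map<Long, VD> self,
                                                 Map<Long, U> other,
                                                 JoinFunction<VD, U, VD2> f) {
        Map<Long, VD2> result = new TreeMap<>();
        for (Map.Entry<Long, VD> e : self.entrySet()) {
            U match = other.get(e.getKey());
            if (match != null) {
                result.put(e.getKey(), f.apply(e.getKey(), e.getValue(), match));
            }
        }
        return result;
    }
}
```

Merging first makes the join result deterministic, whereas joining directly against an input with repeated ids would keep an arbitrary one of the duplicates.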
public abstract <VD2> VertexRDD<VD2> aggregateUsingIndex(RDD<scala.Tuple2<Object,VD2>> messages, scala.Function2<VD2,VD2,VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12)
Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
Parameters:
messages - an RDD containing messages to aggregate, where each message is a pair of its target vertex ID and the message data
reduceFunc - the associative aggregation function for merging messages to the same vertex
Returns: a VertexRDD co-indexed with this, containing only vertices that received messages. For those vertices, their values are the result of applying reduceFunc to all received messages.

public abstract VertexRDD<VD> reverseRoutingTables()
Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.

public abstract VertexRDD<VD> withEdges(EdgeRDD<?> edges)
Prepares this VertexRDD for efficient joins with the given EdgeRDD.
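public abstract <VD2> VertexRDD<VD2> withPartitionsRDD(RDD<ShippableVertexPartition<VD2>> partitionsRDD, scala.reflect.ClassTag<VD2> evidence$13)
Replaces the vertex partitions while preserving all other properties of the VertexRDD.

public abstract VertexRDD<VD> withTargetStorageLevel(StorageLevel targetStorageLevel)
Changes the target storage level while preserving all other properties of the VertexRDD. This does not actually trigger a cache; to do this, call RDD.cache() on the returned VertexRDD.

public abstract RDD<scala.Tuple2<Object,VertexAttributeBlock<VD>>> shipVertexAttributes(boolean shipSrc, boolean shipDst)
Generates an RDD of vertex attributes suitable for shipping to the edge partitions.

public abstract RDD<scala.Tuple2<Object,long[]>> shipVertexIds()
Generates an RDD of vertex IDs suitable for shipping to the edge partitions.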