Package org.apache.spark.graphx
Class VertexRDD<VD>
- Type Parameters:
- VD - the vertex attribute associated with each vertex in the set.
- All Implemented Interfaces:
- Serializable, org.apache.spark.internal.Logging
- Direct Known Subclasses:
- VertexRDDImpl
Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by
 pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be
 joined efficiently. All operations except reindex() preserve the index. To construct a
 VertexRDD, use the VertexRDD object.
 
 Additionally, stores routing information to enable joining the vertex attributes with an
 EdgeRDD.
 
- Example:
- Construct a VertexRDD from a plain RDD:
    // Construct an initial vertex set
    val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
    val vset = VertexRDD(someData)
    // If there were redundant values in someData we would use a reduceFunc
    val vset2 = VertexRDD(someData, reduceFunc)
    // Finally we can use the VertexRDD to index another dataset
    val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
    val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
    // Now we can construct very fast joins between the two sets
    val vset4: VertexRDD[(SomeType, OtherType)] = vset.leftJoin(vset3)
- 
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter
- 
Constructor Summary
Constructors
- VertexRDD(SparkContext sc, scala.collection.immutable.Seq<Dependency<?>> deps)
- 
Method Summary
- abstract <VD2> VertexRDD<VD2> aggregateUsingIndex(RDD<scala.Tuple2<Object, VD2>> messages, scala.Function2<VD2, VD2, VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12)
  Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
- static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object, VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD, VD, VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16)
  Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object, VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15)
  Constructs a VertexRDD from an RDD of vertex-attribute pairs.
- static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object, VD>> vertices, scala.reflect.ClassTag<VD> evidence$14)
  Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs.
- scala.collection.Iterator<scala.Tuple2<Object,VD>> compute(Partition part, TaskContext context)
  Provides the RDD[(VertexId, VD)] equivalent output.
- diff
  For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.
- diff
  For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.
- filter
  Restricts the vertex set to the set of vertices satisfying the given predicate.
- static <VD> VertexRDD<VD> fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17)
  Constructs a VertexRDD containing all vertices referred to in edges.
- abstract <U,VD2> VertexRDD<VD2> innerJoin(RDD<scala.Tuple2<Object, U>> other, scala.Function3<Object, VD, U, VD2> f, scala.reflect.ClassTag<U> evidence$10, scala.reflect.ClassTag<VD2> evidence$11)
  Inner joins this VertexRDD with an RDD containing vertex attribute pairs.
- abstract <U,VD2> VertexRDD<VD2> innerZipJoin(VertexRDD<U> other, scala.Function3<Object, VD, U, VD2> f, scala.reflect.ClassTag<U> evidence$8, scala.reflect.ClassTag<VD2> evidence$9)
  Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index.
- abstract <VD2,VD3> VertexRDD<VD3> leftJoin(RDD<scala.Tuple2<Object, VD2>> other, scala.Function3<Object, VD, scala.Option<VD2>, VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7)
  Left joins this VertexRDD with an RDD containing vertex attribute pairs.
- abstract <VD2,VD3> VertexRDD<VD3> leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object, VD, scala.Option<VD2>, VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5)
  Left joins this RDD with another VertexRDD with the same index.
- abstract <VD2> VertexRDD<VD2> mapValues(scala.Function1<VD, VD2> f, scala.reflect.ClassTag<VD2> evidence$2)
  Maps each vertex attribute, preserving the index.
- abstract <VD2> VertexRDD<VD2> mapValues(scala.Function2<Object, VD, VD2> f, scala.reflect.ClassTag<VD2> evidence$3)
  Maps each vertex attribute, additionally supplying the vertex ID.
- minus
  For each VertexId present in both this and other, minus will act as a set difference operation returning only those unique VertexId's present in this.
- minus
  For each VertexId present in both this and other, minus will act as a set difference operation returning only those unique VertexId's present in this.
- reindex()
  Construct a new VertexRDD that is indexed by only the visible vertices.
- reverseRoutingTables
  Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.
- withEdges
  Prepares this VertexRDD for efficient joins with the given EdgeRDD.

Methods inherited from class org.apache.spark.rdd.RDD
aggregate, barrier, cache, cartesian, checkpoint, cleanShuffleDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, first, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getNumPartitions, getResourceProfile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithEvaluator, mapPartitionsWithIndex, max, min, name, numericRDDToDoubleRDDFunctions, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate, treeReduce, union, unpersist, withResources, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitionsWithEvaluator, zipWithIndex, zipWithUniqueId

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext
- 
Constructor Details
- 
VertexRDD
 
- 
- 
Method Details
- 
apply
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object, VD>> vertices, scala.reflect.ClassTag<VD> evidence$14)
Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs. Duplicate entries are removed arbitrarily.
- Parameters:
- vertices - the collection of vertex-attribute pairs
- evidence$14 - (undocumented)
- Returns:
- (undocumented)
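
The standalone constructor described above can be sketched as follows. This is a minimal illustration, not part of the generated API documentation: the object name `StandaloneVertexRDDSketch`, the sample data, and the `local[2]` master are assumptions chosen for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and sample data are illustrative only.
object StandaloneVertexRDDSketch {
  // Builds a standalone VertexRDD from pairs in which vertex 2 appears twice.
  def build(sc: SparkContext): VertexRDD[String] = {
    val pairs: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (2L, "b-duplicate"), (3L, "c")))
    VertexRDD(pairs)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("standalone-vertexrdd-sketch")
    val sc = new SparkContext(conf)
    try {
      // One entry per vertex id remains; which attribute survives for id 2 is arbitrary.
      println(build(sc).map(_._1).collect().sorted.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Because duplicates are dropped arbitrarily, the sketch prints only the vertex ids rather than the attributes.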
 
- 
apply
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object, VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15)
Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are removed arbitrarily. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.
- Parameters:
- vertices - the collection of vertex-attribute pairs
- edges - the EdgeRDD that these vertices may be joined with
- defaultVal - the vertex attribute to use when creating missing vertices
- evidence$15 - (undocumented)
- Returns:
- (undocumented)
 
- 
apply
public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object, VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD, VD, VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16)
Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are merged using mergeFunc. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.
- Parameters:
- vertices - the collection of vertex-attribute pairs
- edges - the EdgeRDD that these vertices may be joined with
- defaultVal - the vertex attribute to use when creating missing vertices
- mergeFunc - the commutative, associative duplicate vertex attribute merge function
- evidence$16 - (undocumented)
- Returns:
- (undocumented)
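
A small sketch of this overload, assuming made-up sample data: vertex 1 appears twice and is merged by addition, while vertex 3 is referenced only by an edge and so receives `defaultVal`. The object name `MergeDuplicateVerticesSketch` and the `local[2]` master are hypothetical; `EdgeRDD.fromEdges` is used here only to obtain an `EdgeRDD` for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and sample data are illustrative only.
object MergeDuplicateVerticesSketch {
  def build(sc: SparkContext): VertexRDD[Int] = {
    // Vertex 1 is listed twice; vertex 3 is mentioned only by the edge 2 -> 3.
    val pairs: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 10), (1L, 5), (2L, 7)))
    val edges = EdgeRDD.fromEdges[Int, Int](
      sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0))))
    // defaultVal = 0 for missing vertices; duplicates are merged by summing.
    VertexRDD(pairs, edges, 0, (a: Int, b: Int) => a + b)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("merge-duplicates-sketch")
    val sc = new SparkContext(conf)
    try {
      // Expected entries: (1,15), (2,7), (3,0)
      build(sc).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Addition is used as the merge function because it is commutative and associative, as the parameter description requires.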
 
- 
fromEdges
public static <VD> VertexRDD<VD> fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17)
Constructs a VertexRDD containing all vertices referred to in edges. The vertices will be created with the attribute defaultVal. The resulting VertexRDD will be joinable with edges.
- Parameters:
- edges - the EdgeRDD referring to the vertices to create
- numPartitions - the desired number of partitions for the resulting VertexRDD
- defaultVal - the vertex attribute to use when creating missing vertices
- evidence$17 - (undocumented)
- Returns:
- (undocumented)
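
As a rough illustration of fromEdges, the sketch below derives a vertex set purely from two edges. The object name `FromEdgesSketch`, the edge data, and the choice of reusing the edge partition count for `numPartitions` are assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexRDD}

// Hypothetical example object; names and sample data are illustrative only.
object FromEdgesSketch {
  def build(sc: SparkContext): VertexRDD[Int] = {
    val edges = EdgeRDD.fromEdges[String, Int](
      sc.parallelize(Seq(Edge(1L, 2L, "a"), Edge(2L, 3L, "b"))))
    // Every vertex mentioned by an edge is created with the default attribute 0.
    VertexRDD.fromEdges(edges, edges.partitions.length, 0)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("from-edges-sketch")
    val sc = new SparkContext(conf)
    try {
      // Expected entries: (1,0), (2,0), (3,0)
      build(sc).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```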
 
- 
compute
public scala.collection.Iterator<scala.Tuple2<Object,VD>> compute(Partition part, TaskContext context)
Provides the RDD[(VertexId, VD)] equivalent output.
- 
reindex
Construct a new VertexRDD that is indexed by only the visible vertices. The resulting VertexRDD will be based on a different index and can no longer be quickly joined with this RDD.
- Returns:
- (undocumented)
 
- 
filter
Restricts the vertex set to the set of vertices satisfying the given predicate. This operation preserves the index for efficient joins with the original RDD, and it sets bits in the bitmask rather than allocating new memory.
It is declared and defined here to allow refining the return type from RDD[(VertexId, VD)] to VertexRDD[VD].
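
The following sketch, under assumed sample data, shows filter returning a `VertexRDD` and reindex() then rebuilding an index over only the vertices that remain visible. The object name `FilterAndReindexSketch` is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object; names and sample data are illustrative only.
object FilterAndReindexSketch {
  // Returns the ids visible after filtering and after reindexing.
  def run(sc: SparkContext): (Set[VertexId], Set[VertexId]) = {
    val verts: VertexRDD[Int] =
      VertexRDD(sc.parallelize((0 to 10).map(x => (x.toLong, x))))
    // filter keeps the return type VertexRDD[Int] and preserves the index of verts.
    val evens: VertexRDD[Int] = verts.filter { case (_, attr) => attr % 2 == 0 }
    // reindex builds a new index that covers only the visible (even) vertices.
    val compact: VertexRDD[Int] = evens.reindex()
    (evens.map(_._1).collect().toSet, compact.map(_._1).collect().toSet)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("filter-reindex-sketch")
    val sc = new SparkContext(conf)
    try {
      val (filtered, reindexed) = run(sc)
      println(filtered.toSeq.sorted.mkString(", "))  // 0, 2, 4, 6, 8, 10
      println(reindexed.toSeq.sorted.mkString(", ")) // 0, 2, 4, 6, 8, 10
    } finally {
      sc.stop()
    }
  }
}
```

Both results hold the same vertices; the difference is only in the underlying index, which matters for later joins.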
- 
mapValues
public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function1<VD, VD2> f, scala.reflect.ClassTag<VD2> evidence$2)
Maps each vertex attribute, preserving the index.
- Parameters:
- f - the function applied to each value in the RDD
- evidence$2 - (undocumented)
- Returns:
- a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD
 
- 
mapValues
public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function2<Object, VD, VD2> f, scala.reflect.ClassTag<VD2> evidence$3)
Maps each vertex attribute, additionally supplying the vertex ID.
- Parameters:
- f - the function applied to each ID-value pair in the RDD
- evidence$3 - (undocumented)
- Returns:
- a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD. The resulting VertexRDD retains the same index.
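
A brief sketch of the two mapValues overloads, using invented sample data. The lambdas carry explicit parameter types so that it is clear which overload is selected; the object name `MapValuesSketch` is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object; names and sample data are illustrative only.
object MapValuesSketch {
  def run(sc: SparkContext): (Set[(VertexId, Int)], Set[(VertexId, String)]) = {
    val verts: VertexRDD[Int] = VertexRDD(sc.parallelize(Seq((1L, 1), (2L, 2), (3L, 3))))
    // One-argument overload: only the attribute is visible to the function.
    val doubled: VertexRDD[Int] = verts.mapValues((attr: Int) => attr * 2)
    // Two-argument overload: the vertex id is supplied as well.
    val labelled: VertexRDD[String] =
      verts.mapValues((id: VertexId, attr: Int) => s"$id:$attr")
    (doubled.collect().toSet, labelled.collect().toSet)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("map-values-sketch")
    val sc = new SparkContext(conf)
    try {
      val (doubled, labelled) = run(sc)
      println(doubled.toSeq.sortBy(_._1).mkString(", "))  // (1,2), (2,4), (3,6)
      println(labelled.toSeq.sortBy(_._1).mkString(", ")) // (1,1:1), (2,2:2), (3,3:3)
    } finally {
      sc.stop()
    }
  }
}
```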
 
- 
minus
For each VertexId present in both this and other, minus will act as a set difference operation returning only those unique VertexId's present in this.
- Parameters:
- other - an RDD to run the set operation against
- Returns:
- (undocumented)
 
- 
minus
For each VertexId present in both this and other, minus will act as a set difference operation returning only those unique VertexId's present in this.
- Parameters:
- other - a VertexRDD to run the set operation against
- Returns:
- (undocumented)
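
To make the set-difference behaviour concrete, here is a minimal sketch with two overlapping vertex sets. The sample ranges and the object name `MinusSketch` are assumptions; in this sketch the result keeps the vertices of the first set whose ids do not occur in the second.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object; names and sample data are illustrative only.
object MinusSketch {
  def run(sc: SparkContext): Set[VertexId] = {
    // Ids 0..5 in a, ids 3..8 in b; both built with the same number of partitions.
    val a: VertexRDD[Int] = VertexRDD(sc.parallelize(0 until 6, 2).map(i => (i.toLong, 0)))
    val b: VertexRDD[Int] = VertexRDD(sc.parallelize(3 until 9, 2).map(i => (i.toLong, 1)))
    // Vertices of a whose ids also appear in b are removed.
    a.minus(b).map(_._1).collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("minus-sketch")
    val sc = new SparkContext(conf)
    try {
      println(run(sc).toSeq.sorted.mkString(", ")) // 0, 1, 2
    } finally {
      sc.stop()
    }
  }
}
```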
 
- 
diff
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other. This is only guaranteed to work if the VertexRDDs share a common ancestor.
- Parameters:
- other - the other RDD[(VertexId, VD)] to diff against.
- Returns:
- (undocumented)
 
- 
diff
For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other. This is only guaranteed to work if the VertexRDDs share a common ancestor.
- Parameters:
- other - the other VertexRDD to diff against.
- Returns:
- (undocumented)
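
A hedged sketch of diff follows. The second vertex set is derived from the first with mapValues, so the two share a common ancestor as required above; the sample data and the object name `DiffSketch` are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object; names and sample data are illustrative only.
object DiffSketch {
  def run(sc: SparkContext): Set[(VertexId, Int)] = {
    val verts: VertexRDD[Int] =
      VertexRDD(sc.parallelize((0 to 10).map(x => (x.toLong, x)))).cache()
    // Negate the even attributes; vertex 0 stays 0, so it does not count as changed.
    val flipEvens: VertexRDD[Int] =
      verts.mapValues((x: Int) => if (x % 2 == 0) -x else x)
    // Only changed vertices are returned, carrying the values from flipEvens.
    verts.diff(flipEvens).collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("diff-sketch")
    val sc = new SparkContext(conf)
    try {
      // Expected entries: (2,-2), (4,-4), (6,-6), (8,-8), (10,-10)
      run(sc).toSeq.sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```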
 
- 
leftZipJoin
public abstract <VD2,VD3> VertexRDD<VD3> leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object, VD, scala.Option<VD2>, VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5)
Left joins this RDD with another VertexRDD with the same index. This function will fail if both VertexRDDs do not share the same index. The resulting vertex set contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None.
- Parameters:
- other - the other VertexRDD with which to join.
- f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
- evidence$4 - (undocumented)
- evidence$5 - (undocumented)
- Returns:
- a VertexRDD containing the results of f
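
The sketch below illustrates leftZipJoin between a vertex set and a filtered view of itself, which share the same index because filter preserves it. The sample data and the object name `LeftZipJoinSketch` are assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object; names and sample data are illustrative only.
object LeftZipJoinSketch {
  def run(sc: SparkContext): Set[(VertexId, Boolean)] = {
    val verts: VertexRDD[Int] =
      VertexRDD(sc.parallelize((0 to 5).map(x => (x.toLong, x)))).cache()
    // evens shares the index of verts because filter preserves it.
    val evens: VertexRDD[Int] = verts.filter { case (_, attr) => attr % 2 == 0 }
    // For each vertex in verts, record whether evens had an entry for it.
    verts.leftZipJoin(evens) { (id, a, bOpt) => bOpt.isDefined }.collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("left-zip-join-sketch")
    val sc = new SparkContext(conf)
    try {
      // Even ids map to true, odd ids map to false.
      run(sc).toSeq.sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```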
 
- 
leftJoin
public abstract <VD2,VD3> VertexRDD<VD3> leftJoin(RDD<scala.Tuple2<Object, VD2>> other, scala.Function3<Object, VD, scala.Option<VD2>, VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7)
Left joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index then the efficient leftZipJoin implementation is used. The resulting VertexRDD contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None. If there are duplicates, the vertex is picked arbitrarily.
- Parameters:
- other - the other VertexRDD with which to join
- f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
- evidence$6 - (undocumented)
- evidence$7 - (undocumented)
- Returns:
- a VertexRDD containing all the vertices in this VertexRDD with the attributes emitted by f.
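
As an illustration of leftJoin against a plain RDD, the sketch below joins three vertices with a dataset that covers only two of them. The data, the "-" placeholder for missing entries, and the object name `LeftJoinSketch` are assumptions for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and sample data are illustrative only.
object LeftJoinSketch {
  def run(sc: SparkContext): Set[(VertexId, String)] = {
    val verts: VertexRDD[Int] =
      VertexRDD(sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30))))
    // A plain RDD with no entry for vertex 2.
    val other: RDD[(VertexId, String)] = sc.parallelize(Seq((1L, "x"), (3L, "y")))
    // f receives None for vertex 2, which is rendered as "-" here.
    verts.leftJoin(other) { (id, a, bOpt) => s"$a${bOpt.getOrElse("-")}" }.collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("left-join-sketch")
    val sc = new SparkContext(conf)
    try {
      // Expected entries: (1,10x), (2,20-), (3,30y)
      run(sc).toSeq.sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```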
 
- 
innerZipJoin
public abstract <U,VD2> VertexRDD<VD2> innerZipJoin(VertexRDD<U> other, scala.Function3<Object, VD, U, VD2> f, scala.reflect.ClassTag<U> evidence$8, scala.reflect.ClassTag<VD2> evidence$9)
Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index. See innerJoin for the behavior of the join.
- Parameters:
- other - (undocumented)
- f - (undocumented)
- evidence$8 - (undocumented)
- evidence$9 - (undocumented)
- Returns:
- (undocumented)
 
- 
innerJoin
public abstract <U,VD2> VertexRDD<VD2> innerJoin(RDD<scala.Tuple2<Object, U>> other, scala.Function3<Object, VD, U, VD2> f, scala.reflect.ClassTag<U> evidence$10, scala.reflect.ClassTag<VD2> evidence$11)
Inner joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index then the efficient innerZipJoin implementation is used.
- Parameters:
- other - an RDD containing vertices to join. If there are multiple entries for the same vertex, one is picked arbitrarily. Use aggregateUsingIndex to merge multiple entries.
- f - the join function applied to corresponding values of this and other
- evidence$10 - (undocumented)
- evidence$11 - (undocumented)
- Returns:
- a VertexRDD co-indexed with this, containing only vertices that appear in both this and other, with values supplied by f
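
A minimal innerJoin sketch with assumed data: only vertex 1 occurs on both sides, so only it appears in the result. The object name `InnerJoinSketch` is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and sample data are illustrative only.
object InnerJoinSketch {
  def run(sc: SparkContext): Set[(VertexId, Double)] = {
    val verts: VertexRDD[Int] =
      VertexRDD(sc.parallelize(Seq((1L, 10), (2L, 20), (3L, 30))))
    // Vertex 4 is not in verts; vertices 2 and 3 have no entry in other.
    val other: RDD[(VertexId, Double)] = sc.parallelize(Seq((1L, 1.5), (4L, 2.5)))
    // f combines the two attributes of each vertex present on both sides.
    verts.innerJoin(other) { (id, a, b) => a * b }.collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("inner-join-sketch")
    val sc = new SparkContext(conf)
    try {
      // Expected entry: (1,15.0)
      run(sc).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```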
 
- 
aggregateUsingIndex
public abstract <VD2> VertexRDD<VD2> aggregateUsingIndex(RDD<scala.Tuple2<Object, VD2>> messages, scala.Function2<VD2, VD2, VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12)
Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
- Parameters:
- messages - an RDD containing messages to aggregate, where each message is a pair of its target vertex ID and the message data
- reduceFunc - the associative aggregation function for merging messages to the same vertex
- evidence$12 - (undocumented)
- Returns:
- a VertexRDD co-indexed with this, containing only vertices that received messages. For those vertices, their values are the result of applying reduceFunc to all received messages.
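
The aggregation described above can be sketched as follows, assuming a small invented message set in which vertex 1 receives two messages, vertex 3 receives one, and vertex 2 receives none. The object name `AggregateUsingIndexSketch` is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and sample data are illustrative only.
object AggregateUsingIndexSketch {
  def run(sc: SparkContext): Set[(VertexId, Int)] = {
    val verts: VertexRDD[String] =
      VertexRDD(sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))))
    // Each message is a (target vertex id, payload) pair.
    val messages: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 1), (1L, 2), (3L, 5)))
    // Messages to the same vertex are summed; the result reuses the index of verts.
    verts.aggregateUsingIndex[Int](messages, _ + _).collect().toSet
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregate-using-index-sketch")
    val sc = new SparkContext(conf)
    try {
      // Expected entries: (1,3), (3,5); vertex 2 received no messages and is absent.
      run(sc).toSeq.sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Because the result is co-indexed with the original vertex set, it can be joined back to that set without rebuilding an index.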
 
- 
reverseRoutingTables
Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.
- Returns:
- (undocumented)
 
- 
withEdges
Prepares this VertexRDD for efficient joins with the given EdgeRDD.
 
-