org.apache.spark.rdd.RDD<scala.Tuple2<Object,VD>>

org.apache.spark.graphx.VertexRDD<VD>

Type Parameters: VD - the vertex attribute associated with each vertex in the set.

All Implemented Interfaces: Serializable, org.apache.spark.internal.Logging

Direct Known Subclasses: VertexRDDImpl

public abstract class VertexRDD<VD> extends RDD<scala.Tuple2<Object,VD>>

Extends RDD[(VertexId, VD)] by ensuring that there is only one entry for each vertex and by pre-indexing the entries for fast, efficient joins. Two VertexRDDs with the same index can be joined efficiently. All operations except reindex() preserve the index. To construct a VertexRDD, use the VertexRDD object.

Additionally, stores routing information to enable joining the vertex attributes with an EdgeRDD.

See Also:

Serialized Form

Example:

Construct a VertexRDD from a plain RDD:


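 import org.apache.spark.graphx.{VertexId, VertexRDD}
 import org.apache.spark.rdd.RDD

 // SomeType, OtherType, loadData, someFile, otherFile and reduceFunc are placeholders
 // Construct an initial vertex set
 val someData: RDD[(VertexId, SomeType)] = loadData(someFile)
 val vset = VertexRDD(someData)
 // If there were redundant values in someData we would merge them with a reduceFunc,
 // reusing the index of vset
 val vset2 = vset.aggregateUsingIndex(someData, reduceFunc)
 // Finally we can use the VertexRDD to index another dataset
 val otherData: RDD[(VertexId, OtherType)] = loadData(otherFile)
 val vset3 = vset2.innerJoin(otherData) { (vid, a, b) => b }
 // Now we can construct very fast joins between the two sets;
 // leftJoin passes None for vertices of vset that are missing from vset3
 val vset4: VertexRDD[(SomeType, Option[OtherType])] =
   vset.leftJoin(vset3) { (vid, a, bOpt) => (a, bOpt) }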

Nested Class Summary

Nested classes/interfaces inherited from interface org.apache.spark.internal.Logging
org.apache.spark.internal.Logging.LogStringContext, org.apache.spark.internal.Logging.SparkShellLoggingFilter

Constructor Summary

VertexRDD(SparkContext sc, scala.collection.immutable.Seq<Dependency<?>> deps)

Method Summary

Entries are listed as: Modifier and Type, Method, Description

abstract <VD2> VertexRDD<VD2>

aggregateUsingIndex(RDD<scala.Tuple2<Object,VD2>> messages, scala.Function2<VD2,VD2,VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12)

Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.

static <VD> VertexRDD<VD>

apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD,VD,VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16)

Constructs a VertexRDD from an RDD of vertex-attribute pairs.

static <VD> VertexRDD<VD>

apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15)

Constructs a VertexRDD from an RDD of vertex-attribute pairs.

static <VD> VertexRDD<VD>

apply(RDD<scala.Tuple2<Object,VD>> vertices, scala.reflect.ClassTag<VD> evidence$14)

Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs.

scala.collection.Iterator<scala.Tuple2<Object,VD>>

compute(Partition part, TaskContext context)

Provides the RDD[(VertexId, VD)] equivalent output.

abstract VertexRDD<VD>

diff(VertexRDD<VD> other)

For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.

abstract VertexRDD<VD>

diff(RDD<scala.Tuple2<Object,VD>> other)

For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other.

VertexRDD<VD>

filter(scala.Function1<scala.Tuple2<Object,VD>,Object> pred)

Restricts the vertex set to the set of vertices satisfying the given predicate.

static <VD> VertexRDD<VD>

fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17)

Constructs a VertexRDD containing all vertices referred to in edges.

abstract <U, VD2> VertexRDD<VD2>

innerJoin(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag evidence$10, scala.reflect.ClassTag<VD2> evidence$11)

Inner joins this VertexRDD with an RDD containing vertex attribute pairs.

abstract <U, VD2> VertexRDD<VD2>

innerZipJoin(VertexRDD other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag evidence$8, scala.reflect.ClassTag<VD2> evidence$9)

Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index.

abstract <VD2, VD3> VertexRDD<VD3>

leftJoin(RDD<scala.Tuple2<Object,VD2>> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7)

Left joins this VertexRDD with an RDD containing vertex attribute pairs.

abstract <VD2, VD3> VertexRDD<VD3>

leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5)

Left joins this VertexRDD with another VertexRDD with the same index.

abstract <VD2> VertexRDD<VD2>

mapValues(scala.Function1<VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$2)

Maps each vertex attribute, preserving the index.

abstract <VD2> VertexRDD<VD2>

mapValues(scala.Function2<Object,VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$3)

Maps each vertex attribute, additionally supplying the vertex ID.

abstract VertexRDD<VD>

minus(VertexRDD<VD> other)

Acts as a set difference: returns only the vertices whose VertexId is present in this but not in other.

abstract VertexRDD<VD>

minus(RDD<scala.Tuple2<Object,VD>> other)

Acts as a set difference: returns only the vertices whose VertexId is present in this but not in other.

abstract VertexRDD<VD>

reindex()

Constructs a new VertexRDD that is indexed by only the visible vertices.

abstract VertexRDD<VD>

reverseRoutingTables()

Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.

abstract VertexRDD<VD>

withEdges(EdgeRDD<?> edges)

Prepares this VertexRDD for efficient joins with the given EdgeRDD.

Methods inherited from class org.apache.spark.rdd.RDD
aggregate, barrier, cache, cartesian, checkpoint, cleanShuffleDependencies, coalesce, collect, collect, context, count, countApprox, countApproxDistinct, countApproxDistinct, countByValue, countByValueApprox, dependencies, distinct, distinct, doubleRDDToDoubleRDDFunctions, first, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getNumPartitions, getResourceProfile, getStorageLevel, glom, groupBy, groupBy, groupBy, id, intersection, intersection, intersection, isCheckpointed, isEmpty, iterator, keyBy, localCheckpoint, map, mapPartitions, mapPartitionsWithEvaluator, mapPartitionsWithIndex, max, min, name, numericRDDToDoubleRDDFunctions, partitioner, partitions, persist, persist, pipe, pipe, pipe, preferredLocations, randomSplit, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToSequenceFileRDDFunctions, reduce, repartition, sample, saveAsObjectFile, saveAsTextFile, saveAsTextFile, setName, sortBy, sparkContext, subtract, subtract, subtract, take, takeOrdered, takeSample, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate, treeReduce, union, unpersist, withResources, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitionsWithEvaluator, zipWithIndex, zipWithUniqueId

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface org.apache.spark.internal.Logging
initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, isTraceEnabled, log, logBasedOnLevel, logDebug, logDebug, logDebug, logDebug, logError, logError, logError, logError, logInfo, logInfo, logInfo, logInfo, logName, LogStringContext, logTrace, logTrace, logTrace, logTrace, logWarning, logWarning, logWarning, logWarning, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, withLogContext

Constructor Details
- VertexRDD
 
 public VertexRDD(SparkContext sc, scala.collection.immutable.Seq<Dependency<?>> deps)
Method Details
- apply
 
 public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices, scala.reflect.ClassTag<VD> evidence$14)
 
 Constructs a standalone VertexRDD (one that is not set up for efficient joins with an EdgeRDD) from an RDD of vertex-attribute pairs. Duplicate entries are removed arbitrarily.
 
 Parameters:
 
 vertices - the collection of vertex-attribute pairs
 
 evidence$14 - (undocumented)
 
 Returns:
 
 (undocumented)
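
 As an illustration of the standalone constructor, the following minimal sketch builds a VertexRDD from pairs that contain a duplicate ID. The object name, the sample data, and the local master setting are assumptions made for the example; it only relies on the documented behavior that one entry per vertex is kept.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object, not part of GraphX.
object StandaloneVertexRDDSketch {
  // Returns the number of vertices after constructing a standalone VertexRDD.
  def vertexCount(sc: SparkContext): Long = {
    val pairs: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "a"), (2L, "b"), (1L, "c"))) // ID 1 appears twice
    val vertices: VertexRDD[String] = VertexRDD(pairs)
    // One entry per ID survives; which attribute is kept for ID 1 is arbitrary.
    vertices.count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("standalone-vertexrdd")
    val sc = new SparkContext(conf)
    try println(vertexCount(sc)) finally sc.stop()
  }
}
```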
- apply
 
 public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.reflect.ClassTag<VD> evidence$15)
 
 Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are removed arbitrarily. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.
 
 Parameters:
 
 vertices - the collection of vertex-attribute pairs
 
 edges - the EdgeRDD that these vertices may be joined with
 
 defaultVal - the vertex attribute to use when creating missing vertices
 
 evidence$15 - (undocumented)
 
 Returns:
 
 (undocumented)
- apply
 
 public static <VD> VertexRDD<VD> apply(RDD<scala.Tuple2<Object,VD>> vertices, EdgeRDD<?> edges, VD defaultVal, scala.Function2<VD,VD,VD> mergeFunc, scala.reflect.ClassTag<VD> evidence$16)
 
 Constructs a VertexRDD from an RDD of vertex-attribute pairs. Duplicate vertex entries are merged using mergeFunc. The resulting VertexRDD will be joinable with edges, and any missing vertices referred to by edges will be created with the attribute defaultVal.
 
 Parameters:
 
 vertices - the collection of vertex-attribute pairs
 
 edges - the EdgeRDD that these vertices may be joined with
 
 defaultVal - the vertex attribute to use when creating missing vertices
 
 mergeFunc - the commutative, associative duplicate vertex attribute merge function
 
 evidence$16 - (undocumented)
 
 Returns:
 
 (undocumented)
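
 The merge and default-value behavior described above can be sketched as follows. The object name and sample data are assumptions; EdgeRDD.fromEdges is used here only as one way to obtain an EdgeRDD for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object MergedVertexRDDSketch {
  def attributes(sc: SparkContext): Map[VertexId, Int] = {
    // ID 1 appears twice, so its attributes are merged with mergeFunc.
    val pairs = sc.parallelize(Seq((1L, 10), (1L, 5), (2L, 7)))
    // The edges refer to vertex 3, which has no entry in pairs.
    val edges = EdgeRDD.fromEdges[Int, Int](
      sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0))))
    val vertices = VertexRDD(pairs, edges, 0, (a: Int, b: Int) => a + b)
    vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("merged-vertexrdd")
    val sc = new SparkContext(conf)
    try println(attributes(sc)) finally sc.stop()
  }
}
```

 In this sketch vertex 1 ends up with the merged attribute 15, vertex 2 keeps 7, and vertex 3 is created with the default value 0.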
- fromEdges
 
 public static <VD> VertexRDD<VD> fromEdges(EdgeRDD<?> edges, int numPartitions, VD defaultVal, scala.reflect.ClassTag<VD> evidence$17)
 
 Constructs a VertexRDD containing all vertices referred to in edges. The vertices will be created with the attribute defaultVal. The resulting VertexRDD will be joinable with edges.
 
 Parameters:
 
 edges - the EdgeRDD referring to the vertices to create
 
 numPartitions - the desired number of partitions for the resulting VertexRDD
 
 defaultVal - the vertex attribute to use when creating missing vertices
 
 evidence$17 - (undocumented)
 
 Returns:
 
 (undocumented)
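
 A minimal sketch of fromEdges, assuming a small hand-built edge set; the object name, the edge data, and the default attribute "unknown" are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object FromEdgesSketch {
  def attributes(sc: SparkContext): Map[VertexId, String] = {
    val edges = EdgeRDD.fromEdges[Int, String](
      sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(2L, 3L, 0))))
    // Every vertex referred to by an edge is created with the default attribute.
    val vertices = VertexRDD.fromEdges(edges, numPartitions = 2, defaultVal = "unknown")
    vertices.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("from-edges")
    val sc = new SparkContext(conf)
    try println(attributes(sc)) finally sc.stop()
  }
}
```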
- compute
 
 public scala.collection.Iterator<scala.Tuple2<Object,VD>> compute(Partition part, TaskContext context)
 
 Provides the RDD[(VertexId, VD)] equivalent output.
 
 Specified by:
 
 compute in class RDD<scala.Tuple2<Object,VD>>
 
 Parameters:
 
 part - (undocumented)
 
 context - (undocumented)
 
 Returns:
 
 (undocumented)
- reindex
 
 public abstract VertexRDD<VD> reindex()
 
 Constructs a new VertexRDD that is indexed by only the visible vertices. The resulting VertexRDD will be based on a different index and can no longer be quickly joined with this RDD.
 
 Returns:
 
 (undocumented)
- filter
 
 public VertexRDD<VD> filter(scala.Function1<scala.Tuple2<Object,VD>,Object> pred)
 
 Restricts the vertex set to the set of vertices satisfying the given predicate. This operation preserves the index for efficient joins with the original RDD, and it sets bits in the bitmask rather than allocating new memory.
 It is declared and defined here to allow refining the return type from RDD[(VertexId, VD)] to VertexRDD[VD].
 
 Overrides:
 
 filter in class RDD<scala.Tuple2<Object,VD>>
 
 Parameters:
 
 pred - the user defined predicate, which takes a tuple to conform to the RDD[(VertexId, VD)] interface
 
 Returns:
 
 (undocumented)
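
 The following sketch shows filter returning a VertexRDD rather than a plain RDD. The predicate takes a tuple, as noted above; the object name and data are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object FilterSketch {
  def largeVertices(sc: SparkContext): Map[VertexId, Int] = {
    val vertices = VertexRDD(sc.parallelize(Seq((1L, 5), (2L, 50), (3L, 500))))
    // The result is still a VertexRDD[Int], so index-based joins remain available.
    val large: VertexRDD[Int] = vertices.filter { case (_, attr) => attr >= 50 }
    large.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("filter-sketch")
    val sc = new SparkContext(conf)
    try println(largeVertices(sc)) finally sc.stop()
  }
}
```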
- mapValues
 
 public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function1<VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$2)
 
 Maps each vertex attribute, preserving the index.
 
 Parameters:
 
 f - the function applied to each value in the RDD
 
 evidence$2 - (undocumented)
 
 Returns:
 
 a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD
- mapValues
 
 public abstract <VD2> VertexRDD<VD2> mapValues(scala.Function2<Object,VD,VD2> f, scala.reflect.ClassTag<VD2> evidence$3)
 
 Maps each vertex attribute, additionally supplying the vertex ID.
 
 Parameters:
 
 f - the function applied to each ID-value pair in the RDD
 
 evidence$3 - (undocumented)
 
 Returns:
 
 a new VertexRDD with values obtained by applying f to each of the entries in the original VertexRDD. The resulting VertexRDD retains the same index.
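
 Both mapValues overloads can be sketched side by side. The lambda parameter types are written out explicitly here so that overload selection is unambiguous; the object name and data are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object MapValuesSketch {
  def mapped(sc: SparkContext): (Map[VertexId, Int], Map[VertexId, String]) = {
    val vertices = VertexRDD(sc.parallelize(Seq((1L, 5), (2L, 50))))
    // Overload 1: the function sees only the attribute.
    val doubled: VertexRDD[Int] = vertices.mapValues((attr: Int) => attr * 2)
    // Overload 2: the function also receives the vertex ID.
    val labelled: VertexRDD[String] =
      vertices.mapValues((id: VertexId, attr: Int) => s"$id:$attr")
    (doubled.collect().toMap, labelled.collect().toMap)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("map-values")
    val sc = new SparkContext(conf)
    try println(mapped(sc)) finally sc.stop()
  }
}
```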
- minus
 
 public abstract VertexRDD<VD> minus(RDD<scala.Tuple2<Object,VD>> other)
 
 Acts as a set difference: returns only the vertices whose VertexId is present in this but not in other.
 
 Parameters:
 
 other - an RDD to run the set operation against
 
 Returns:
 
 (undocumented)
- minus
 
 public abstract VertexRDD<VD> minus(VertexRDD<VD> other)
 
 Acts as a set difference: returns only the vertices whose VertexId is present in this but not in other.
 
 Parameters:
 
 other - a VertexRDD to run the set operation against
 
 Returns:
 
 (undocumented)
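
 A small sketch of minus as a set difference on vertex IDs. Here the second set is derived from the first with filter, so both share an index; the object name and data are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object MinusSketch {
  def remaining(sc: SparkContext): Map[VertexId, String] = {
    val all = VertexRDD(sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))))
    val evens = all.filter { case (id, _) => id % 2 == 0 }
    // Keeps the vertices of `all` whose IDs do not occur in `evens`.
    all.minus(evens).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("minus-sketch")
    val sc = new SparkContext(conf)
    try println(remaining(sc)) finally sc.stop()
  }
}
```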
- diff
 
 public abstract VertexRDD<VD> diff(RDD<scala.Tuple2<Object,VD>> other)
 
 For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other. This is only guaranteed to work if the VertexRDDs share a common ancestor.
 
 Parameters:
 
 other - the other RDD[(VertexId, VD)] to diff against.
 
 Returns:
 
 (undocumented)
- diff
 
 public abstract VertexRDD<VD> diff(VertexRDD<VD> other)
 
 For each vertex present in both this and other, diff returns only those vertices with differing values; for values that are different, keeps the values from other. This is only guaranteed to work if the VertexRDDs share a common ancestor.
 
 Parameters:
 
 other - the other VertexRDD to diff against.
 
 Returns:
 
 (undocumented)
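
 The diff semantics can be sketched with two VertexRDDs that share a common ancestor, as required above. The object name and data are assumptions; the second set is derived from the first by changing a single attribute.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object DiffSketch {
  def changed(sc: SparkContext): Map[VertexId, Int] = {
    val before = VertexRDD(sc.parallelize(Seq((1L, 1), (2L, 2), (3L, 3))))
    // Derived from `before`, so both share an index; only vertex 2 changes.
    val after = before.mapValues((id: VertexId, v: Int) => if (id == 2L) 20 else v)
    // Only vertices with differing values are returned, with the value from `after`.
    before.diff(after).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("diff-sketch")
    val sc = new SparkContext(conf)
    try println(changed(sc)) finally sc.stop()
  }
}
```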
- leftZipJoin
 
 public abstract <VD2, VD3> VertexRDD<VD3> leftZipJoin(VertexRDD<VD2> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$4, scala.reflect.ClassTag<VD3> evidence$5)
 
 Left joins this VertexRDD with another VertexRDD with the same index. This function will fail if the two VertexRDDs do not share the same index. The resulting vertex set contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None.
 
 Parameters:
 
 other - the other VertexRDD with which to join.
 
 f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
 
 evidence$4 - (undocumented)
 
 evidence$5 - (undocumented)
 
 Returns:
 
 a VertexRDD containing the results of f
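
 A minimal sketch of leftZipJoin, assuming the other VertexRDD is derived from this one (via filter and mapValues) so that the two share an index. The object name and data are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object LeftZipJoinSketch {
  def joined(sc: SparkContext): Map[VertexId, Int] = {
    val base = VertexRDD(sc.parallelize(Seq((1L, 1), (2L, 2), (3L, 3))))
    // Same index as `base`, but vertex 3 is masked out.
    val other = base.filter { case (id, _) => id != 3L }.mapValues((v: Int) => v * 10)
    // f receives None for vertex 3, which is missing from `other`.
    val result = base.leftZipJoin(other) { (id, x, yOpt) => x + yOpt.getOrElse(0) }
    result.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("left-zip-join")
    val sc = new SparkContext(conf)
    try println(joined(sc)) finally sc.stop()
  }
}
```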
- leftJoin
 
 public abstract <VD2, VD3> VertexRDD<VD3> leftJoin(RDD<scala.Tuple2<Object,VD2>> other, scala.Function3<Object,VD,scala.Option<VD2>,VD3> f, scala.reflect.ClassTag<VD2> evidence$6, scala.reflect.ClassTag<VD3> evidence$7)
 
 Left joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index, the efficient leftZipJoin implementation is used. The resulting VertexRDD contains an entry for each vertex in this. If other is missing any vertex in this VertexRDD, f is passed None. If there are duplicates, the vertex is picked arbitrarily.
 
 Parameters:
 
 other - the other RDD with which to join
 
 f - the function mapping a vertex id and its attributes in this and the other vertex set to a new vertex attribute.
 
 evidence$6 - (undocumented)
 
 evidence$7 - (undocumented)
 
 Returns:
 
 a VertexRDD containing all the vertices in this VertexRDD with the attributes emitted by f.
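
 For comparison, a sketch of leftJoin against a plain RDD that is not backed by a VertexRDD. The names and sample data are assumptions made for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object, not part of GraphX.
object LeftJoinSketch {
  def joined(sc: SparkContext): Map[VertexId, String] = {
    val people = VertexRDD(sc.parallelize(Seq((1L, "alice"), (2L, "bob"))))
    val ages: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 30)))
    // Every vertex of `people` is kept; ageOpt is None where `ages` has no entry.
    val result: VertexRDD[String] = people.leftJoin(ages) { (id, name, ageOpt) =>
      ageOpt.map(age => s"$name:$age").getOrElse(name)
    }
    result.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("left-join")
    val sc = new SparkContext(conf)
    try println(joined(sc)) finally sc.stop()
  }
}
```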
- innerZipJoin
 
 public abstract <U, VD2> VertexRDD<VD2> innerZipJoin(VertexRDD other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag evidence$8, scala.reflect.ClassTag<VD2> evidence$9)
 
 Efficiently inner joins this VertexRDD with another VertexRDD sharing the same index. See innerJoin for the behavior of the join.
 
 Parameters:
 
 other - (undocumented)
 
 f - (undocumented)
 
 evidence$8 - (undocumented)
 
 evidence$9 - (undocumented)
 
 Returns:
 
 (undocumented)
- innerJoin
 
 public abstract <U, VD2> VertexRDD<VD2> innerJoin(RDD<scala.Tuple2<Object,U>> other, scala.Function3<Object,VD,U,VD2> f, scala.reflect.ClassTag evidence$10, scala.reflect.ClassTag<VD2> evidence$11)
 
 Inner joins this VertexRDD with an RDD containing vertex attribute pairs. If the other RDD is backed by a VertexRDD with the same index, the efficient innerZipJoin implementation is used.
 
 Parameters:
 
 other - an RDD containing vertices to join. If there are multiple entries for the same vertex, one is picked arbitrarily. Use aggregateUsingIndex to merge multiple entries.
 
 f - the join function applied to corresponding values of this and other
 
 evidence$10 - (undocumented)
 
 evidence$11 - (undocumented)
 
 Returns:
 
 a VertexRDD co-indexed with this, containing only vertices that appear in both this and other, with values supplied by f
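
 A short sketch of innerJoin with a plain RDD; only IDs present on both sides survive. The object name and data are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}

// Hypothetical example object, not part of GraphX.
object InnerJoinSketch {
  def joined(sc: SparkContext): Map[VertexId, Int] = {
    val vertices = VertexRDD(sc.parallelize(Seq((1L, 1), (2L, 2), (3L, 3))))
    // ID 4 is not a vertex of `vertices`; ID 2 has no weight.
    val weights = sc.parallelize(Seq((1L, 10), (3L, 30), (4L, 40)))
    val result = vertices.innerJoin(weights) { (id, attr, w) => attr * w }
    result.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("inner-join")
    val sc = new SparkContext(conf)
    try println(joined(sc)) finally sc.stop()
  }
}
```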
- aggregateUsingIndex
 
 public abstract <VD2> VertexRDD<VD2> aggregateUsingIndex(RDD<scala.Tuple2<Object,VD2>> messages, scala.Function2<VD2,VD2,VD2> reduceFunc, scala.reflect.ClassTag<VD2> evidence$12)
 
 Aggregates vertices in messages that have the same ids using reduceFunc, returning a VertexRDD co-indexed with this.
 
 Parameters:
 
 messages - an RDD containing messages to aggregate, where each message is a pair of its target vertex ID and the message data
 
 reduceFunc - the associative aggregation function for merging messages to the same vertex
 
 evidence$12 - (undocumented)
 
 Returns:
 
 a VertexRDD co-indexed with this, containing only vertices that received messages. For those vertices, their values are the result of applying reduceFunc to all received messages.
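
 The aggregation described above can be sketched as summing integer messages per target vertex. The object name and data are assumptions; the type argument is given explicitly so that the reduce function's parameter types are known.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{VertexId, VertexRDD}
import org.apache.spark.rdd.RDD

// Hypothetical example object, not part of GraphX.
object AggregateUsingIndexSketch {
  def sums(sc: SparkContext): Map[VertexId, Int] = {
    val vertices = VertexRDD(sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c"))))
    // Vertex 1 receives two messages, vertex 3 one, vertex 2 none.
    val messages: RDD[(VertexId, Int)] = sc.parallelize(Seq((1L, 1), (1L, 2), (3L, 5)))
    val result: VertexRDD[Int] = vertices.aggregateUsingIndex[Int](messages, _ + _)
    result.collect().toMap
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("aggregate-using-index")
    val sc = new SparkContext(conf)
    try println(sums(sc)) finally sc.stop()
  }
}
```

 Vertex 2 receives no message in this sketch, so it does not appear in the result.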
- reverseRoutingTables
 
 public abstract VertexRDD<VD> reverseRoutingTables()
 
 Returns a new VertexRDD reflecting a reversal of all edge directions in the corresponding EdgeRDD.
 
 Returns:
 
 (undocumented)
- withEdges
 
 public abstract VertexRDD<VD> withEdges(EdgeRDD<?> edges)
 
 Prepares this VertexRDD for efficient joins with the given EdgeRDD.
