EdgeRDD

EdgeRDD is a distributed dataset used for representing the edges of a graph in GraphX, the graph processing component of Apache Spark. It stores edges as records of type Edge[ED], where each record contains a source vertex ID (srcId), a destination vertex ID (dstId), and an attribute of type ED that describes the edge.
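
As an illustration of the record layout described above, the following minimal sketch builds a few Edge[String] records, derives a graph from them, and reads back the backing EdgeRDD. The local master setting, the application name, and the sample edge data are assumptions made for the example; it is meant to be run as a Scala script or pasted into spark-shell with Spark and GraphX on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, EdgeRDD, Graph}
import org.apache.spark.rdd.RDD

// Assumption: a local master, used only for illustration.
val conf = new SparkConf().setAppName("edge-rdd-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Each Edge[ED] carries srcId, dstId, and an attribute attr (here ED = String).
// The sample edges are hypothetical.
val rawEdges: RDD[Edge[String]] = sc.parallelize(Seq(
  Edge(1L, 2L, "follows"),
  Edge(2L, 3L, "follows"),
  Edge(3L, 1L, "blocks")
))

// Graph.fromEdges creates the vertices mentioned by the edges and gives
// each of them the default attribute (0 here).
val graph: Graph[Int, String] = Graph.fromEdges(rawEdges, defaultValue = 0)

// graph.edges is the EdgeRDD[String] that backs the graph.
val edges: EdgeRDD[String] = graph.edges

// Render every edge as text; sorting makes the order deterministic.
val described = edges.collect()
  .map(e => s"${e.srcId} -[${e.attr}]-> ${e.dstId}")
  .sorted
described.foreach(println)
```

Vertex IDs are plain Long values (GraphX aliases them as VertexId), so the literals 1L, 2L, and 3L are sufficient to identify endpoints in this sketch.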

EdgeRDD, together with VertexRDD, forms the core data structures of a GraphX Graph. While VertexRDD holds vertex attributes, EdgeRDD holds the connection information between vertices. The combination allows efficient implementation of graph algorithms and neighborhood queries.

Creation and partitioning: EdgeRDD can be constructed from a collection of edges or derived from an existing Graph. It is partitioned across the cluster to enable parallel processing. Spark’s RDD infrastructure governs fault tolerance and recomputation through lineage.

Operations: EdgeRDD supports standard RDD-style transformations and graph-optimized operations such as mapValues, reverse, and innerJoin with another EdgeRDD; the enclosing Graph adds edge-level operations such as mapEdges and subgraph. A common view is the EdgeTriplet, produced by joining EdgeRDD with VertexRDD, which provides access to both endpoints’ attributes and the edge attribute for algorithmic computations.

Performance and usage: EdgeRDD is designed for scalable graph analytics on large graphs, with caching and selective materialization to improve iterative algorithms. It can be converted to other graph representations or used in conjunction with GraphX APIs for subgraphs and aggregations; motif finding is available through the separate GraphFrames library rather than GraphX itself.

See also: GraphX, VertexRDD, EdgeTriplet, Graph.
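
The partitioning, edge transformations, triplet view, and caching discussed in this article can be sketched roughly as follows. The vertex names, edge weights, and the choice of PartitionStrategy.EdgePartition2D are assumptions made for illustration, not recommendations; the sketch relies on EdgeRDD.mapValues, Graph.triplets, Graph.subgraph, Graph.partitionBy, and Graph.cache, and expects Spark and GraphX on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy}

// Assumption: a local master, used only for illustration.
val conf = new SparkConf().setAppName("edge-rdd-ops-sketch").setMaster("local[*]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical sample data: named vertices and weighted edges.
val vertices = sc.parallelize(Seq((1L, "a"), (2L, "b"), (3L, "c")))
val edgeRecords = sc.parallelize(Seq(
  Edge(1L, 2L, 1.0),
  Edge(2L, 3L, 2.5),
  Edge(1L, 3L, 0.5)
))

// Redistribute the edges with a partition strategy, then cache the graph
// so that repeated passes do not recompute it from lineage.
val graph = Graph(vertices, edgeRecords)
  .partitionBy(PartitionStrategy.EdgePartition2D)
  .cache()

// mapValues rewrites edge attributes while keeping the edge structure.
val doubled = graph.edges.mapValues(e => e.attr * 2)

// triplets exposes each edge together with both endpoint attributes.
val labels = graph.triplets
  .map(t => s"${t.srcAttr}->${t.dstAttr}:${t.attr}")
  .collect()
  .sorted

// subgraph keeps only the edges whose triplet satisfies the predicate.
val heavy = graph.subgraph(epred = t => t.attr >= 1.0)

labels.foreach(println)
```

Mapping each triplet to a string before collecting keeps the example independent of how GraphX manages triplet objects internally.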