rddfoldByKey0sum
rddfoldByKey0sum is a shorthand description for the common Spark RDD pattern that sums values per key by folding with a zero value of 0. It uses the foldByKey operation on a pair RDD of (K, V) to produce a new RDD of (K, V): foldByKey takes a zero value of type V and a function (V, V) => V, so the result values have the same type as the input values.
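In practice, this pattern applies when the goal is per-key summation across a distributed dataset. The zeroValue is 0, the identity for addition, so it does not change the result however many times it is applied (it may be applied more than once, for example once per key in each partition). Typical forms: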
- Scala: rdd.foldByKey(0)((acc, v) => acc + v)
- Scala with Long values: rdd.foldByKey(0L)((acc, v) => acc + v)
- Python: rdd.foldByKey(0, lambda acc, v: acc + v)
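The role of the zero value can be illustrated without a cluster. The following is a minimal plain-Python sketch, not Spark's implementation: `fold_by_key_local` is a hypothetical helper that assumes each partition folds its values per key starting from the zero value, and that the per-partition partial results are then merged with the same function. The sample data is made up for illustration.

```python
def fold_by_key_local(partitions, zero, func):
    """Simplified local model of a per-key fold over partitions (no Spark)."""
    # Step 1: fold each partition on its own, starting every key from `zero`.
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            acc[key] = func(acc.get(key, zero), value)
        per_partition.append(acc)
    # Step 2: merge the per-partition partial results with the same function.
    merged = {}
    for acc in per_partition:
        for key, partial in acc.items():
            merged[key] = func(merged[key], partial) if key in merged else partial
    return merged


# Hypothetical data: two partitions of (key, value) pairs.
partitions = [[("a", 1), ("b", 2)], [("a", 3), ("b", 4), ("a", 5)]]

# With zero = 0 (the additive identity) the per-key sums are exact.
sums = fold_by_key_local(partitions, 0, lambda acc, v: acc + v)

# With zero = 10 the zero value is added once per key per partition,
# so each key that appears in both partitions is inflated by 20.
skewed = fold_by_key_local(partitions, 10, lambda acc, v: acc + v)

print(sums)    # {'a': 9, 'b': 6}
print(skewed)  # {'a': 29, 'b': 26}
```

Under this model, any zero value other than the identity changes the result by an amount that depends on the number of partitions, which is why 0 (or 0L for Long values) is the right choice for sums.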
The resulting RDD has the type RDD[(K, V)], typically RDD[(K, Int)] or RDD[(K, Long)], depending on the value type of the input RDD.
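Limitations include the need for an associative and commutative function for reliable results; using a non-commutative or non-associative function can produce results that depend on how the data is partitioned and on the order in which partial results are merged.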
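To see why the function matters, the sketch below compares addition with subtraction under two different partitionings of the same data. It is a simplified plain-Python model rather than Spark itself: `fold_by_key_local` is a hypothetical helper that assumes a per-partition fold from the zero value followed by a merge of the partial results in partition order.

```python
def fold_by_key_local(partitions, zero, func):
    """Simplified local model of a per-key fold over partitions (no Spark)."""
    merged = {}
    for part in partitions:
        # Fold this partition per key, starting from `zero`.
        partial = {}
        for key, value in part:
            partial[key] = func(partial.get(key, zero), value)
        # Merge this partition's partial results into the running result.
        for key, acc in partial.items():
            merged[key] = func(merged[key], acc) if key in merged else acc
    return merged


# The same three records, laid out as one partition and as two partitions.
one_partition = [[("a", 1), ("a", 2), ("a", 3)]]
two_partitions = [[("a", 1)], [("a", 2), ("a", 3)]]


def add(acc, v):
    return acc + v


def subtract(acc, v):
    return acc - v  # neither associative nor commutative


# Addition gives the same answer regardless of partitioning.
add_single = fold_by_key_local(one_partition, 0, add)
add_split = fold_by_key_local(two_partitions, 0, add)

# Subtraction gives different answers for the same data.
sub_single = fold_by_key_local(one_partition, 0, subtract)
sub_split = fold_by_key_local(two_partitions, 0, subtract)

print(add_single, add_split)  # {'a': 6} {'a': 6}
print(sub_single, sub_split)  # {'a': -6} {'a': 4}
```

In a real cluster the partition layout and merge order are not under the caller's control, so a function like subtraction would give results that can vary between runs or configurations.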