shuffle

Spark shuffle – Case #2 – repartitioning skewed data

In the previous blog entry, we reviewed a Spark scenario where calling the partitionBy method resulted in each task creating as many files as you had days…

The technicalities 10 min read

Spark shuffle – Case #1 – partitionBy and repartition

This is the first in a series of articles explaining how the shuffle operation works in Spark and how to use…