repartition

Spark shuffle – Case #3 – using salt in repartition

Why use salt in repartition? In the previous blog entry we saw how skew in a processed dataset affects the performance of Spark…

The technicalities | 10 min read

Spark shuffle – Case #2 – repartitioning skewed data

In the previous blog entry we reviewed a Spark scenario where calling the partitionBy method resulted in each task creating as many files as you had days…

The technicalities | 10 min read

Spark shuffle – Case #1 – partitionBy and repartition

This is the first in a series of articles explaining how the shuffle operation works in Spark and how to use…