The technicalities
10 min read
Spark shuffle – Case #3 – using salt in repartition
Why use salt in repartition? In the previous blog entry we saw how skew in a processed dataset affects the performance of Spark…
The technicalities
10 min read
Spark shuffle – Case #2 – repartitioning skewed data
In the previous blog entry we reviewed a Spark scenario where calling the partitionBy method resulted in each task creating as many files as you had days…
The technicalities
10 min read
Spark shuffle – Case #1 – partitionBy and repartition
This is the first in a series of articles explaining how the shuffle operation works in Spark and how to use…