The technicalities

Spark shuffle – Case #2 – repartitioning skewed data

In the previous blog entry, we reviewed a Spark scenario where calling the partitionBy method resulted in each task creating as many files as there were days…
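To make the file-count arithmetic concrete, here is a minimal plain-Python sketch (no Spark required). It only models the behaviour recalled above, namely that a task writes one file for every distinct day it holds. The sizes, the `count_output_files` helper, and the two data layouts are hypothetical and chosen purely for illustration.

```python
def count_output_files(tasks):
    """Count files when every task writes one file per distinct day it holds."""
    return sum(len(set(days)) for days in tasks)


# Hypothetical sizes, for illustration only.
num_tasks, num_days = 4, 3

# Layout 1: rows for every day are spread across every task.
mixed = [list(range(num_days)) * 2 for _ in range(num_tasks)]

# Layout 2: same number of rows, but each day lives in exactly one task.
by_day = [[day] * (2 * num_tasks) for day in range(num_days)]

mixed_files = count_output_files(mixed)    # num_tasks * num_days files
by_day_files = count_output_files(by_day)  # num_days files

print(mixed_files, by_day_files)
```

With these made-up numbers, the first layout yields one file per (task, day) pair, while the second yields one file per day. The second layout also concentrates all rows of a day in a single task, which is where skewed days become a concern.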

