SPARK

Unleashing Innovation: A Glimpse into Our Exciting Event Journey

Whether you’ve joined us in the past or are planning to attend our upcoming events, there’s always something exciting on the horizon. Let’s take a…

The technicalities | 10 min read

Spark shuffle – Case #3 – using salt in repartition

Why use salt in repartition? In the previous blog entry we saw how skew in a processed dataset affects the performance of Spark…

The technicalities | 10 min read

Spark shuffle – Case #2 – repartitioning skewed data

In the previous blog entry we reviewed a Spark scenario where calling the partitionBy method resulted in each task creating as many files as you had days…

The technicalities | 10 min read

Spark shuffle – Case #1 – partitionBy and repartition

This is the first in a series of articles explaining how the shuffle operation works in Spark and how to use…