Apache Spark

Apache Spark data pipelines
The technicalities 3 min read

RDD in Apache Spark

Learn how to use the RDD API in Apache Spark to inspect partition details or perform low-level operations. Although Datasets and DataFrames have largely superseded it, the RDD API remains accessible through the .rdd method on both. See how to check the number of partitions with the getNumPartitions method and how to determine partition sizes with the glom method, then explore the other operations the RDD API still offers for low-level work and Spark internals.

Apache Spark data pipelines
The technicalities 5 min read

Datasets and DataFrames

Understanding Spark's .as[T] Method: Best Practices and Defensive Programming
