job orchestration


Databricks – Photon

The Databricks platform offers its clients two execution engines: standard Apache Spark (also available as an open-source application) and a Photon-enhanced engine that…

Apache Airflow 6 min read

Airflow — pools and mutexes.

Although the ideal data pipeline consists of idempotent, independent tasks, there are cases where setting up a mutex (a.k.a. part of the…

Apache Airflow 7 min read

Passing information between DAGs in Airflow.

Some data pipelines need to pass values between tasks – not complete datasets, but a few kilobytes of data. This can be managed even within…