all articles
Apache Airflow
5 min read
Databricks – Photon
The Databricks platform offers its clients two execution engines: standard Apache Spark (also available as an open-source application) and one with the Photon enhancement that…
Apache Airflow
6 min read
Airflow — pools and mutexes.
Although the ideal data pipeline is made of idempotent and independent tasks, there are some cases when setting up a mutex (a.k.a. part of the…
Apache Airflow
7 min read
Passing information between DAGs in Airflow.
Some data pipelines need to pass values between tasks – not complete datasets, but payloads on the order of kilobytes. This can be managed even within…
Apache Airflow
15 min read
Managing inter-DAG dependencies in Airflow
In the real world, data pipelines only sometimes come as a completely independent sequence of operations. Usually, they share dependencies on one another, occasionally easy…