Monitoring Airflow jobs with TIG 1: system metrics
Like many server applications, Airflow can – and should – be monitored for metrics and logs. In this article, we will look into the former…
Databricks – Photon
The Databricks platform offers clients two execution engines: standard Apache Spark (available as an open-source application) and one with the Photon enhancement that…
Airflow – pools and mutexes
Although the ideal data pipeline is made of idempotent and independent tasks, there are some cases when setting up a mutex (a.k.a. part of the…
Passing information between DAGs in Airflow
Some data pipelines must pass values between tasks – not complete datasets, but data on the order of kilobytes. This can be managed even within…
Managing inter-DAG dependencies in Airflow
In the real world, data pipelines are only sometimes a completely independent sequence of operations. Usually, they depend on one another, occasionally easy…