Introducing Apache Airflow
Technology & Tools
An international telecommunications company based in Austria.
Using a well-known, well-supported tool instead of an in-house one frees developers to work on other areas rather than reinventing the wheel.
A standard tool makes onboarding easier: its UI and documentation allow people other than data engineers to understand the pipeline logic.
Hiring and introducing new people to a project using a well-known toolkit is easier and faster.
The migration began with the infrastructure team deploying Apache Airflow, following the best practices already in place. TantusData delivered scripts to recreate the deployment and made sure the sysops were familiar with the tool: how it is deployed and configured, and which security concerns must be addressed.
With the tool deployed, TantusData worked with the data team to migrate a part of the existing pipeline: one piece of the current code was rewritten and used as a template for migrating the remaining parts. The data team also had a chance to try the tool on their own.
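Rewriting part of a pipeline for Airflow means expressing it as a directed acyclic graph of tasks. As a minimal sketch (this is not Airflow's API, and the task names below are hypothetical), the snippet shows the dependency-ordering problem an in-house tool has to solve by hand, and which Airflow provides out of the box:

```python
# A hand-rolled illustration of what a workflow engine does under the hood:
# tasks form a directed acyclic graph, and each task runs only after its
# upstream dependencies have finished. Task names here are hypothetical.
from graphlib import TopologicalSorter

# Upstream dependencies per task (hypothetical pipeline steps).
pipeline = {
    "extract_cdrs": set(),  # call detail records
    "clean_cdrs": {"extract_cdrs"},
    "enrich_with_customer_data": {"clean_cdrs"},
    "aggregate_daily_usage": {"enrich_with_customer_data"},
    "publish_report": {"aggregate_daily_usage"},
}

def run_pipeline(graph):
    """Execute tasks in dependency order; raises CycleError if not a DAG."""
    order = list(TopologicalSorter(graph).static_order())
    for task in order:
        print(f"running {task}")
    return order

run_pipeline(pipeline)
```

In Airflow the same structure is declared as a DAG of tasks, and the dependency-aware scheduling comes together with retries, monitoring, and the graph visualisation in the UI.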
The PoC concluded with a showcase of monitoring options that ease daily maintenance. TantusData also presented recommended next steps, with tips both for the development team and for administration and maintenance.
The client was able to evaluate Airflow, see how it fits their platform, and understand what integrating it would require.
The migration plan prompted a discussion about reviewing and refactoring the data pipeline. Seeing a visualisation of all tasks gives everyone a broad view and encourages discussion of engineering matters such as the manageability of graphs with over 100 nodes.