Case study

Introducing Apache Airflow

Transforming Telecom Data Pipelines for Enhanced Efficiency and Scalability

Challenge

A telecom with mature Big Data and Warehousing divisions wants to improve the orchestration of its data pipelines. The goal is to migrate from in-house projects to a widely adopted and supported toolkit. The projects to be migrated are large and already in production, so the migration has to happen online.

Solutions

TantusData demonstrated and delivered a PoC of a migration to Apache Airflow. This included not only a working demo of the tool's functionality but also working with the sysops and data teams to teach them how to use Airflow on their own. By working together with the client and involving them in every step of the process, TantusData showed how to carry out the migration and helped the teams understand the "what" and "why" questions that may arise along the way.

Client

An international telecommunications company based in Austria.

Opportunity

Using a well-known and supported tool instead of one developed in-house frees developers to work on other areas instead of reinventing the wheel.
More people can be onboarded onto a standard tool: the UI and documentation allow people other than data engineers to understand the pipeline logic.
Hiring people and introducing them to a project that uses a well-known toolkit is easier and faster.

Delivery

The migration began with TantusData working with the infrastructure team to deploy Apache Airflow, following the best practices already in place. TantusData delivered scripts to recreate the deployment and made sure the sysops were familiar with the tool: what is deployed, how it is configured, and which security concerns must be addressed.
With the tool deployed, TantusData began working with the data team to migrate a part of the existing pipeline. The aim was to rewrite one part of the current code and use it as an example for migrating the other parts. The data team also had a chance to try the tool on their own.
The PoC concluded with a showcase of the tool's monitoring options, which ease daily maintenance. TantusData also presented recommended next steps, with tips for the development team and for administration and maintenance.

Effect

The client was able to evaluate Airflow: how it fits their platform and what they would need to do to integrate it.
The migration plan prompted a discussion about reviewing and refactoring the data pipeline. Seeing a visualisation of all tasks gives everyone a broad view and encourages discussion of engineering matters such as the manageability of graphs with over 100 nodes.
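
As a rough illustration of the kind of question a task-graph view raises, the sketch below summarises a small dependency graph using Python's standard graphlib module. It is not Airflow code and not the client's pipeline: the task names and the summarize helper are hypothetical, chosen only to show how task count, longest dependency chain, and fan-in could be measured before deciding whether a large graph should be split.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph (assumption, not the client's pipeline):
# each task maps to the set of tasks it depends on.
dependencies = {
    "extract_calls": set(),
    "extract_billing": set(),
    "clean_calls": {"extract_calls"},
    "clean_billing": {"extract_billing"},
    "join_usage": {"clean_calls", "clean_billing"},
    "load_warehouse": {"join_usage"},
}


def summarize(graph):
    """Return simple size metrics for a task dependency graph."""
    # static_order() yields every task after all of its dependencies
    # and raises graphlib.CycleError if the graph contains a cycle.
    order = list(TopologicalSorter(graph).static_order())

    # Length of the longest dependency chain ending at each task.
    depth = {}
    for task in order:
        deps = graph.get(task, set())
        depth[task] = 1 + max((depth[d] for d in deps), default=0)

    return {
        "tasks": len(order),
        "longest_chain": max(depth.values()),
        "max_fan_in": max(len(deps) for deps in graph.values()),
    }


summary = summarize(dependencies)
print(summary)
```

On a graph with more than 100 nodes, a long chain or a very high fan-in in such a summary could point to sections worth refactoring into smaller pipelines.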