<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Apache Airflow Archives - TantusData</title>
	<atom:link href="https://tantusdata.com/insights-categories/apache-airflow/feed/" rel="self" type="application/rss+xml" />
	<link>https://tantusdata.com/insights-categories/apache-airflow/</link>
	<description>That uncovers wisdom.</description>
	<lastBuildDate>Thu, 30 Jan 2025 13:43:39 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.7.1</generator>

<image>
	<url>https://tantusdata.com/app/uploads/2023/01/cropped-Favicon-32x32.png</url>
	<title>Apache Airflow Archives - TantusData</title>
	<link>https://tantusdata.com/insights-categories/apache-airflow/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Monitoring Airflow jobs with TIG 1: system metrics</title>
		<link>https://tantusdata.com/insights/monitoring-airflow-jobs-with-tig-1-system-metrics/</link>
		
		<dc:creator><![CDATA[Amadeusz Kosik]]></dc:creator>
		<pubDate>Tue, 16 Apr 2024 10:25:34 +0000</pubDate>
				<category><![CDATA[Apache Airflow]]></category>
		<category><![CDATA[data pipelines]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[sysops]]></category>
		<guid isPermaLink="false">https://tantusdata.com/?post_type=insights&#038;p=1907</guid>

					<description><![CDATA[<p>Like many server applications, Airflow can – and should – be monitored for metrics and logs. In this article, we will look into the former and the integration with the TIG stack. The goal is to pull example metrics into a time series database and visualize it in a web application. This article will focus [&#8230;]</p>
<p>The post <a href="https://tantusdata.com/insights/monitoring-airflow-jobs-with-tig-1-system-metrics/">Monitoring Airflow jobs with TIG 1: system metrics</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img fetchpriority="high" decoding="async" width="1024" height="585" src="https://tantusdata.com/app/uploads/2024/04/Airflow-jobs-with-the-TIG-stack_1-1024x585.jpg" alt="" class="wp-image-2176" srcset="https://tantusdata.com/app/uploads/2024/04/Airflow-jobs-with-the-TIG-stack_1-1024x585.jpg 1024w, https://tantusdata.com/app/uploads/2024/04/Airflow-jobs-with-the-TIG-stack_1-300x171.jpg 300w, https://tantusdata.com/app/uploads/2024/04/Airflow-jobs-with-the-TIG-stack_1-768x439.jpg 768w, https://tantusdata.com/app/uploads/2024/04/Airflow-jobs-with-the-TIG-stack_1-1536x878.jpg 1536w, https://tantusdata.com/app/uploads/2024/04/Airflow-jobs-with-the-TIG-stack_1.jpg 1792w" sizes="(max-width: 1024px) 100vw, 1024px" /></figure>



<p>Like many server applications, Airflow can – and should – be monitored through both metrics and logs. In this article, we look at the former and at integrating Airflow with the TIG stack (Telegraf, InfluxDB, Grafana). The goal is to pull example metrics into a time series database and visualize them in a web application. This article focuses on Airflow (AF) system metrics, such as performance, load and health. We will cover reporting <em>data</em> metrics into the same database in an upcoming article, so stay tuned.</p>



<h2 class="wp-block-heading">Why?</h2>



<p>Airflow itself finally has a built-in <em>Cluster status</em> dashboard, so it is fair to question the need for a separate dashboard that requires additional effort and maintenance. Why bother, then? There are several reasons to introduce a separate monitoring stack:</p>



<ul class="wp-block-list">
<li>Aggregated view: looking into one app is easier than browsing through 10 of them. TIG (or any other monitoring stack) can monitor multiple instances and applications, and it integrates easily with custom ones.</li>



<li>Security: there is no need to grant access to AF itself (or any other application) just to see the metrics and pinpoint errors. This aligns with data mesh and other data democratization approaches.</li>



<li>Support friendly: your first-line support has a starting point for checking the status of data processing and can call the right people when there is a problem.</li>
</ul>



<h2 class="wp-block-heading">The toolkit</h2>



<p>Out of the box (but with the right pip packages) Airflow supports sending its internal metrics to a statsd server. We can leverage that and set up such a server via <em>Telegraf</em> to proxy the metrics further into a time series database: InfluxDB. As a convenient UI, Grafana can be used.</p>



<figure class="wp-block-image size-full"><img decoding="async" width="1001" height="446" src="https://tantusdata.com/app/uploads/2024/02/example-architecture.png" alt="" class="wp-image-1910" srcset="https://tantusdata.com/app/uploads/2024/02/example-architecture.png 1001w, https://tantusdata.com/app/uploads/2024/02/example-architecture-300x134.png 300w, https://tantusdata.com/app/uploads/2024/02/example-architecture-768x342.png 768w" sizes="(max-width: 1001px) 100vw, 1001px" /></figure>



<h2 class="wp-block-heading">TIG stack configuration</h2>



<p>First, let’s configure the TIG stack to accept the metrics:</p>



<ol class="wp-block-list">
<li>Install an InfluxDB instance. Nothing fancy here.</li>



<li>Install Telegraf. In the configuration, look for the <em>statsd</em> input (which has to be enabled) and the <em>InfluxDB</em> output. The latter requires at least authentication settings. Note that Telegraf supports many more inputs and can monitor, for example, system resources (CPU, memory, disk space, etc.). Configuring these is a good idea in a production environment.</li>



<li>Install Grafana and configure a <em>data source</em> that points to InfluxDB.</li>
</ol>
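
<p>Once the <em>statsd</em> input is enabled, it can help to push a test value by hand before involving Airflow. The sketch below is a minimal illustration of the plain-text StatsD line format (<em>name:value|type</em>) sent over UDP. It assumes Telegraf listens on <em>localhost</em> port 8125 (the usual StatsD default); the function names are hypothetical helpers, not part of Telegraf or Airflow.</p>

```python
import socket


def format_statsd(name: str, value: float, metric_type: str) -> str:
    """Build one StatsD line: name:value|type (g = gauge, c = counter, ms = timer)."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "localhost", port: int = 8125) -> None:
    """Send a single StatsD line over UDP (fire-and-forget, no reply is expected).

    The host and port defaults are assumptions; adjust them to your Telegraf setup.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

<p>Calling <em>send_metric(format_statsd("demo.smoke_test", 1, "c"))</em> (a made-up metric name) should make a matching measurement appear in InfluxDB after Telegraf's next flush, assuming the output is configured correctly.</p>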






<p>We have prepared a Docker Compose-based demo with a pre-configured environment in the GitHub repository. There you can find example configurations for Telegraf (<em>telegraf.conf</em>), InfluxDB (<em>influxdb.env</em>) and Grafana (<em>grafana-provisioning/datasources</em>). Please note that this is only for dev/demo purposes; real production environments need to be set up more securely (including the use of HTTPS and secure password handling).</p>



<h2 class="wp-block-heading">Airflow</h2>



<p>On the Airflow side, there are two significant points to be addressed.</p>



<ol class="wp-block-list">
<li>Airflow requires the <em>apache-airflow[statsd]</em> extra so that the <em>statsd</em> client is available; you can install it via pip.</li>



<li>In the Airflow configuration file (usually <em>airflow.cfg</em>), the <em>[metrics]</em> section must be configured to enable <em>statsd</em>, point it to the Telegraf instance and prefix the metrics (very useful when you run multiple Airflow instances).</li>
</ol>
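
<p>The same <em>[metrics]</em> options can also be supplied as environment variables, following Airflow's <em>AIRFLOW__SECTION__KEY</em> naming convention, which is often handy in containerized setups. The sketch below builds such a mapping; the helper name and the example host and prefix values are assumptions made for illustration only.</p>

```python
def metrics_env(host: str, port: int, prefix: str) -> dict[str, str]:
    """Map the [metrics] statsd options to AIRFLOW__METRICS__* environment variables."""
    options = {
        "statsd_on": "True",      # enable the statsd client
        "statsd_host": host,      # where Telegraf's statsd input listens
        "statsd_port": str(port),
        "statsd_prefix": prefix,  # distinguishes multiple Airflow instances
    }
    return {f"AIRFLOW__METRICS__{key.upper()}": value for key, value in options.items()}


# Hypothetical values: a Telegraf container named "telegraf" and a per-instance prefix.
env = metrics_env("telegraf", 8125, "airflow_prod")
```

<p>The resulting dictionary can be passed to the Airflow containers' environment; giving each instance its own prefix keeps their metrics apart in InfluxDB.</p>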



<p>Airflow can send a range of metrics; the complete list is available in the <a href="https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/metrics.html">Airflow documentation</a>. The same page describes how to limit which ones are actually reported to Telegraf. Be aware that even though most of the metrics are said to be reported in seconds, you need to validate the units yourself (<a href="https://github.com/apache/airflow/issues/20804">as in this example</a>).</p>



<h2 class="wp-block-heading">Dashboards</h2>



<p>The last step is to configure Grafana and set up some dashboards. The UI offers a WYSIWYG editor that you can use to tailor them to <em>your</em> needs. The example available on GitHub can serve as a starting point, as it shows:</p>



<ul class="wp-block-list">
<li>state of the Airflow executors (queues and tasks being run at the moment) to see whether any processing is going on,</li>



<li>state of the pools (default and the custom ones) to check potential bottlenecks if you use multiple pools,</li>



<li>task times to identify unexpected stragglers (and compare instances’ run times and find any performance challenges early on).</li>
</ul>



<figure class="wp-block-image size-large"><img decoding="async" width="1024" height="532" src="https://tantusdata.com/app/uploads/2024/02/grafana-dashboard-1024x532.png" alt="" class="wp-image-1912" srcset="https://tantusdata.com/app/uploads/2024/02/grafana-dashboard-1024x532.png 1024w, https://tantusdata.com/app/uploads/2024/02/grafana-dashboard-300x156.png 300w, https://tantusdata.com/app/uploads/2024/02/grafana-dashboard-768x399.png 768w, https://tantusdata.com/app/uploads/2024/02/grafana-dashboard-1536x798.png 1536w, https://tantusdata.com/app/uploads/2024/02/grafana-dashboard.png 1999w" sizes="(max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">An example dashboard, available on our GitHub. It contains metrics useful to monitor the load on the Airflow instance and identify potential overload or bottlenecks in the data pipeline processing.</figcaption></figure>



<h2 class="wp-block-heading">Summary</h2>



<p>In conclusion, for Airflow monitoring, you can use a specialized metrics toolset like the TIG stack and aggregate the metrics from multiple AF instances. The stack can accommodate AF system metrics as well as data from other applications, including your custom ones. We will look at an example of sending data quality metrics in the second part. There is a demo environment on our GitHub that you can use to check whether this works for you.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Databricks &#8211; Photon</title>
		<link>https://tantusdata.com/insights/databricks-photon/</link>
		
		<dc:creator><![CDATA[Amadeusz Kosik]]></dc:creator>
		<pubDate>Tue, 02 Apr 2024 12:52:56 +0000</pubDate>
				<category><![CDATA[Apache Airflow]]></category>
		<category><![CDATA[DAG dependencies]]></category>
		<category><![CDATA[data pipelines]]></category>
		<category><![CDATA[job orchestration]]></category>
		<guid isPermaLink="false">https://tantusdata.com/?post_type=insights&#038;p=1903</guid>

					<description><![CDATA[<p>The Databricks platform offers two execution engines for the clients: the standard Apache Spark (available as an open-source application) and one with Photon enhancement that brings a performance improvement (as well as extra pricing). Have you ever wondered where this speedup comes from and how it affects designing Apache Spark jobs? This article is based [&#8230;]</p>
<p>The post <a href="https://tantusdata.com/insights/databricks-photon/">Databricks &#8211; Photon</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="585" src="https://tantusdata.com/app/uploads/2024/04/improving_performance_1-1024x585.jpg" alt="" class="wp-image-2178" srcset="https://tantusdata.com/app/uploads/2024/04/improving_performance_1-1024x585.jpg 1024w, https://tantusdata.com/app/uploads/2024/04/improving_performance_1-300x171.jpg 300w, https://tantusdata.com/app/uploads/2024/04/improving_performance_1-768x439.jpg 768w, https://tantusdata.com/app/uploads/2024/04/improving_performance_1-1536x878.jpg 1536w, https://tantusdata.com/app/uploads/2024/04/improving_performance_1.jpg 1792w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The Databricks platform offers two execution engines for the clients: the standard Apache Spark (available as an open-source application) and one with Photon enhancement that brings a performance improvement (as well as extra pricing). Have you ever wondered where this speedup comes from and how it affects designing Apache Spark jobs?</p>



<p>This article is based on the Berkeley-hosted paper on Photon written by Databricks people. That publication is the closest thing to the source, because the Photon engine is not an open-source project and its source code is not available to the public.</p>



<h2 class="wp-block-heading">The general idea</h2>



<p>In a nutshell, Photon replaces the standard (bundled) query engine available in Apache Spark while keeping the same API. For some operations, mostly CPU-heavy ones, the Catalyst optimizer may decide to send the job to Photon instead of using the default execution path, purely for performance reasons. In other words, it is an alternative way to execute Spark DAG tasks, not a mechanism for skipping, reordering or reorganizing them.</p>



<h2 class="wp-block-heading">SIMD</h2>



<p>SIMD stands for <em>Single Instruction, Multiple Data</em> and is one of the foundational ideas behind job optimisations in Photon. On current architectures, running the same operation on multiple data items (values, rows, etc.) can be optimised via <em>vectorisation</em>, even within a single thread. Photon is said to utilise those optimisations.</p>



<h2 class="wp-block-heading">C++ and Code generation</h2>



<p>The Photon engine is implemented in C++ instead of a JVM language (Scala, Java or others). The source paper points to performance reasons, including ‘hitting performance ceilings’. Communication with the rest of Apache Spark is implemented via JNI. Databricks’ internal benchmarks indicate that the performance hit caused by moving data in and out of the JVM is not noticeable.</p>



<h2 class="wp-block-heading">Internal data format</h2>



<p>The Photon engine uses a columnar data representation (the same idea as, e.g., the Parquet data format) instead of row data (like the rest of Apache Spark). This is driven by the SIMD optimizations: the kernel implementations work best on columnar data. Memory management (allocating, freeing, etc.) is still done via Apache Spark’s memory manager. The data is kept off-heap, so transferring it from Photon to Spark does not require copying.</p>
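
<p>To make the row versus columnar distinction concrete, here is a small illustration in plain Python. It only shows the difference in layout; it is not Photon's implementation, and the <em>(id, amount)</em> records are made up for the example.</p>

```python
from array import array

# Hypothetical (id, amount) records stored row by row.
rows = [(1, 10.0), (2, 12.5), (3, 7.5)]

# Columnar layout: one contiguous, typed buffer per column.
ids = array("q", (record[0] for record in rows))
amounts = array("d", (record[1] for record in rows))

# Row-wise aggregation: every record is visited and unpacked to reach one field.
total_from_rows = sum(amount for _id, amount in rows)

# Column-wise aggregation: a single scan over one contiguous buffer.
total_from_column = sum(amounts)
```

<p>Both totals are identical; the difference is that the columnar version touches only the values it needs, stored next to each other in memory, which is the kind of access pattern vectorised kernels benefit from.</p>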



<p>When a shuffle operation is necessary, Photon writes a shuffle file and uses the Spark API to execute the exchange. However, the data format is not compatible with vanilla Spark, so a Photon shuffle write must be followed by a Photon shuffle read.</p>



<h2 class="wp-block-heading">When does it help?</h2>



<p>Photon is meant to address CPU-heavy loads. This includes joins (especially hash joins) and aggregations. On the other hand, being a non-JVM implementation, Photon does not support UDFs or the RDD API. Exact benchmarks and precise speedups are given in the source paper.</p>



<h2 class="wp-block-heading">Sources</h2>



<ul class="wp-block-list">
<li>Source paper: <a href="https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf">Photon: A Fast Query Engine for Lakehouse Systems</a></li>
</ul>



<p>I hope this helps. Moreover, if you know any other good sources, do let us know on social media so that everyone can see them.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Airflow — pools and mutexes.</title>
		<link>https://tantusdata.com/insights/airflow-pools-and-mutexes/</link>
		
		<dc:creator><![CDATA[Amadeusz Kosik]]></dc:creator>
		<pubDate>Tue, 19 Mar 2024 13:47:32 +0000</pubDate>
				<category><![CDATA[Apache Airflow]]></category>
		<category><![CDATA[DAG dependencies]]></category>
		<category><![CDATA[data pipelines]]></category>
		<category><![CDATA[job orchestration]]></category>
		<guid isPermaLink="false">https://tantusdata.com/?post_type=insights&#038;p=1898</guid>

					<description><![CDATA[<p>Although the ideal data pipeline is made of idempotent and independent tasks, there are some cases when setting up a mutex (a.k.a. part of the job that cannot be run concurrently) is necessary. Fortunately, Airflow supports such cases and offers a few tools varying by complexity to implement such a pipeline. In this article, we [&#8230;]</p>
<p>The post <a href="https://tantusdata.com/insights/airflow-pools-and-mutexes/">Airflow — pools and mutexes.</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="585" src="https://tantusdata.com/app/uploads/2024/03/airflow-1024x585.png" alt="" class="wp-image-2180" srcset="https://tantusdata.com/app/uploads/2024/03/airflow-1024x585.png 1024w, https://tantusdata.com/app/uploads/2024/03/airflow-300x171.png 300w, https://tantusdata.com/app/uploads/2024/03/airflow-768x439.png 768w, https://tantusdata.com/app/uploads/2024/03/airflow-1536x878.png 1536w, https://tantusdata.com/app/uploads/2024/03/airflow.png 1792w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Although the ideal data pipeline is made of idempotent and independent tasks, there are cases when setting up a mutex (i.e. a part of the job that cannot run concurrently) is necessary. Fortunately, Airflow supports such cases and offers a few tools of varying complexity to implement such a pipeline.</p>



<p>In this article, we will look at the following DAG in AF. The graph itself is relatively simple; the catch is that the <em>load_1</em> and <em>load_2</em> operators cannot have concurrently running task instances. We will look at treating the loads separately and at treating them as a group.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="408" src="https://tantusdata.com/app/uploads/2024/02/Airflow-DAG-1024x408.png" alt="" class="wp-image-1899" srcset="https://tantusdata.com/app/uploads/2024/02/Airflow-DAG-1024x408.png 1024w, https://tantusdata.com/app/uploads/2024/02/Airflow-DAG-300x120.png 300w, https://tantusdata.com/app/uploads/2024/02/Airflow-DAG-768x306.png 768w, https://tantusdata.com/app/uploads/2024/02/Airflow-DAG-1536x612.png 1536w, https://tantusdata.com/app/uploads/2024/02/Airflow-DAG.png 1586w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Therefore, there are three scenarios we will look into:</p>



<ol class="wp-block-list">
<li>Only one instance of the operator can be running.</li>



<li>An instance of the operator cannot run before the older ones have succeeded.</li>



<li>Only one instance from a group of operators can be running.</li>
</ol>



<p>The examples use Airflow's decorator syntax (the TaskFlow API). The concept stays the same for the 1.0-compatible approach: instead of decorator parameters, pass the same arguments to any <em>Operator</em> class constructor.</p>



<h2 class="wp-block-heading">Mutex on an operator</h2>



<p>The first scenario is that only one instance of the operator can be running at a time. If multiple runs are ready to be scheduled, it does not matter which one goes first. One solution is to set the <em>max_active_tis_per_dag</em> option to 1.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="@task(
   max_active_tis_per_dag=1
)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D08770">@task</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #D08770">max_active_tis_per_dag</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">1</span></span>
<span class="line"><span style="color: #D8DEE9FF">)</span></span></code></pre></div>






<h2 class="wp-block-heading">Dependency on past runs</h2>



<p>The second example assumes that the <em>n</em>th batch cannot start before the (<em>n</em>-1)th one has completed successfully (or has at least been marked as such in Airflow). For this use case, AF offers the <em>depends_on_past</em> flag. Be careful here and keep an eye on the state of the latest runs: a single failed, upstream-failed <em>or</em> waiting task can halt all future DAG runs.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="@task(
   depends_on_past=True
)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D08770">@task</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #D08770">depends_on_past</span><span style="color: #81A1C1">=</span><span style="color: #D08770">True</span></span>
<span class="line"><span style="color: #D8DEE9FF">)</span></span></code></pre></div>
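
<p>The rule behind <em>depends_on_past</em> can be modelled in a few lines. The function below is a simplified sketch of the decision, not Airflow's scheduler code; it assumes that only a successful or skipped predecessor unblocks the next run, and the function name is hypothetical.</p>

```python
from typing import Optional


def can_start(previous_state: Optional[str]) -> bool:
    """Return True if a depends_on_past task instance may be scheduled.

    previous_state is the state of the same task in the preceding DAG run,
    or None for the very first run (there is nothing to wait for).
    """
    if previous_state is None:
        return True
    # Simplified assumption: only a successful or skipped predecessor unblocks the task.
    return previous_state in {"success", "skipped"}
```

<p>Under this model, a predecessor left in <em>failed</em> or <em>upstream_failed</em> keeps every later run waiting until someone clears or fixes it, which is why the state of the latest runs deserves attention.</p>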






<h2 class="wp-block-heading">Pools &#8211; mutex across multiple operators</h2>



<p>The most complex approach is required when we need to group multiple operators so that they share a mutex. One way to do it is to put them into one custom pool and limit that pool to a single task at a time, either by creating the pool with 1 slot or by making each operator require enough slots (via <em>pool_slots</em>) to fill the whole pool.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="load_pool = Pool.create_or_update_pool(
   name=&quot;load_pool&quot;,
   slots=1,
   description=&quot;Pool for data load tasks.&quot;
)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">load_pool</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">Pool</span><span style="color: #ECEFF4">.</span><span style="color: #88C0D0">create_or_update_pool</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #D8DEE9">name</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">load_pool</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #D8DEE9">slots</span><span style="color: #81A1C1">=</span><span style="color: #B48EAD">1</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #D8DEE9">description</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">Pool for data load tasks.</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">)</span></span></code></pre></div>






<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="@task(
   pool=load_pool.pool
)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D08770">@task</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #D08770">pool</span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9">load_pool</span><span style="color: #ECEFF4">.</span><span style="color: #D8DEE9">pool</span></span>
<span class="line"><span style="color: #D8DEE9FF">)</span></span></code></pre></div>
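
<p>Both variants of the pool-based mutex come down to simple slot arithmetic. The sketch below is a simplified model of that arithmetic, not Airflow's scheduler (which also takes priorities and other limits into account); the function name is hypothetical.</p>

```python
def admit(pool_size: int, requests: list[tuple[str, int]]) -> list[str]:
    """Greedily admit queued tasks while enough free slots remain.

    requests holds (task_id, pool_slots) pairs in queue order.
    """
    free = pool_size
    running = []
    for task_id, pool_slots in requests:
        if free >= pool_slots:
            free -= pool_slots
            running.append(task_id)
    return running


# Variant 1: a pool with a single slot, each task needs one slot.
one_slot_pool = admit(1, [("load_1", 1), ("load_2", 1)])

# Variant 2: a larger pool, but each task claims all of its slots.
greedy_tasks = admit(4, [("load_1", 4), ("load_2", 4)])
```

<p>In both variants only <em>load_1</em> is admitted and <em>load_2</em> has to wait until the slots are released, which is exactly the mutex behaviour we are after.</p>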






<h2 class="wp-block-heading">Source</h2>



<p>The source code for all examples and the docker environment to run them is available on GitHub.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Passing information between DAGs in Airflow.</title>
		<link>https://tantusdata.com/insights/passing-information-between-dags-in-airflow/</link>
		
		<dc:creator><![CDATA[Amadeusz Kosik]]></dc:creator>
		<pubDate>Thu, 08 Feb 2024 13:34:54 +0000</pubDate>
				<category><![CDATA[Apache Airflow]]></category>
		<category><![CDATA[DAG dependencies]]></category>
		<category><![CDATA[data pipelines]]></category>
		<category><![CDATA[job orchestration]]></category>
		<guid isPermaLink="false">https://tantusdata.com/?post_type=insights&#038;p=1893</guid>

					<description><![CDATA[<p>There are data pipelines where you must pass some values between tasks &#8211; not complete datasets, but ~ kilobytes. This can be managed even within the Airflow itself. As always, multiple options are available &#8211; let’s review some of them. In this article, we are looking at sharing data between DAGs, which are connected via [&#8230;]</p>
<p>The post <a href="https://tantusdata.com/insights/passing-information-between-dags-in-airflow/">Passing information between DAGs in Airflow.</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="585" src="https://tantusdata.com/app/uploads/2024/02/dependencies_1-1024x585.jpg" alt="" class="wp-image-2182" srcset="https://tantusdata.com/app/uploads/2024/02/dependencies_1-1024x585.jpg 1024w, https://tantusdata.com/app/uploads/2024/02/dependencies_1-300x171.jpg 300w, https://tantusdata.com/app/uploads/2024/02/dependencies_1-768x439.jpg 768w, https://tantusdata.com/app/uploads/2024/02/dependencies_1-1536x878.jpg 1536w, https://tantusdata.com/app/uploads/2024/02/dependencies_1.jpg 1792w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Some data pipelines need to pass small values between tasks: not complete datasets, but a few kilobytes at most. Airflow can handle this on its own. As always, multiple options are available, so let’s review some of them.</p>



<p>In this article, we look at sharing data between DAGs that are connected via run dependencies. Let’s assume that each DAG needs to run daily and that the first DAG generates some important data for the second DAG.</p>



<h2 class="wp-block-heading">XCom</h2>



<p>XCom is the first and the recommended approach. It works well with out-of-the-box features such as ExternalTaskSensor:</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="with DAG(
       dag_id=&quot;xcom-sink&quot;,
       schedule_interval=&quot;@daily&quot;,
       start_date=datetime(2023, 7, 1),
       catchup=True,
) as xcom_sink:
   ExternalTaskSensor(
       task_id=&quot;wait-for-dependency&quot;,
       external_dag_id=&quot;xcom-source&quot;,
       external_task_id=&quot;update-hive-table-events-triggers&quot;
   ) &gt;&gt; BashOperator(
       task_id=&quot;show-xcom&quot;,
       bash_command=&quot;echo {{ ti.xcom_pull(dag_id='xcom-source', task_ids='update-hive-table-events-triggers') }}&quot;
   )" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #81A1C1">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">dag_id</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">xcom-sink</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">schedule_interval</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">start_date</span><span style="color: #81A1C1">=</span><span style="color: #88C0D0">datetime</span><span style="color: #D8DEE9FF">(</span><span style="color: #B48EAD">2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #B48EAD">7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #B48EAD">1</span><span style="color: #D8DEE9FF">)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">catchup</span><span style="color: #81A1C1">=</span><span style="color: #D8DEE9">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #81A1C1">as</span><span style="color: #D8DEE9FF"> xcom_sink:</span></span>
<span class="line"><span style="color: #D8DEE9FF">   </span><span style="color: #88C0D0">ExternalTaskSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">task_id</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-dependency</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">external_dag_id</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">xcom-source</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">external_task_id</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table-events-triggers</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">   ) </span><span style="color: #81A1C1">&gt;&gt;</span><span style="color: #D8DEE9FF"> </span><span style="color: #88C0D0">BashOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">task_id</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">show-xcom</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">       </span><span style="color: #D8DEE9">bash_command</span><span style="color: #81A1C1">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">echo {{ ti.xcom_pull(dag_id=&#39;xcom-source&#39;, task_ids=&#39;update-hive-table-events-triggers&#39;) }}</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">   )</span></span></code></pre></div>






<h2 class="wp-block-heading">XCom vs run id</h2>



<p>An XCom value is identified by its DAG ID, task ID and run ID. If you want to run a single DAG with a custom run ID, you must make sure that an XCom value already exists for that run ID. This complicates manual runs.</p>
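


<p>The lookup rule can be illustrated with a toy model. This is a minimal sketch and not Airflow code: the dictionary stands in for the XCom table, and the run IDs and the stored path are made up for illustration.</p>

```python
from typing import Any, Dict, Optional, Tuple

# Toy stand-in for the XCom table: values are keyed by (dag_id, task_id, run_id).
XComKey = Tuple[str, str, str]
xcom_table: Dict[XComKey, Any] = {}


def xcom_push(dag_id: str, task_id: str, run_id: str, value: Any) -> None:
    xcom_table[(dag_id, task_id, run_id)] = value


def xcom_pull(dag_id: str, task_id: str, run_id: str) -> Optional[Any]:
    # Returns None when no value exists for this exact run ID.
    return xcom_table.get((dag_id, task_id, run_id))


SOURCE_DAG = "xcom-source"
SOURCE_TASK = "update-hive-table-events-triggers"
SCHEDULED_RUN = "scheduled__2023-07-01T00:00:00+00:00"  # illustrative run ID

# The scheduled source run stores its value under the scheduled run ID.
xcom_push(SOURCE_DAG, SOURCE_TASK, SCHEDULED_RUN, "/data/events/2023-07-01")

# A sink run with the same run ID finds the value ...
scheduled_value = xcom_pull(SOURCE_DAG, SOURCE_TASK, SCHEDULED_RUN)
# ... while a sink run with a custom run ID finds nothing.
manual_value = xcom_pull(SOURCE_DAG, SOURCE_TASK, "manual__my-custom-run")
```

<p>In this model, the manual run only gets a value if somebody has pushed one under the same custom run ID beforehand.</p>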



<h2 class="wp-block-heading">When not to XCom?</h2>



<p>Rather than discussing the cases XCom is tailored for, let’s focus on those that are not well supported. In short, XCom does not work well with Datasets, and you might get some quirky results, as discussed here: <a href="https://github.com/apache/airflow/discussions/33069">https://github.com/apache/airflow/discussions/33069</a>.</p>



<p>With datasets, you need to refer to the <em>last past value</em> of XCom, effectively losing all benefits of tight coupling. With that approach, you have to deal with race conditions: if a sink DAG run other than the latest one is restarted, it will receive an incorrect value from the source. Moreover, let’s consider a design with multiple sources and a single sink:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="475" src="https://tantusdata.com/app/uploads/2024/02/example-dag-1024x475.png" alt="" class="wp-image-1894" srcset="https://tantusdata.com/app/uploads/2024/02/example-dag-1024x475.png 1024w, https://tantusdata.com/app/uploads/2024/02/example-dag-300x139.png 300w, https://tantusdata.com/app/uploads/2024/02/example-dag-768x357.png 768w, https://tantusdata.com/app/uploads/2024/02/example-dag-1536x713.png 1536w, https://tantusdata.com/app/uploads/2024/02/example-dag.png 1999w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Suppose the <em>events</em> dataset gets updated twice and the <em>users</em> dataset only once. In that case, <em>events-with-users</em> will receive only the latest value of the former. It is up to you to decide whether this behaviour is expected or unacceptable.</p>
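


<p>The scenario can be replayed with a toy model. This is a sketch under a stated assumption, not Airflow code: the hypothetical <em>ToySink</em> class runs once every required dataset has been updated at least once since its previous run, and the value names are made up.</p>

```python
from typing import Dict, List, Set


class ToySink:
    """Toy model of a DAG scheduled on several datasets (not Airflow code)."""

    def __init__(self, required: Set[str]) -> None:
        self.required = required
        self.latest: Dict[str, str] = {}      # latest value seen per dataset
        self.pending: Set[str] = set()        # datasets updated since the last run
        self.runs: List[Dict[str, str]] = []  # inputs that each sink run received

    def dataset_updated(self, name: str, value: str) -> None:
        self.latest[name] = value             # an older value is overwritten
        self.pending.add(name)
        if self.pending == self.required:     # every input updated at least once
            self.runs.append(dict(self.latest))
            self.pending.clear()


sink = ToySink({"events", "users"})
sink.dataset_updated("events", "events-v1")
sink.dataset_updated("events", "events-v2")
sink.dataset_updated("users", "users-v1")
```

<p>The model produces a single sink run that sees <em>events-v2</em> and <em>users-v1</em>; the first <em>events</em> value is never consumed.</p>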



<h2 class="wp-block-heading">Variables</h2>



<p>In rare cases, you can use <em>Variables</em> from Airflow to solve the issue. <em>Variables</em> are a built-in Airflow mechanism that provides a global state (or configuration) for all DAGs to read and write. You can use them to implement <a href="https://java-design-patterns.com/patterns/registry/"><em>the Registry</em> code pattern</a>:</p>



<ol class="wp-block-list">
<li>Put a hashmap of <em>the date of run -&gt; metadata</em> into a variable.</li>



<li>Use the <em>execution date</em> (named <em>logical_date</em> in the newer versions) as the hashmap key.</li>



<li>Use <em>PythonOperator</em> to update the variable. Reading can be done either in Python code or via templating.</li>



<li>Use sensors or datasets to schedule DAGs in the correct order.</li>
</ol>
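


<p>The first three steps can be sketched as follows. This is a minimal illustration rather than the article's source code: an in-memory dictionary stands in for the variable store so that the snippet runs without Airflow, and the variable name and metadata fields are hypothetical. The comments show the Airflow 2.x <em>Variable</em> calls that would take its place.</p>

```python
import json
from typing import Any, Dict, Optional

# In-memory stand-in for Airflow's Variable store, so the sketch runs anywhere.
_store: Dict[str, str] = {}

REGISTRY_KEY = "events_registry"  # hypothetical variable name


def load_registry() -> Dict[str, Any]:
    # With Airflow 2.x: Variable.get(REGISTRY_KEY, default_var={}, deserialize_json=True)
    return json.loads(_store.get(REGISTRY_KEY, "{}"))


def register(logical_date: str, metadata: Dict[str, Any]) -> None:
    """Body of the PythonOperator callable in the source DAG."""
    registry = load_registry()
    registry[logical_date] = metadata  # the logical date is the hashmap key
    # With Airflow 2.x: Variable.set(REGISTRY_KEY, registry, serialize_json=True)
    _store[REGISTRY_KEY] = json.dumps(registry)


def lookup(logical_date: str) -> Optional[Dict[str, Any]]:
    """Read side, called from Python code in the sink DAG."""
    return load_registry().get(logical_date)


register("2023-07-01", {"path": "/data/events/2023-07-01", "rows": 1200})
register("2023-07-02", {"path": "/data/events/2023-07-02", "rows": 1350})
```

<p>Note that the update is a read-modify-write sequence, so two source runs that write at the same time can overwrite each other's entries.</p>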



<h2 class="wp-block-heading">Beware!</h2>



<p>Before you go the variable route, keep in mind that, compared to XComs, variables are far more costly to maintain. You might need to keep track of the variables’ sizes, implement error handling in your <em>PythonOperators</em> and watch for unsolicited changes to the variables’ values.</p>
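


<p>Part of that maintenance can be sketched in plain Python. The helpers below are hypothetical and the limits are arbitrary assumptions: one prunes a date-keyed hashmap to its most recent entries, the other refuses to serialize a hashmap that has outgrown its size budget.</p>

```python
import json
from typing import Any, Dict

MAX_ENTRIES = 30    # hypothetical retention: keep the 30 most recent dates
MAX_BYTES = 16_384  # hypothetical size budget for the serialized hashmap


def prune_registry(registry: Dict[str, Any], max_entries: int = MAX_ENTRIES) -> Dict[str, Any]:
    """Keep only the most recent ISO-formatted dates (they sort lexicographically)."""
    if max_entries <= 0:
        return {}
    recent = sorted(registry)[-max_entries:]
    return {day: registry[day] for day in recent}


def serialize_checked(registry: Dict[str, Any], max_bytes: int = MAX_BYTES) -> str:
    """Serialize the hashmap and fail loudly if it outgrows the budget."""
    payload = json.dumps(registry, sort_keys=True)
    size = len(payload.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"registry takes {size} bytes, budget is {max_bytes}")
    return payload


registry = {f"2023-07-{day:02d}": {"rows": day} for day in range(1, 11)}
pruned = prune_registry(registry, max_entries=3)
payload = serialize_checked(pruned)
```

<p>Raising an exception inside the <em>PythonOperator</em> callable fails the task, which makes an oversized variable visible instead of letting it grow silently.</p>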



<h2 class="wp-block-heading">External system?</h2>



<p>You can always synchronise through an external data service, much like with built-in variables. This may take the form of data paths on HDFS, S3 or other storage, batch load dates in a database, and so on. However, this is an inferior design: you end up with the disadvantages of the variables approach plus new implicit dependencies between DAGs and external services.</p>



<h2 class="wp-block-heading">Summary</h2>



<figure class="wp-block-table"><table class="has-fixed-layout"><tbody><tr><td><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">XCom + dataset</mark></td><td><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">XCom + sensor</mark></td><td><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">Variables</mark></td><td><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">External</mark></td></tr><tr><td>+ no tight coupling between DAGs</td><td>+ supports 1:1 task run relationships</td><td>+ works with both sensors and datasets</td><td>+ works even with 3rd party systems</td></tr><tr><td>&#8211; race conditions<br><br>&#8211; no guarantee of 1:1 run relationships</td><td>&#8211; a bit tighter coupling<br><br>&#8211; for a manual run, the exec_date must be supplied by hand</td><td>&#8211; handles only data sharing, not orchestration<br><br>&#8211; requires far more effort than XCom</td><td>&#8211; requires even more effort<br><br>&#8211; hidden dependencies outside Airflow<br><br>&#8211; complicated architecture</td></tr></tbody></table><figcaption class="wp-element-caption">Comparison</figcaption></figure>



<h2 class="wp-block-heading">Source</h2>



<p>The source code for both the XCom and <em>Variable</em> examples, along with the Docker environment to run them, is available on GitHub.</p>
<p>The post <a href="https://tantusdata.com/insights/passing-information-between-dags-in-airflow/">Passing information between DAGs in Airflow.</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Managing inter-DAG dependencies in Airflow</title>
		<link>https://tantusdata.com/insights/managing-inter-dag-dependencies-in-airflow/</link>
		
		<dc:creator><![CDATA[Amadeusz Kosik]]></dc:creator>
		<pubDate>Fri, 01 Sep 2023 11:00:00 +0000</pubDate>
				<category><![CDATA[Apache Airflow]]></category>
		<category><![CDATA[DAG dependencies]]></category>
		<category><![CDATA[data pipelines]]></category>
		<category><![CDATA[data-aware scheduling]]></category>
		<category><![CDATA[performance tuning]]></category>
		<guid isPermaLink="false">https://tantusdata.com/?post_type=insights&#038;p=1731</guid>

					<description><![CDATA[<p>In the real world, data pipelines only sometimes come as a completely independent sequence of operations. Usually, they share dependences on one another, occasionally easy 1:1 ones, sometimes more complicated. Here is a short list of what Apache Airflow has to offer for handling those relationships. The general setting We will look into a situation [&#8230;]</p>
<p>The post <a href="https://tantusdata.com/insights/managing-inter-dag-dependencies-in-airflow/">Managing inter-DAG dependencies in Airflow</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="576" src="https://tantusdata.com/app/uploads/2023/08/TantusData.interDag.dependencies-copy-1024x576.jpg" alt="" class="wp-image-1732" srcset="https://tantusdata.com/app/uploads/2023/08/TantusData.interDag.dependencies-copy-1024x576.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/TantusData.interDag.dependencies-copy-300x169.jpg 300w, https://tantusdata.com/app/uploads/2023/08/TantusData.interDag.dependencies-copy-768x432.jpg 768w, https://tantusdata.com/app/uploads/2023/08/TantusData.interDag.dependencies-copy-1536x864.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/TantusData.interDag.dependencies-copy-2048x1152.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>In the real world, data pipelines rarely come as completely independent sequences of operations. Usually, they depend on one another: occasionally through easy 1:1 relationships, sometimes through more complicated ones. Here is a short list of what Apache Airflow offers for handling those relationships.</p>



<h1 class="wp-block-heading">The general setting</h1>



<p>We will look into a situation where parent data pipelines (called DAGs in Airflow) create data used (consumed) by child pipelines. The dependency looks like this:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="300" src="https://tantusdata.com/app/uploads/2023/08/Graph.1-copy-1024x300.jpg" alt="" class="wp-image-1759" srcset="https://tantusdata.com/app/uploads/2023/08/Graph.1-copy-1024x300.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/Graph.1-copy-300x88.jpg 300w, https://tantusdata.com/app/uploads/2023/08/Graph.1-copy-768x225.jpg 768w, https://tantusdata.com/app/uploads/2023/08/Graph.1-copy-1536x450.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/Graph.1-copy-2048x600.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The implicit assumption is that we cannot just look at the metadata (file creation time, the last row’s created timestamp or the like), as it might show that data is being loaded but not yet fully loaded. Therefore, we wait for the entire dataset to be available.</p>



<h1 class="wp-block-heading">External tools</h1>



<p>Using an external tool is always an option, and not only for Airflow data pipelines. The most straightforward implementation is a 0-byte file on NFS, HDFS or any other shareable filesystem that marks success for a given dataset.</p>



<p>This example is based on HDFS, but you could also use:</p>



<ul class="wp-block-list">
<li>NFS,</li>



<li>FTP/SFTP location,</li>



<li>metadata stored in a SQL or NoSQL database,</li>



<li>any other accessible data storage unrelated to Airflow.</li>
</ul>
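


<p>On NFS or any locally mounted filesystem, the marker-file convention can be sketched with the standard library alone. This is a minimal illustration and not Airflow code: the function names, the directory layout and the timeouts are assumptions, and the polling loop only imitates what a sensor does.</p>

```python
import tempfile
import time
from pathlib import Path


def marker_path(root: Path, dataset: str, day: str) -> Path:
    return root / dataset / f"{day}.success"


def mark_success(root: Path, dataset: str, day: str) -> Path:
    """Producer side: create an empty marker after the dataset is fully written."""
    path = marker_path(root, dataset, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()
    return path


def wait_for_marker(root: Path, dataset: str, day: str,
                    timeout: float = 5.0, poke_interval: float = 0.1) -> bool:
    """Consumer side: poll until the marker exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    path = marker_path(root, dataset, day)
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poke_interval)
    return path.exists()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "marker"
    missing = wait_for_marker(root, "events", "2023-07-01", timeout=0.2)
    created = mark_success(root, "events", "2023-07-01")
    found = wait_for_marker(root, "events", "2023-07-01", timeout=0.2)
    marker_size = created.stat().st_size
```

<p>The producer creates the marker only as its last step, so the consumer never needs to inspect the data files themselves.</p>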



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="475" src="https://tantusdata.com/app/uploads/2023/08/graph.2-1-1024x475.jpg" alt="" class="wp-image-1751" srcset="https://tantusdata.com/app/uploads/2023/08/graph.2-1-1024x475.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/graph.2-1-300x139.jpg 300w, https://tantusdata.com/app/uploads/2023/08/graph.2-1-768x356.jpg 768w, https://tantusdata.com/app/uploads/2023/08/graph.2-1-1536x713.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/graph.2-1-2048x950.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<h2 class="wp-block-heading">Source code</h2>



<p>As of the current (4.1.0) version of apache-airflow-providers-apache-hdfs, there is no HDFS operator; only sensors are available. Therefore, a workaround (e.g. BashOperator) has to be used to create the marker file.</p>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from __future__ import annotations


from datetime import datetime


from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.apache.hdfs.sensors.web_hdfs import WebHdfsSensor
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.sftp.sensors.sftp import SFTPSensor


HDFS_CONNECTION_ID = &quot;hdfs-14&quot;
SFTP_CONNECTION_ID = &quot;sftp-21&quot;


with DAG(
	dag_id=&quot;external-events&quot;,
	schedule=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as dag_events:
	SFTPSensor(
    		task_id=&quot;wait-for-csv&quot;,
	    	path=&quot;/inbox/events/{{ dag_run.logical_date | ds }}.csv&quot;,
    		sftp_conn_id=SFTP_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;csv-to-parquet&quot;,
    		application=&quot;/spark/applications/csv-to-parquet.py&quot;
	) &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;update-hive-table&quot;,
    		application=&quot;/spark/applications/update-hive-table.py&quot;
	) &gt;&gt; BashOperator(
	    	task_id=&quot;create-hdfs-marker-file&quot;,
    		bash_command=&quot;hdfs dfs -touch /marker/events/{{ dag_run.logical_date | ds }}.success&quot;
	)


with DAG(
	dag_id=&quot;external-users&quot;,
	schedule=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as dag_users:
	SFTPSensor(
    		task_id=&quot;wait-for-csv&quot;,
	    	path=&quot;/inbox/users/{{ dag_run.logical_date | ds }}.csv&quot;,
    		sftp_conn_id=SFTP_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;csv-to-parquet&quot;,
    		application=&quot;/spark/applications/csv-to-parquet.py&quot;
	) &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;update-hive-table&quot;,
    		application=&quot;/spark/applications/update-hive-table.py&quot;
	) &gt;&gt; BashOperator(
	    	task_id=&quot;create-hdfs-marker-file&quot;,
    		bash_command=&quot;hdfs dfs -touch /marker/users/{{ dag_run.logical_date | ds }}.success&quot;
	)


with DAG(
	dag_id=&quot;external-events-with-users&quot;,
	schedule=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as dag_events_with_users:
	[
    	WebHdfsSensor(
        	task_id=&quot;wait-for-marker-events&quot;,
        	filepath=&quot;/marker/events/{{ dag_run.logical_date | ds }}.success&quot;,
        	webhdfs_conn_id=HDFS_CONNECTION_ID
    	),
    	WebHdfsSensor(
        	task_id=&quot;wait-for-marker-users&quot;,
        	filepath=&quot;/marker/users/{{ dag_run.logical_date | ds }}.success&quot;,
        	webhdfs_conn_id=HDFS_CONNECTION_ID
    	)
	] &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;compute-events-with-users&quot;,
    		application=&quot;/spark/applications/compute-events-with-users.py&quot;
	) &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;update-hive-table&quot;,
    		application=&quot;/spark/applications/update-hive-table.py&quot;
	) &gt;&gt; BashOperator(
	    	task_id=&quot;create-hdfs-marker-file&quot;,
    		bash_command=&quot;hdfs dfs -touch /marker/events-with-users/{{ dag_run.logical_date | ds }}.success&quot;
	)


with DAG(
	dag_id=&quot;external-reports-1&quot;,
	schedule=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as dag_reports_1:
	WebHdfsSensor(
	    	task_id=&quot;wait-for-events-with-users&quot;,
    		filepath=&quot;/marker/events-with-users/{{ dag_run.logical_date | ds }}.success&quot;,
	    	webhdfs_conn_id=HDFS_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
	    	task_id=&quot;compute-report&quot;,
    		application=&quot;/spark/applications/compute-report-1.py&quot;
	)


with DAG(
	dag_id=&quot;external-reports-2&quot;,
	schedule=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as dag_reports_2:
	WebHdfsSensor(
    		task_id=&quot;wait-for-events-with-users&quot;,
	    	filepath=&quot;/marker/events-with-users/{{ dag_run.logical_date | ds }}.success&quot;,
    		webhdfs_conn_id=HDFS_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
    		task_id=&quot;compute-report&quot;,
	    	application=&quot;/spark/applications/compute-report-2.py&quot;
	)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">__future__</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">annotations</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">models</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">bash</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">BashOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">apache</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">hdfs</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">web_hdfs</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">WebHdfsSensor</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">apache</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark_submit</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SparkSubmitOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SFTPSensor</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">HDFS_CONNECTION_ID</span><span style="color: #D8DEE9FF"> = </span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">hdfs-14</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span><span style="color: #D8DEE9FF"> = </span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sftp-21</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">external-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_events</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/events/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">create-hdfs-marker-file</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">hdfs dfs -touch /marker/events/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">external-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/users/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">create-hdfs-marker-file</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">hdfs dfs -touch /marker/users/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">external-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_events_with_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	[</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">WebHdfsSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-marker-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">filepath</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/marker/events/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">webhdfs_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">HDFS_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">WebHdfsSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-marker-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">filepath</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/marker/users/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">webhdfs_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">HDFS_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	] &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-events-with-users.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">create-hdfs-marker-file</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">hdfs dfs -touch /marker/events-with-users/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">external-reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_reports_1</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">WebHdfsSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">filepath</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/marker/events-with-users/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">webhdfs_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">HDFS_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-1.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">external-reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_reports_2</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">WebHdfsSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">filepath</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/marker/events-with-users/{{ dag_run.logical_date | ds }}.success</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">webhdfs_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">HDFS_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    		</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-2.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span></code></pre></div>






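<p>This technique is helpful when Airflow has to be integrated with other tools. On the other hand, the dependency is implicit: nothing in Airflow links the DAG that creates a marker file to the DAGs that wait for it. Marker paths therefore have to be refactored carefully to keep the contract between data pipelines intact.</p>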
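<p>One way to make that contract harder to break is to build every marker path in a single place and reuse it on both the producer and the consumer side. The snippet below is only a sketch: <code>MARKER_ROOT</code> and <code>marker_path</code> are hypothetical names that do not appear in the DAGs above, and the helper merely reproduces the path layout those DAGs already use.</p>

```python
from datetime import date

# Hypothetical constant: the root directory used by the marker files in the DAGs above.
MARKER_ROOT = "/marker"


def marker_path(dataset: str, logical_date: str = "{{ dag_run.logical_date | ds }}") -> str:
    """Return the marker file path for a dataset.

    By default the date part is left as a Jinja template, so Airflow renders it
    at run time; a concrete date string can be passed in for ad hoc checks.
    """
    return f"{MARKER_ROOT}/{dataset}/{logical_date}.success"


# Producer side: the string that would go into BashOperator(bash_command=...).
touch_command = f"hdfs dfs -touch {marker_path('events')}"

# Consumer side: the string that would go into WebHdfsSensor(filepath=...).
sensor_filepath = marker_path("events")

print(touch_command)
print(sensor_filepath)
print(marker_path("users", date(2023, 7, 1).isoformat()))
```

<p>With such a helper, renaming a marker directory is a change in one function instead of a search through every producer and consumer DAG.</p>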



<h1 class="wp-block-heading">Airflow datasets</h1>



<p>Since version 2.4, Airflow has offered data-aware scheduling based on the concept of datasets. The general idea is:</p>



<p>1. Operators declare the datasets to which they publish data through the <code>outlets</code> parameter. A dataset in Airflow is just metadata; Airflow does not handle the data itself.</p>



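<p>2. DAGs specify the datasets they depend on in the <code>schedule</code> parameter. The dependent DAG is triggered once every dataset it depends on has been updated at least once since its previous run, no matter how many times each of them was updated. This setting replaces the time-based schedule.</p>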



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="475" src="https://tantusdata.com/app/uploads/2023/08/graph.3-1024x475.jpg" alt="" class="wp-image-1753" srcset="https://tantusdata.com/app/uploads/2023/08/graph.3-1024x475.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/graph.3-300x139.jpg 300w, https://tantusdata.com/app/uploads/2023/08/graph.3-768x356.jpg 768w, https://tantusdata.com/app/uploads/2023/08/graph.3-1536x713.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/graph.3-2048x950.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This feature gives no control over the time of execution and does not tie specific runs together: there is no guarantee that the dependent DAG runs once for every upstream run, because several upstream runs may be squashed into a single dependent run. In return, the user gets a friendly out-of-the-box UI for browsing the relationships between datasets and DAGs.</p>
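<p>The squashing behaviour can be illustrated with a small toy model. This is plain Python, not Airflow code: the <code>ConsumerDag</code> class and its method are hypothetical and only mimic the rule described above, where a dependent DAG runs once all of its datasets have been updated at least once since its previous run.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ConsumerDag:
    """Toy model of a DAG scheduled on a list of datasets (not an Airflow class)."""

    depends_on: frozenset
    pending: set = field(default_factory=set)  # datasets updated since the last run
    runs: int = 0

    def on_dataset_update(self, dataset: str) -> bool:
        """Record an update and report whether it triggered a run."""
        if dataset not in self.depends_on:
            return False
        # Repeated updates of the same dataset collapse into a single entry.
        self.pending.add(dataset)
        if self.pending == self.depends_on:
            self.runs += 1
            self.pending.clear()
            return True
        return False


consumer = ConsumerDag(depends_on=frozenset({"events", "users"}))

# Three upstream "events" runs finish before the first "users" run does.
updates = ["events", "events", "events", "users", "users", "events"]
triggered = [consumer.on_dataset_update(name) for name in updates]

print(triggered)      # [False, False, False, True, False, True]
print(consumer.runs)  # 2
```

<p>Six upstream updates result in only two dependent runs in this model, which is why a dependent run cannot be mapped back to one specific upstream run.</p>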



<h2 class="wp-block-heading">Source code</h2>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from __future__ import annotations


from airflow import DAG, Dataset
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.sftp.sensors.sftp import SFTPSensor
from pendulum import datetime




dataset_users = Dataset(&quot;af://datasets/users&quot;)
dataset_events = Dataset(&quot;af://datasets/events&quot;)
dataset_events_with_users = Dataset(&quot;af://datasets/events-with-users&quot;)
dataset_reports_1 = Dataset(&quot;af://datasets/reports-1&quot;)
dataset_reports_2 = Dataset(&quot;af://datasets/reports-2&quot;)


SFTP_CONNECTION_ID = &quot;sftp-21&quot;


with DAG(
	dag_id=&quot;dataset-events&quot;,
	catchup=True,
	start_date=datetime(2023, 7, 1, tz=&quot;UTC&quot;),
	schedule=&quot;@daily&quot;,
	max_active_runs=1,
) as dag_producer_events:
	SFTPSensor(
        	task_id=&quot;wait-for-csv&quot;,
    	    	path=&quot;/inbox/events/{{ dag_run.logical_date | ds }}.csv&quot;,
    	    	sftp_conn_id=SFTP_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;csv-to-parquet&quot;,
    	    	application=&quot;/spark/applications/csv-to-parquet.py&quot;
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;update-hive-table&quot;,
    	    	application=&quot;/spark/applications/update-hive-table.py&quot;,
    	    	outlets=[dataset_events]
	)


with DAG(
	dag_id=&quot;dataset-users&quot;,
	catchup=True,
	start_date=datetime(2023, 7, 1, tz=&quot;UTC&quot;),
	schedule=&quot;@daily&quot;,
	max_active_runs=1,
) as dag_producer_users:
	SFTPSensor(
        	task_id=&quot;wait-for-csv&quot;,
    	    	path=&quot;/inbox/users/{{ dag_run.logical_date | ds }}.csv&quot;,
    	    	sftp_conn_id=SFTP_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;csv-to-parquet&quot;,
    	    	application=&quot;/spark/applications/csv-to-parquet.py&quot;
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;update-hive-table&quot;,
    	    	application=&quot;/spark/applications/update-hive-table.py&quot;,
    	    	outlets=[dataset_users]
	)


with DAG(
	dag_id=&quot;dataset-events-with-users&quot;,
	catchup=True,
	start_date=datetime(2023, 7, 1, tz=&quot;UTC&quot;),
	schedule=[dataset_events, dataset_users],
	max_active_runs=1,
) as dag_processor_events_with_users:
	SparkSubmitOperator(
    	    	task_id=&quot;compute-events-with-users&quot;,
    	    	application=&quot;/spark/applications/compute-events-with-users.py&quot;
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;update-hive-table&quot;,
    	    	application=&quot;/spark/applications/update-hive-table.py&quot;,
    	    	outlets=[dataset_events_with_users]
	)


with DAG(
	dag_id=&quot;dataset-reports-1&quot;,
	catchup=True,
	start_date=datetime(2023, 7, 1, tz=&quot;UTC&quot;),
	schedule=[dataset_events_with_users],
	max_active_runs=1,
) as dag_processor_reports_1:
	SparkSubmitOperator(
    	    	task_id=&quot;compute-report&quot;,
    	    	application=&quot;/spark/applications/compute-report-1.py&quot;,
    	    	outlets=[dataset_reports_1]
	)


with DAG(
	dag_id=&quot;dataset-reports-2&quot;,
	catchup=True,
	start_date=datetime(2023, 7, 1, tz=&quot;UTC&quot;),
	schedule=[dataset_events_with_users],
	max_active_runs=1,
) as dag_processor_reports_2:
	SparkSubmitOperator(
    	    	task_id=&quot;compute-report&quot;,
    	    	application=&quot;/spark/applications/compute-report-2.py&quot;,
    	    	outlets=[dataset_reports_2]
	)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">__future__</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">annotations</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">Dataset</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">apache</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark_submit</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SparkSubmitOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SFTPSensor</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">pendulum</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://datasets/users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_events</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://datasets/events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_events_with_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://datasets/events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_reports_1</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://datasets/reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_reports_2</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://datasets/reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span><span style="color: #D8DEE9FF"> = </span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sftp-21</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dataset-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">tz</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">UTC</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">max_active_runs</span><span style="color: #D8DEE9FF">=1</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_producer_events</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/events/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events</span><span style="color: #D8DEE9FF">]</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dataset-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">tz</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">UTC</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">max_active_runs</span><span style="color: #D8DEE9FF">=1</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_producer_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/users/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF">]</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dataset-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">tz</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">UTC</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF">]</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">max_active_runs</span><span style="color: #D8DEE9FF">=1</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_processor_events_with_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-events-with-users.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events_with_users</span><span style="color: #D8DEE9FF">]</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dataset-reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">tz</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">UTC</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events_with_users</span><span style="color: #D8DEE9FF">]</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">max_active_runs</span><span style="color: #D8DEE9FF">=1</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_processor_reports_1</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-1.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_reports_1</span><span style="color: #D8DEE9FF">]</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dataset-reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">tz</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">UTC</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events_with_users</span><span style="color: #D8DEE9FF">]</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">max_active_runs</span><span style="color: #D8DEE9FF">=1</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_processor_reports_2</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-2.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_reports_2</span><span style="color: #D8DEE9FF">]</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span></code></pre></div>






<h1 class="wp-block-heading">Trigger the external DAG</h1>



<p>If datasets are too limited for your use case, or you run an Airflow version that predates them (datasets were introduced in Airflow 2.4), there are alternative ways to orchestrate data pipelines. TriggerDagRunOperator represents the push approach: a task in the upstream DAG triggers a run of a specified DAG, optionally supplying custom parameters. Because every trigger starts a run on its own, this mechanism cannot make a DAG wait for more than one upstream source. A single upstream DAG can, however, still trigger multiple dependents.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="300" src="https://tantusdata.com/app/uploads/2023/08/graph4-1024x300.jpg" alt="" class="wp-image-1742" srcset="https://tantusdata.com/app/uploads/2023/08/graph4-1024x300.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/graph4-300x88.jpg 300w, https://tantusdata.com/app/uploads/2023/08/graph4-768x225.jpg 768w, https://tantusdata.com/app/uploads/2023/08/graph4-1536x450.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/graph4-2048x600.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>Since this approach identifies the dependent pipeline by its DAG name (id), be careful when renaming DAGs: a trigger that still points at the old id will no longer find its target. One way to look up the dependent (and depending) DAGs and tasks is the Browse &gt; DAG Dependencies view, which shows a graph of trigger and sensor relationships.</p>
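
<p>A minimal sketch of how a stale DAG id could be caught before deployment, assuming the ids are written as string literals in a single file. It uses Python's standard ast module to read the source without importing Airflow and compares every trigger_dag_id with the dag_id values defined in the same file. The helper names and the sample source are hypothetical and not part of Airflow.</p>

```python
from __future__ import annotations

import ast


def dag_ids_and_triggers(source: str) -> tuple[set[str], set[str]]:
    """Collect string literals passed as dag_id= and trigger_dag_id= keywords."""
    defined: set[str] = set()
    triggered: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            value = keyword.value
            # Only plain string literals are visible to a static check.
            if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
                continue
            if keyword.arg == "dag_id":
                defined.add(value.value)
            elif keyword.arg == "trigger_dag_id":
                triggered.add(value.value)
    return defined, triggered


def dangling_triggers(source: str) -> set[str]:
    """Return trigger targets that have no DAG with a matching id in the source."""
    defined, triggered = dag_ids_and_triggers(source)
    return triggered - defined


# Hypothetical DAG file: the second report DAG was renamed,
# but the trigger pointing at it was not updated.
EXAMPLE_SOURCE = '''
with DAG(dag_id="trigger-events-with-users") as dag_events_with_users:
    TriggerDagRunOperator(task_id="t1", trigger_dag_id="trigger-reports-1")
    TriggerDagRunOperator(task_id="t2", trigger_dag_id="trigger-reports-2")

with DAG(dag_id="trigger-reports-1") as dag_processor_reports_1:
    pass

with DAG(dag_id="report-2") as dag_processor_reports_2:
    pass
'''

print(sorted(dangling_triggers(EXAMPLE_SOURCE)))  # ['trigger-reports-2']
```

<p>The check only parses the file, so it can run in CI without an Airflow installation. It does not see ids that are built dynamically or defined in other files, so it complements the DAG Dependencies view rather than replacing it.</p>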



<h2 class="wp-block-heading">Source code</h2>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from __future__ import annotations


from datetime import datetime
from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.sftp.sensors.sftp import SFTPSensor
from airflow.utils.task_group import TaskGroup
from airflow.utils.trigger_rule import TriggerRule




SFTP_CONNECTION_ID = &quot;sftp-21&quot;




with DAG(
    dag_id=&quot;trigger-events-with-users&quot;,
    schedule=&quot;@daily&quot;,
    start_date=datetime(2023, 7, 1),
) as dag_events_with_users:
    with TaskGroup(group_id=&quot;gather-events&quot;) as group_events:
        SFTPSensor(
            task_id=&quot;wait-for-csv&quot;,
            path=&quot;/inbox/events/{{ dag_run.logical_date | ds }}.csv&quot;,
            sftp_conn_id=SFTP_CONNECTION_ID
        ) &gt;&gt; SparkSubmitOperator(
            task_id=&quot;csv-to-parquet&quot;,
            application=&quot;/spark/applications/csv-to-parquet.py&quot;
        ) &gt;&gt; SparkSubmitOperator(
            task_id=&quot;update-hive-table&quot;,
            application=&quot;/spark/applications/update-hive-table.py&quot;
        )


    with TaskGroup(group_id=&quot;gather-users&quot;) as group_users:
        SFTPSensor(
            task_id=&quot;wait-for-csv&quot;,
            path=&quot;/inbox/users/{{ dag_run.logical_date | ds }}.csv&quot;,
            sftp_conn_id=SFTP_CONNECTION_ID
        ) &gt;&gt; SparkSubmitOperator(
            task_id=&quot;csv-to-parquet&quot;,
            application=&quot;/spark/applications/csv-to-parquet.py&quot;
        ) &gt;&gt; SparkSubmitOperator(
            task_id=&quot;update-hive-table&quot;,
            application=&quot;/spark/applications/update-hive-table.py&quot;
        )


    [group_events, group_users] &gt;&gt; SparkSubmitOperator(
        task_id=&quot;compute-events-with-users&quot;,
        application=&quot;/spark/applications/compute-events-with-users.py&quot;
    ) &gt;&gt; SparkSubmitOperator(
        task_id=&quot;update-hive-table&quot;,
        application=&quot;/spark/applications/update-hive-table.py&quot;
    ) &gt;&gt; [
        TriggerDagRunOperator(
            task_id=&quot;trigger-reports-1&quot;,
            trigger_dag_id=&quot;trigger-reports-1&quot;,
            trigger_rule=TriggerRule.ALL_SUCCESS,
        ),
        TriggerDagRunOperator(
            task_id=&quot;trigger-reports-2&quot;,
            trigger_dag_id=&quot;trigger-reports-2&quot;,
            trigger_rule=TriggerRule.ALL_SUCCESS,
        )
    ]


with DAG(
    dag_id=&quot;trigger-reports-1&quot;,
    catchup=True,
    start_date=datetime(2023, 7, 1),
) as dag_processor_reports_1:
    SparkSubmitOperator(
        task_id=&quot;compute-report&quot;,
        application=&quot;/spark/applications/compute-report-1.py&quot;,
    )


with DAG(
    dag_id=&quot;trigger-reports-2&quot;,
    catchup=True,
    start_date=datetime(2023, 7, 1),
) as dag_processor_reports_2:
    SparkSubmitOperator(
        task_id=&quot;compute-report&quot;,
        application=&quot;/spark/applications/compute-report-2.py&quot;,
	)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">__future__</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">annotations</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">models</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">bash</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">BashOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">trigger_dagrun</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TriggerDagRunOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">apache</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark_submit</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SparkSubmitOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SFTPSensor</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">utils</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">task_group</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TaskGroup</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">utils</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">trigger_rule</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TriggerRule</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span><span style="color: #D8DEE9FF"> = </span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sftp-21</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_events_with_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TaskGroup</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">group_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">gather-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">group_events</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/events/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TaskGroup</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">group_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">gather-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">group_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/users/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	[</span><span style="color: #8FBCBB">group_events</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">group_users</span><span style="color: #D8DEE9FF">] &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-events-with-users.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; [</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">TriggerDagRunOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	        	</span><span style="color: #8FBCBB">trigger_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	    	</span><span style="color: #8FBCBB">trigger_rule</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">TriggerRule</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">ALL_SUCCESS</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">TriggerDagRunOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	        	</span><span style="color: #8FBCBB">trigger_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">           	 	</span><span style="color: #8FBCBB">trigger_rule</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">TriggerRule</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">ALL_SUCCESS</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	]</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_processor_reports_1</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-1.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">trigger-reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_processor_reports_2</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-2.py</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span></code></pre></div>






<h1 class="wp-block-heading">The sensor for an external DAG</h1>



<p>Using a sensor is the exact opposite of triggering another DAG. In this approach, we tell a DAG to pause and wait until some other DAG (or a specific task in that DAG) has completed. We still need to specify appropriate schedules for both pipelines, however, because by default the sensor looks for an upstream run with the same logical date as its own.</p>
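<p>To illustrate why the schedules matter, here is a minimal sketch in plain Python (no Airflow required) of the lookup rule: by default, ExternalTaskSensor searches for an upstream run whose logical date equals that of its own run, and the <code>execution_delta</code> parameter subtracts a fixed offset first. The helper name <code>upstream_logical_date</code> and the 02:00 schedule are assumptions made for this example only, not part of the pipelines in this article.</p>

```python
from __future__ import annotations

from datetime import datetime, timedelta


def upstream_logical_date(own_logical_date: datetime,
                          execution_delta: timedelta | None = None) -> datetime:
    """Return the logical date of the upstream run a sensor would look for.

    Hypothetical helper mirroring the documented rule: no delta means
    "same logical date"; a delta is subtracted from the sensor's own date.
    """
    if execution_delta is None:
        return own_logical_date
    return own_logical_date - execution_delta


# Both DAGs run @daily at midnight: the dates line up without any offset.
same_schedule = upstream_logical_date(datetime(2023, 7, 1))

# Assumed case: the downstream DAG runs at 02:00, the upstream one at 00:00.
shifted = upstream_logical_date(datetime(2023, 7, 1, 2, 0), timedelta(hours=2))

print(same_schedule)
print(shifted)
```

<p>In a real DAG, the second case would correspond to passing <code>execution_delta=timedelta(hours=2)</code> to the sensor. If the dates do not line up, the sensor finds no matching run and keeps poking until it times out.</p>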



<p>The sensor approach is very flexible. Unlike the datasets approach, each sensor instance can wait for a specific DAG/task/execution date combination, which makes it possible to wait with a specific time offset or to model aggregation relationships (hourly to daily, daily to weekly, etc.). Unlike the triggering approach, the sensor approach also lets a DAG wait for all of its preconditions.</p>
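<p>The aggregation case can be sketched as a date-mapping function. ExternalTaskSensor accepts an <code>execution_date_fn</code> callable that receives the current logical date and returns the logical date, or list of dates, to wait for. The function below is a hypothetical example for a weekly DAG that depends on seven daily runs; the name <code>daily_runs_for_week</code>, the task ID in the comment, and the weekly DAG itself are assumptions, not part of the pipelines in this article. It is plain Python, so it can be checked without an Airflow installation.</p>

```python
from __future__ import annotations

from datetime import datetime, timedelta


def daily_runs_for_week(logical_date: datetime) -> list[datetime]:
    """Map a weekly run's logical date to the seven daily logical dates
    covering the same week (the start of the week plus 0..6 days)."""
    return [logical_date + timedelta(days=offset) for offset in range(7)]


# Assumed weekly run whose data interval starts on Sunday, 2 July 2023.
week_start = datetime(2023, 7, 2)
dates_to_wait_for = daily_runs_for_week(week_start)

for date in dates_to_wait_for:
    print(date.date())

# Hypothetical wiring inside a weekly DAG (requires Airflow, not run here):
# ExternalTaskSensor(
#     task_id="wait-for-daily-events",
#     external_dag_id="sensor-events",
#     execution_date_fn=daily_runs_for_week,
# )
```

<p>Returning a list makes the sensor wait until every listed run has reached an allowed state, which is what a daily-to-weekly roll-up typically needs.</p>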



<p>Just as with DAG triggering, sensor dependencies can be checked on the Browse &gt; DAG Dependencies tab.</p>



<h2 class="wp-block-heading">Source code</h2>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from __future__ import annotations


from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator
from airflow.providers.sftp.sensors.sftp import SFTPSensor
from airflow.sensors.external_task import ExternalTaskSensor




SFTP_CONNECTION_ID = &quot;sftp-21&quot;




with DAG(
    	dag_id=&quot;sensor-events&quot;,
    	schedule_interval=&quot;@daily&quot;,
    	start_date=datetime(2023, 7, 1),
    	catchup=True,
) as sensor_events:
	SFTPSensor(
    	    	task_id=&quot;wait-for-csv&quot;,
    	    	path=&quot;/inbox/events/{{ dag_run.logical_date | ds }}.csv&quot;,
    	    	sftp_conn_id=SFTP_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;csv-to-parquet&quot;,
    	    	application=&quot;/spark/applications/csv-to-parquet.py&quot;
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;update-hive-table&quot;,
    	    	application=&quot;/spark/applications/update-hive-table.py&quot;
	)


with DAG(
    	dag_id=&quot;sensor-users&quot;,
    	schedule_interval=&quot;@daily&quot;,
    	start_date=datetime(2023, 7, 1),
    	catchup=True,
) as sensor_users:
	SFTPSensor(
    	    	task_id=&quot;wait-for-csv&quot;,
    	    	path=&quot;/inbox/users/{{ dag_run.logical_date | ds }}.csv&quot;,
    	    	sftp_conn_id=SFTP_CONNECTION_ID
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;csv-to-parquet&quot;,
    	    	application=&quot;/spark/applications/csv-to-parquet.py&quot;
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;update-hive-table&quot;,
    	    	application=&quot;/spark/applications/update-hive-table.py&quot;
	)


with DAG(
    	dag_id=&quot;sensor-events-with-users&quot;,
    	schedule_interval=&quot;@daily&quot;,
    	start_date=datetime(2023, 7, 1),
    	catchup=True,
) as dag_daily:
	[
    	    	ExternalTaskSensor(
         	   	task_id=&quot;wait-for-events&quot;,
        	    	external_dag_id=&quot;sensor-events&quot;,
    	        	check_existence=True
    	    	),
    	    	ExternalTaskSensor(
         	   	task_id=&quot;wait-for-users&quot;,
        	    	external_dag_id=&quot;sensor-users&quot;,
    	        	check_existence=True
    	    	)
	] &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;compute-events-with-users&quot;,
    	    	application=&quot;/spark/applications/compute-events-with-users.py&quot;
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;update-hive-table&quot;,
    	    	application=&quot;/spark/applications/update-hive-table.py&quot;
	)


with DAG(
    	dag_id=&quot;sensors-reports-1&quot;,
    	schedule=&quot;@daily&quot;,
    	start_date=datetime(2023, 7, 1),
) as dag_reports_1:
	ExternalTaskSensor(
        		task_id=&quot;wait-for-events-with-users&quot;,
    	    	external_dag_id=&quot;sensor-events-with-users&quot;,
    	    	check_existence=True
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;compute-report&quot;,
    	    	application=&quot;/spark/applications/compute-report-1.py&quot;
	)


with DAG(
    	dag_id=&quot;sensors-reports-2&quot;,
    	schedule=&quot;@daily&quot;,
    	start_date=datetime(2023, 7, 1),
) as dag_reports_2:
	ExternalTaskSensor(
    	    	task_id=&quot;wait-for-events-with-users&quot;,
    	    	external_dag_id=&quot;sensor-events-with-users&quot;,
    	    	check_existence=True
	) &gt;&gt; SparkSubmitOperator(
    	    	task_id=&quot;compute-report&quot;,
    	    	application=&quot;/spark/applications/compute-report-2.py&quot;
	)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">__future__</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">annotations</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">timedelta</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">models</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">apache</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">spark_submit</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SparkSubmitOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">providers</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sftp</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">SFTPSensor</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">external_task</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">ExternalTaskSensor</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span><span style="color: #D8DEE9FF"> = </span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sftp-21</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule_interval</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">sensor_events</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/events/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule_interval</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">sensor_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">SFTPSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">path</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/inbox/users/{{ dag_run.logical_date | ds }}.csv</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">sftp_conn_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">SFTP_CONNECTION_ID</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">csv-to-parquet</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/csv-to-parquet.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule_interval</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_daily</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	[</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">ExternalTaskSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">         	   	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	    	</span><span style="color: #8FBCBB">external_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	        	</span><span style="color: #8FBCBB">check_existence</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">ExternalTaskSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">         	   	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">        	    	</span><span style="color: #8FBCBB">external_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	        	</span><span style="color: #8FBCBB">check_existence</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	] &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-events-with-users.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">update-hive-table</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/update-hive-table.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensors-reports-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_reports_1</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">ExternalTaskSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">        		</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">external_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">check_existence</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-1.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensors-reports-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_reports_2</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">ExternalTaskSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-for-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">external_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sensor-events-with-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">check_existence</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span></span>
<span class="line"><span style="color: #D8DEE9FF">	) &gt;&gt; </span><span style="color: #8FBCBB">SparkSubmitOperator</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">compute-report</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	    	</span><span style="color: #8FBCBB">application</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">/spark/applications/compute-report-2.py</span><span style="color: #ECEFF4">&quot;</span></span>
<span class="line"><span style="color: #D8DEE9FF">	)</span></span></code></pre></div>






<h1 class="wp-block-heading">Short comparison</h1>



<figure class="wp-block-table wp-block-table--scrolled"><table><tbody><tr><td></td><td><strong><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">External</mark></strong></td><td><strong><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">Dataset</mark></strong></td><td><strong><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">Trigger</mark></strong></td><td><strong><mark style="background-color:rgba(0, 0, 0, 0);color:#c8855a" class="has-inline-color">Sensor</mark></strong></td></tr><tr><td>Multiple triggers for a single DAG</td><td>Yes</td><td>No</td><td>Yes</td><td>Yes (although a bit obscure)</td></tr><tr><td>Multiple requirements to run a single DAG</td><td>Yes</td><td>Yes</td><td>No</td><td>Yes</td></tr><tr><td>Dependencies on different time intervals</td><td>Yes</td><td>No</td><td>No</td><td>Yes</td></tr><tr><td><strong>Strong points</strong></td><td>Integration with 3rd party tools.</td><td>Good lineage support in the UI. Push scheduling.</td><td>Push scheduling.</td><td>Allows execution time manipulation (daily to weekly DAGs).</td></tr><tr><td><strong>Weak points</strong></td><td>Sensitive to refactoring.</td><td>No control over schedule intervals.</td><td>Weak error handling. A DAG can depend on at most one other DAG.</td><td>Pull scheduling.</td></tr></tbody></table><figcaption class="wp-element-caption">The options compared</figcaption></figure>
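
<p>The &#8220;execution time manipulation&#8221; listed for sensors in the table above comes down to date arithmetic. By default, <code>ExternalTaskSensor</code> waits for the external DAG run with the same logical date as its own run; its <code>execution_delta</code> and <code>execution_date_fn</code> parameters change which run (or runs) it looks for. The sketch below is plain Python with no Airflow import, and the function names are hypothetical. It assumes that a run's logical date marks the start of its data interval.</p>

```python
from __future__ import annotations

from datetime import datetime, timedelta


def daily_runs_for_week(logical_date: datetime) -> list[datetime]:
    """Logical dates of the seven daily runs inside one weekly data interval.

    Assumption: the weekly run's logical date is the start of its week,
    and each daily run's logical date is the start of its day.
    """
    return [logical_date + timedelta(days=offset) for offset in range(7)]


def previous_day(logical_date: datetime) -> datetime:
    """The date a sensor with execution_delta=timedelta(days=1) would look for."""
    return logical_date - timedelta(days=1)


# Example: a weekly run whose interval starts on Sunday, 2 July 2023.
for day in daily_runs_for_week(datetime(2023, 7, 2)):
    print(day.date())
```

<p>A function shaped like <code>daily_runs_for_week</code> could be passed as <code>execution_date_fn</code> so that a weekly DAG waits for all seven daily runs. Whether the offsets are right depends on how both DAGs are scheduled, so treat the numbers as an illustration.</p>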



<h1 class="wp-block-heading">Example scenarios</h1>



<p>The end goal of running Airflow is to orchestrate an actual pipeline, so let’s discuss some real-life examples where DAG dependencies are necessary.</p>



<h2 class="wp-block-heading">Breaking down a complex DAG</h2>



<p>The first example is a complex DAG used to calculate multiple outputs (exports) from some external inputs. For clarity, all operations are split into waiting (sensors), preprocessing, and report computation. The screenshot shows them in Airflow's Graph view.</p>



<p>The assumption is that all operations consume whole input datasets; they are not restricted to, e.g., the current hour only. Otherwise, sensors or external markers would be the right option.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="575" src="https://tantusdata.com/app/uploads/2023/08/graph.5-1024x575.jpg" alt="" class="wp-image-1756" srcset="https://tantusdata.com/app/uploads/2023/08/graph.5-1024x575.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/graph.5-300x169.jpg 300w, https://tantusdata.com/app/uploads/2023/08/graph.5-768x432.jpg 768w, https://tantusdata.com/app/uploads/2023/08/graph.5-1536x863.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/graph.5-2048x1151.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>This is a situation where datasets (or data-aware scheduling, as the docs call it) work well. For each branching point, we define an outlet: an abstraction over the dataset produced by an operator (remember that it is only metadata; handling the actual data is not done by Airflow and never should be). Then we define each chain of operations that consumes those datasets as a separate DAG, with the datasets as its schedule. As a result, the DAGs are short &amp; simple, and we get a bit of data lineage in the Datasets tab of Airflow for free:</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="300" src="https://tantusdata.com/app/uploads/2023/08/graph.6-1024x300.jpg" alt="" class="wp-image-1744" srcset="https://tantusdata.com/app/uploads/2023/08/graph.6-1024x300.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/graph.6-300x88.jpg 300w, https://tantusdata.com/app/uploads/2023/08/graph.6-768x225.jpg 768w, https://tantusdata.com/app/uploads/2023/08/graph.6-1536x450.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/graph.6-2048x600.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>
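
<p>The triggering rule behind data-aware scheduling can be modelled in a few lines: a consumer DAG runs once every dataset in its schedule has been updated at least once since its previous run. The toy model below is not Airflow code. The class <code>DatasetScheduler</code> and its methods are hypothetical names used only to illustrate the rule, and the DAG ids and dataset URIs are borrowed from the source code in this article.</p>

```python
from __future__ import annotations


class DatasetScheduler:
    """Toy model of data-aware scheduling (an illustration, not Airflow's code)."""

    def __init__(self, consumers: dict[str, set[str]]) -> None:
        # Consumer DAG id -> dataset URIs listed in its schedule.
        self.consumers = consumers
        # Consumer DAG id -> datasets updated since that DAG last ran.
        self.pending: dict[str, set[str]] = {dag: set() for dag in consumers}

    def dataset_updated(self, uri: str) -> list[str]:
        """Record a dataset update and return the consumer DAGs it triggers."""
        triggered = []
        for dag, required in self.consumers.items():
            if uri not in required:
                continue
            self.pending[dag].add(uri)
            if self.pending[dag] == required:
                triggered.append(dag)
                # After a run, the DAG waits for a fresh round of updates.
                self.pending[dag] = set()
        return triggered


scheduler = DatasetScheduler(
    {
        "complex-dag-report-clients": {"af://dataset/clients", "af://dataset/events"},
        "complex-dag-report-users": {"af://dataset/events", "af://dataset/users"},
    }
)

# Events alone trigger nothing; each report starts only when its second input arrives.
print(scheduler.dataset_updated("af://dataset/events"))
print(scheduler.dataset_updated("af://dataset/clients"))
print(scheduler.dataset_updated("af://dataset/users"))
```

<p>The model also shows why the comparison table lists &#8220;no control over schedule intervals&#8221; as a weak point: the consumer runs whenever its inputs happen to be refreshed, not at a time of its own choosing.</p>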



<h3 class="wp-block-heading">Source code</h3>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from __future__ import annotations


from datetime import datetime


from airflow import Dataset
from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.empty import EmptyOperator
from airflow.sensors.bash import BashSensor


dataset_clients = Dataset(&quot;af://dataset/clients&quot;)
dataset_events = Dataset(&quot;af://dataset/events&quot;)
dataset_transactions = Dataset(&quot;af://dataset/transactions&quot;)
dataset_users = Dataset(&quot;af://dataset/users&quot;)


with DAG(
	dag_id=&quot;complex-dag-clients&quot;,
	schedule_interval=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as complex_dag_clients:
	sensor_clients = BashSensor(task_id=&quot;wait-clients&quot;, bash_command=&quot;true&quot;)


	op_preprocess_step_1 = EmptyOperator(task_id=&quot;preprocess-step-1&quot;)
	op_preprocess_step_2 = EmptyOperator(task_id=&quot;preprocess-step-2&quot;)
	op_preprocess_step_3 = EmptyOperator(task_id=&quot;preprocess-step-3&quot;)
	op_preprocess_step_4 = EmptyOperator(task_id=&quot;preprocess-step-4&quot;, outlets=[dataset_clients])


	sensor_clients &gt;&gt; op_preprocess_step_1 &gt;&gt; \
    	[op_preprocess_step_2 &gt;&gt; op_preprocess_step_3] &gt;&gt; \
    	op_preprocess_step_4


with DAG(
	dag_id=&quot;complex-dag-events&quot;,
	schedule_interval=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as complex_dag_events:
	sensor_events = BashSensor(task_id=&quot;wait-events&quot;, bash_command=&quot;true&quot;)


	op_preprocess_step_1 = EmptyOperator(task_id=&quot;preprocess-step-1&quot;)
	op_preprocess_step_2 = EmptyOperator(task_id=&quot;preprocess-step-2&quot;)
	op_preprocess_step_3 = EmptyOperator(task_id=&quot;preprocess-step-3&quot;, outlets=[dataset_events])


	sensor_events &gt;&gt; op_preprocess_step_1 &gt;&gt; op_preprocess_step_2 &gt;&gt; op_preprocess_step_3


with DAG(
	dag_id=&quot;complex-dag-transactions&quot;,
	schedule_interval=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as complex_dag_transactions:
	sensor_gateway_01 = BashSensor(task_id=&quot;wait-gateway-01&quot;, bash_command=&quot;true&quot;)
	sensor_gateway_02 = BashSensor(task_id=&quot;wait-gateway-02&quot;, bash_command=&quot;true&quot;)
	sensor_gateway_03 = BashSensor(task_id=&quot;wait-gateway-03&quot;, bash_command=&quot;true&quot;)
	sensor_gateway_04 = BashSensor(task_id=&quot;wait-gateway-04&quot;, bash_command=&quot;true&quot;)
	sensor_gateway_05 = BashSensor(task_id=&quot;wait-gateway-05&quot;, bash_command=&quot;true&quot;)


	op_preprocess_step_1 = EmptyOperator(task_id=&quot;preprocess-step-1&quot;)
	op_preprocess_step_2 = EmptyOperator(task_id=&quot;preprocess-step-2&quot;)
	op_preprocess_step_3 = EmptyOperator(task_id=&quot;preprocess-step-3&quot;)
	op_preprocess_step_4 = EmptyOperator(task_id=&quot;preprocess-step-4&quot;)
	op_preprocess_step_5 = EmptyOperator(task_id=&quot;preprocess-step-5&quot;, outlets=[dataset_transactions])


	[
    	sensor_gateway_01,
    	sensor_gateway_02,
    	sensor_gateway_03,
    	sensor_gateway_04,
    	sensor_gateway_05
	] &gt;&gt; op_preprocess_step_1 &gt;&gt; op_preprocess_step_2 &gt;&gt; op_preprocess_step_3 &gt;&gt; \
    	op_preprocess_step_4 &gt;&gt; op_preprocess_step_5


with DAG(
	dag_id=&quot;complex-dag-users&quot;,
	schedule_interval=&quot;@daily&quot;,
	start_date=datetime(2023, 7, 1),
) as complex_dag_users:
	sensor_users = BashSensor(task_id=&quot;wait-users&quot;, bash_command=&quot;true&quot;)


	op_preprocess_step_1 = EmptyOperator(task_id=&quot;preprocess-step-1&quot;)
	op_preprocess_step_2 = EmptyOperator(task_id=&quot;preprocess-step-2&quot;)
	op_preprocess_step_3 = EmptyOperator(task_id=&quot;preprocess-step-3&quot;)
	op_preprocess_step_4 = EmptyOperator(task_id=&quot;preprocess-step-4&quot;)
	op_preprocess_step_5 = EmptyOperator(task_id=&quot;preprocess-step-5&quot;)
	op_preprocess_step_6 = EmptyOperator(task_id=&quot;preprocess-step-6&quot;, outlets=[dataset_users])


	sensor_users &gt;&gt; [op_preprocess_step_1, op_preprocess_step_2] &gt;&gt; op_preprocess_step_3 &gt;&gt; \
    	[op_preprocess_step_4, op_preprocess_step_5] &gt;&gt; op_preprocess_step_6


with DAG(
	dag_id=&quot;complex-dag-report-clients&quot;,
	schedule=[dataset_clients, dataset_events],
	start_date=datetime(2023, 7, 1),
) as complex_dag_report_clients:
	op_rich_clients = EmptyOperator(task_id=&quot;enrich-clients&quot;)
	op_report_clients = BashOperator(task_id=&quot;report-clients&quot;, bash_command=&quot;sleep 3&quot;)


	op_rich_clients &gt;&gt; op_report_clients


with DAG(
	dag_id=&quot;complex-dag-report-users&quot;,
	schedule=[dataset_events, dataset_users],
	start_date=datetime(2023, 7, 1),
) as complex_dag_report_users:
	op_rich_users = EmptyOperator(task_id=&quot;enrich-events&quot;)
	op_report_users = BashOperator(task_id=&quot;report-users&quot;, bash_command=&quot;sleep 3&quot;)


	op_rich_users &gt;&gt; op_report_users


with DAG(
	dag_id=&quot;complex-dag-report-full&quot;,
	schedule=[dataset_events, dataset_transactions, dataset_users],
	start_date=datetime(2023, 7, 1),
) as complex_dag_report_full:
	op_report_users = BashOperator(task_id=&quot;report-full&quot;, bash_command=&quot;sleep 3&quot;)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">__future__</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">annotations</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">Dataset</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">models</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">bash</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">BashOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">empty</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">EmptyOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">bash</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">BashSensor</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">dataset_clients</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://dataset/clients</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_events</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://dataset/events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_transactions</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://dataset/transactions</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">Dataset</span><span style="color: #D8DEE9FF">(</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">af://dataset/users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-clients</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_clients</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_clients</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-clients</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_4</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-4</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_clients</span><span style="color: #D8DEE9FF">])</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_clients</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	[</span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF">] &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">op_preprocess_step_4</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_events</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_events</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events</span><span style="color: #D8DEE9FF">])</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_events</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_3</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-transactions</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_transactions</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_gateway_01</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-gateway-01</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_gateway_02</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-gateway-02</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_gateway_03</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-gateway-03</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_gateway_04</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-gateway-04</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_gateway_05</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-gateway-05</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_4</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-4</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_5</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-5</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_transactions</span><span style="color: #D8DEE9FF">])</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	[</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">sensor_gateway_01</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">sensor_gateway_02</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">sensor_gateway_03</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">sensor_gateway_04</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">sensor_gateway_05</span></span>
<span class="line"><span style="color: #D8DEE9FF">	] &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF"> &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">op_preprocess_step_4</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_5</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_4</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-4</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_5</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-5</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_preprocess_step_6</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">preprocess-step-6</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">outlets</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF">])</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">sensor_users</span><span style="color: #D8DEE9FF"> &gt;&gt; [</span><span style="color: #8FBCBB">op_preprocess_step_1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">op_preprocess_step_2</span><span style="color: #D8DEE9FF">] &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_3</span><span style="color: #D8DEE9FF"> &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	[</span><span style="color: #8FBCBB">op_preprocess_step_4</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">op_preprocess_step_5</span><span style="color: #D8DEE9FF">] &gt;&gt; </span><span style="color: #8FBCBB">op_preprocess_step_6</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-report-clients</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_clients</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dataset_events</span><span style="color: #D8DEE9FF">]</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_report_clients</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_enrich_clients</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">enrich-clients</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_report_clients</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">report-clients</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_enrich_clients</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_report_clients</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-report-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF">]</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_report_users</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_enrich_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">EmptyOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">enrich-events</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_report_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">report-users</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_enrich_users</span><span style="color: #D8DEE9FF"> &gt;&gt; </span><span style="color: #8FBCBB">op_report_users</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">complex-dag-report-full</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">schedule</span><span style="color: #D8DEE9FF">=[</span><span style="color: #8FBCBB">dataset_events</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dataset_transactions</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dataset_users</span><span style="color: #D8DEE9FF">]</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">complex_dag_report_full</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">op_report_users</span><span style="color: #D8DEE9FF"> = </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">report-full</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span></code></pre></div>






<h2 class="wp-block-heading">Daily to weekly transition</h2>



<p>This example covers any time-based aggregation of data. Whatever the actual intervals are, the general idea is that multiple runs of one DAG serve as input for a single run of another:</p>



<ul class="wp-block-list">
<li>24 runs of the hourly pipeline for the daily one,</li>



<li>7 runs of the daily DAG for the weekly one.</li>
</ul>



<p>This setup limits the available options to using an external tool or implementing DAG-run sensors. Let’s take a look at the latter.</p>



<figure class="wp-block-image size-large"><img loading="lazy" decoding="async" width="1024" height="300" src="https://tantusdata.com/app/uploads/2023/08/graph7-1024x300.jpg" alt="" class="wp-image-1739" srcset="https://tantusdata.com/app/uploads/2023/08/graph7-1024x300.jpg 1024w, https://tantusdata.com/app/uploads/2023/08/graph7-300x88.jpg 300w, https://tantusdata.com/app/uploads/2023/08/graph7-768x225.jpg 768w, https://tantusdata.com/app/uploads/2023/08/graph7-1536x450.jpg 1536w, https://tantusdata.com/app/uploads/2023/08/graph7-2048x600.jpg 2048w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /></figure>



<p>The idea behind this diagram is simple: the daily pipeline is unaffected by the relationship, while the weekly one starts with seven ExternalTaskSensor tasks, each with a different execution date offset. Each sensor waits for the daily DAG run whose execution date lies 1 to 7 days before that of the weekly run. Apart from that set of sensors, the weekly DAG is similar to the others.</p>
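


<p>To make the offsets concrete, here is a minimal sketch in plain Python (no Airflow required) of the dates such sensors would look for. It assumes that a sensor subtracts its offset from the execution date of the run it belongs to; the helper name <code>awaited_logical_dates</code> and the sample date are hypothetical and only serve the illustration. The same helper covers the hourly-to-daily case by changing the step and the number of runs.</p>

```python
from __future__ import annotations

from datetime import datetime, timedelta


def awaited_logical_dates(
    logical_date: datetime, step: timedelta, runs: int
) -> list[datetime]:
    """List the upstream execution dates reached by offsets of 1..runs steps.

    Hypothetical helper for illustration; it is not part of Airflow.
    """
    return [logical_date - step * offset for offset in range(1, runs + 1)]


# Weekly run waiting for seven daily runs (2023-07-09 is a Sunday).
weekly = awaited_logical_dates(datetime(2023, 7, 9), timedelta(days=1), 7)
print(weekly[0].date(), "...", weekly[-1].date())

# Daily run waiting for 24 hourly runs, built the same way.
daily = awaited_logical_dates(datetime(2023, 7, 9), timedelta(hours=1), 24)
print(daily[0], "...", daily[-1])
```

<p>For the sample weekly run, the seven dates range from 2023-07-08 back to 2023-07-02, one per sensor.</p>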



<h3 class="wp-block-heading">Source code</h3>



<div class="wp-block-kevinbatdorf-code-block-pro" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;line-height:1.25rem;--cbp-tab-width:2;tab-size:var(--cbp-tab-width, 2)"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#2e3440ff"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="from __future__ import annotations


from datetime import datetime, timedelta
from airflow.models import DAG
from airflow.operators.bash import BashOperator
from airflow.sensors.bash import BashSensor
from airflow.sensors.external_task import ExternalTaskSensor
from airflow.utils.task_group import TaskGroup


with DAG(
    	dag_id=&quot;dag-daily&quot;,
    	schedule_interval=&quot;@daily&quot;,
    	start_date=datetime(2023, 7, 1),
    	catchup=True,
) as dag_daily:
	BashSensor(task_id=&quot;some-input-wait&quot;, bash_command=&quot;true&quot;) &gt;&gt; \
    	BashOperator(task_id=&quot;some-data-processing-1&quot;, bash_command=&quot;sleep 3&quot;) &gt;&gt; \
    	BashOperator(task_id=&quot;some-data-processing-2&quot;, bash_command=&quot;sleep 3&quot;) &gt;&gt; \
    	BashOperator(task_id=&quot;exporting-to-hdfs&quot;, bash_command=&quot;sleep 3&quot;)


with DAG(
    	dag_id=&quot;dag-weekly&quot;,
    	schedule_interval=&quot;@weekly&quot;,
    	start_date=datetime(2023, 7, 1),
    	catchup=True,
) as dag_weekly:
    # TaskGroup is purely optional, but makes DAG in the UI a bit clearer.
    with TaskGroup(group_id=&quot;dag-daily-sensors&quot;) as sensors:
        [
            ExternalTaskSensor(
                task_id=f&quot;wait-{days_offset}d&quot;,
                external_dag_id=&quot;dag-daily&quot;,
                timeout=24 * 60 * 60,
                mode=&quot;reschedule&quot;,
                execution_delta=timedelta(days=days_offset)
            ) for days_offset in range(1, 8)
        ]


    sensors &gt;&gt; \
        BashOperator(task_id=&quot;some-data-processing&quot;, bash_command=&quot;sleep 3&quot;) &gt;&gt; \
	    	BashOperator(task_id=&quot;exporting-to-hdfs&quot;, bash_command=&quot;sleep 3&quot;)" style="color:#d8dee9ff;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki nord" style="background-color: #2e3440ff" tabindex="0"><code><span class="line"><span style="color: #D8DEE9">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #D8DEE9">__future__</span><span style="color: #D8DEE9FF"> </span><span style="color: #81A1C1">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">annotations</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">datetime</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">timedelta</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">models</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">operators</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">bash</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">BashOperator</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">bash</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">BashSensor</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">external_task</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">ExternalTaskSensor</span></span>
<span class="line"><span style="color: #81A1C1">from</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">airflow</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">utils</span><span style="color: #D8DEE9FF">.</span><span style="color: #8FBCBB">task_group</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">import</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TaskGroup</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dag-daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule_interval</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_daily</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">	</span><span style="color: #8FBCBB">BashSensor</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">some-input-wait</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">true</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">some-data-processing-1</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">some-data-processing-2</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">exporting-to-hdfs</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dag-weekly</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">schedule_interval</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">@weekly</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">start_date</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">datetime</span><span style="color: #D8DEE9FF">(2023</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 7</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 1)</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">    	</span><span style="color: #8FBCBB">catchup</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">True</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">dag_weekly</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">    # </span><span style="color: #8FBCBB">TaskGroup</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">is</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">purely</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">optional</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">but</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">makes</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">DAG</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">in</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">the</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">UI</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">a</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bit</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">clearer</span><span style="color: #D8DEE9FF">.</span></span>
<span class="line"><span style="color: #D8DEE9FF">    </span><span style="color: #8FBCBB">with</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">TaskGroup</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">group_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dag-daily-sensors</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) </span><span style="color: #8FBCBB">as</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF">:</span></span>
<span class="line"><span style="color: #D8DEE9FF">        [</span></span>
<span class="line"><span style="color: #D8DEE9FF">            </span><span style="color: #8FBCBB">ExternalTaskSensor</span><span style="color: #D8DEE9FF">(</span></span>
<span class="line"><span style="color: #D8DEE9FF">                </span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">f</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">wait-{days_offset}d</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">                </span><span style="color: #8FBCBB">external_dag_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">dag-daily</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">                </span><span style="color: #8FBCBB">timeout</span><span style="color: #D8DEE9FF">=24 </span><span style="color: #81A1C1">*</span><span style="color: #D8DEE9FF"> 60 </span><span style="color: #81A1C1">*</span><span style="color: #D8DEE9FF"> 60</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">                </span><span style="color: #8FBCBB">mode</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">reschedule</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span></span>
<span class="line"><span style="color: #D8DEE9FF">                </span><span style="color: #8FBCBB">execution_delta</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">timedelta</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">days</span><span style="color: #D8DEE9FF">=</span><span style="color: #8FBCBB">days_offset</span><span style="color: #D8DEE9FF">)</span></span>
<span class="line"><span style="color: #D8DEE9FF">            ) </span><span style="color: #8FBCBB">for</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">days_offset</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">in</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">range</span><span style="color: #D8DEE9FF">(1</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> 8)</span></span>
<span class="line"><span style="color: #D8DEE9FF">        ]</span></span>
<span class="line"></span>
<span class="line"></span>
<span class="line"><span style="color: #D8DEE9FF">    </span><span style="color: #8FBCBB">sensors</span><span style="color: #D8DEE9FF"> &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">        </span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">some-data-processing</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">) &gt;&gt; \</span></span>
<span class="line"><span style="color: #D8DEE9FF">	    	</span><span style="color: #8FBCBB">BashOperator</span><span style="color: #D8DEE9FF">(</span><span style="color: #8FBCBB">task_id</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">exporting-to-hdfs</span><span style="color: #ECEFF4">&quot;</span><span style="color: #ECEFF4">,</span><span style="color: #D8DEE9FF"> </span><span style="color: #8FBCBB">bash_command</span><span style="color: #D8DEE9FF">=</span><span style="color: #ECEFF4">&quot;</span><span style="color: #A3BE8C">sleep 3</span><span style="color: #ECEFF4">&quot;</span><span style="color: #D8DEE9FF">)</span></span></code></pre></div>
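


<p>A possible variation on the listing above, offered as a sketch and not as part of the original example: ExternalTaskSensor also accepts an <code>execution_date_fn</code> callable instead of <code>execution_delta</code>, and to my knowledge that callable may return a list of dates, so a single sensor could wait for all seven daily runs. This should be checked against the installed Airflow version. The function name <code>seven_previous_days</code> and the task id <code>wait-7d-all</code> are hypothetical, and the sensor wiring is left as a comment so that the snippet runs without Airflow installed.</p>

```python
from __future__ import annotations

from datetime import datetime, timedelta


def seven_previous_days(logical_date: datetime) -> list[datetime]:
    """Return the execution dates of the seven daily runs before logical_date.

    Hypothetical helper, intended to be passed as execution_date_fn.
    """
    return [logical_date - timedelta(days=offset) for offset in range(1, 8)]


# Assumed wiring inside the dag-weekly definition (not executed here):
# ExternalTaskSensor(
#     task_id="wait-7d-all",
#     external_dag_id="dag-daily",
#     timeout=24 * 60 * 60,
#     mode="reschedule",
#     execution_date_fn=seven_previous_days,
# )

# Standalone check of the dates the single sensor would look for.
for awaited in seven_previous_days(datetime(2023, 7, 9)):
    print(awaited.date())
```

<p>The trade-off is visibility: seven separate sensors show in the UI which daily run is still missing, while a single sensor only reports that at least one of them is.</p>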






<p>The source code for all examples, together with the Docker environment to run them, is available on GitHub.</p>
<p>The post <a href="https://tantusdata.com/insights/managing-inter-dag-dependencies-in-airflow/">Managing inter-DAG dependencies in Airflow</a> appeared first on <a href="https://tantusdata.com">TantusData</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
