ExternalTaskSensor as a second operator type #318
Replies: 2 comments
-
I think it would be especially useful in Feature Store Analytics, where we have dependencies between our jobs and the empty new partition problem sometimes occurs. But I am not sure whether it will be useful for other teams. In my opinion, most of the processes depend on DAGs (tables) that are started in a different Airflow instance.
-
The empty partition is actually a valid point. We can of course merge workflows into a single DAG, but that's problematic in big codebases like yours. Also, BigFlow-generated DAGs don't support Airflow-based backfill (you have to deploy a separate DAG with a proper start time), and in that case it's easier to cherry-pick the workflows you want to backfill than to operate on the Airflow GUI/CLI to skip some stuff, or even modify code to do that. The first option is like opening Pandora's box in a weird way. If we want to allow people to use all Airflow features, I suggest doing it by just allowing them to write Airflow DAGs. So the first option is a "no" for me. In general: approved.
-
The problem
BigFlow offers sensor jobs that wait for a fresh partition in BigQuery. This solution works well if the data flow is unbroken and the new partition is always non-empty. However, if a job has no data to process but its sink table is a dependency for another workflow, the user has to create dummy partitions just for the downstream workflow to start.
Solution
Airflow offers ExternalTaskSensor, which waits for a DAG or for a specific task in a DAG. This operator could be useful for workflows waiting for other workflows on the same Airflow instance (which is often the case) without the need to create dummy records.
Possible implementations
DAG task generation in Job class
The build_dag_operator function, which lives inside generate_dag_file, could be moved to the WorkflowJob class (with an appropriate abstract method in the Job class). Then create a new class, ExternalSensorJob, which would have a different implementation of said method.
Pros:
Cons:
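A minimal sketch of this first option, assuming the DAG file is emitted as Python source text. The class bodies, the render_operators helper, and the abbreviated KubernetesPodOperator text are illustrative assumptions, not BigFlow's actual code; only the ExternalTaskSensor arguments (task_id, external_dag_id, external_task_id) are real Airflow parameters.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, Optional


class Job(ABC):
    """Simplified stand-in for the Job base class (hypothetical shape)."""

    id: str

    @abstractmethod
    def build_dag_operator(self) -> str:
        """Return the Python source text that declares this job's Airflow task."""


@dataclass
class WorkflowJob(Job):
    """Regular job. The operator text is an abbreviated placeholder."""

    id: str  # assumed to be a valid Python identifier

    def build_dag_operator(self) -> str:
        return (
            f"{self.id} = KubernetesPodOperator(\n"
            f"    task_id={self.id!r},\n"
            f"    dag=dag,\n"
            f")"
        )


@dataclass
class ExternalSensorJob(Job):
    """Job that waits for another DAG (or one of its tasks) on the same Airflow."""

    id: str  # assumed to be a valid Python identifier
    external_dag_id: str
    # None makes ExternalTaskSensor wait for the whole external DAG run.
    external_task_id: Optional[str] = None

    def build_dag_operator(self) -> str:
        return (
            f"{self.id} = ExternalTaskSensor(\n"
            f"    task_id={self.id!r},\n"
            f"    external_dag_id={self.external_dag_id!r},\n"
            f"    external_task_id={self.external_task_id!r},\n"
            f"    dag=dag,\n"
            f")"
        )


def render_operators(jobs: Iterable[Job]) -> str:
    """Hypothetical stand-in for the operator-emitting part of generate_dag_file."""
    return "\n\n".join(job.build_dag_operator() for job in jobs)


if __name__ == "__main__":
    print(render_operators([
        ExternalSensorJob(id="wait_for_upstream", external_dag_id="upstream_dag"),
        WorkflowJob(id="load"),
    ]))
```

In this shape the generator only calls a method on each job, so supporting another operator type means adding another Job subclass rather than touching the generator.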
New Job class and an if statement
A simpler implementation could be to add a new class, ExternalSensorJob, inheriting from the Job class, and use an if statement to distinguish between the different operators when generating a DAG.
Pros:
Cons:
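A minimal sketch of this second option, under the same assumption that operators are emitted as Python source text. The Job shape and the abbreviated KubernetesPodOperator text are hypothetical; build_dag_operator here is a simplified free function standing in for the one inside generate_dag_file.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Simplified stand-in for the Job class (hypothetical shape)."""

    id: str  # assumed to be a valid Python identifier


@dataclass
class ExternalSensorJob(Job):
    """Marker subclass carrying the ExternalTaskSensor arguments."""

    external_dag_id: str
    # None makes ExternalTaskSensor wait for the whole external DAG run.
    external_task_id: Optional[str] = None


def build_dag_operator(job: Job) -> str:
    """Pick the operator text with an if statement on the job type."""
    if isinstance(job, ExternalSensorJob):
        return (
            f"{job.id} = ExternalTaskSensor(\n"
            f"    task_id={job.id!r},\n"
            f"    external_dag_id={job.external_dag_id!r},\n"
            f"    external_task_id={job.external_task_id!r},\n"
            f"    dag=dag,\n"
            f")"
        )
    # Abbreviated placeholder for the operator the generator emits today.
    return (
        f"{job.id} = KubernetesPodOperator(\n"
        f"    task_id={job.id!r},\n"
        f"    dag=dag,\n"
        f")"
    )


if __name__ == "__main__":
    sensor = ExternalSensorJob(
        id="wait_for_upstream",
        external_dag_id="upstream_dag",
        external_task_id="export",
    )
    print(build_dag_operator(sensor))
```

The Job classes stay free of generation logic in this variant, at the cost of one extra branch in the generator for every new operator type.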
Things to consider