Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: @task.skip_if decorator fails when used with @task.virtualenv in Airflow TaskFlow API #43354

Closed
1 of 2 tasks
pedro-cf opened this issue Oct 24, 2024 · 6 comments
Closed
1 of 2 tasks
Labels
area:core kind:bug This is a clearly a bug

Comments

@pedro-cf
Copy link

pedro-cf commented Oct 24, 2024

Apache Airflow version

2.10.2

If "Other Airflow 2 version" selected, which one?

No response

What happened?

When using the @task.skip_if decorator in combination with the @task.virtualenv decorator in an Airflow DAG using the TaskFlow API, the task fails with a NameError: name 'task' is not defined error.

What you think should happen instead?

The task should execute.

How to reproduce

Run the following DAG:

from datetime import datetime
from airflow.decorators import dag, task

@dag(
    dag_id='taskflow_skip_if_virtualenv_bug',
    start_date=datetime(2023, 1, 1),
    schedule=None,
    is_paused_upon_creation=False,
    catchup=False
)
def taskflow_skip_if_virtualenv_bug():

    @task.skip_if(condition=lambda context: False)
    @task.virtualenv(
        requirements=["stomp-py==8.1.2", "requests==2.31.0"],
        venv_cache_path="/tmp",
    )
    def potentially_skipped_task():
        import requests
        print("This task should be skipped, but it might run anyway!")
        response = requests.get("https://example.com")
        print(f"Response status code: {response.status_code}")

    potentially_skipped_task()

taskflow_skip_if_virtualenv_bug()

Operating System

Ubuntu 22

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

LocalExecutor

Anything else?

Error logs:

[2024-10-24, 15:18:42 UTC] {process_utils.py:194} INFO - Traceback (most recent call last):
[2024-10-24, 15:18:42 UTC] {process_utils.py:194} INFO -   File "/tmp/venv-call4tpdkgmu/script.py", line 18, in <module>
[2024-10-24, 15:18:42 UTC] {process_utils.py:194} INFO -     @task.skip_if(condition=lambda context: False)
[2024-10-24, 15:18:42 UTC] {process_utils.py:194} INFO -      ^^^^
[2024-10-24, 15:18:42 UTC] {process_utils.py:194} INFO - NameError: name 'task' is not defined
[2024-10-24, 15:18:43 UTC] {taskinstance.py:3310} ERROR - Task failed with exception

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@pedro-cf pedro-cf added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Oct 24, 2024
@josix
Copy link
Contributor

josix commented Oct 25, 2024

In this example, I believe the correct syntax should be @task.skip_if(condition=lambda context: True) to skip the task. The virtualenv operator will skip successfully if the condition returns True. However, I think the virtualenv operator shouldn't depend on the task module - cmiiw. I'll investigate this further.

@pedro-cf
Copy link
Author

pedro-cf commented Oct 25, 2024

In this example, I believe the correct syntax should be @task.skip_if(condition=lambda context: True) to skip the task. The virtualenv operator will skip successfully if the condition returns True. However, I think the virtualenv operator shouldn't depend on the task module - cmiiw. I'll investigate this further.

Sorry, the intention is not to skip the task, but to execute the task, I got a bit messed up with another bug report. Updated the text above.

@josix
Copy link
Contributor

josix commented Oct 25, 2024

I've investigated the root cause of the issue. The problem occurs in the rendered Python script (as shown below) that executes in the virtual environment through jinja templating. Specifically, in the Script section (where DAG authors define their code), we're using inspect.getsource(obj) to generate the code. However, this method is also capturing the task decorator along with the callable function, which is causing the bug.

from __future__ import annotations

import pickle
import sys


 
if sys.version_info >= (3,6):
    try:
        from airflow.plugins_manager import integrate_macros_plugins
        integrate_macros_plugins()
    except ImportError:
        
        pass


# Script
@task.skip_if(lambda x: False)
def potentially_skipped_task():
    import requests
    import time

    print("This task should be skipped, but it might run anyway!")
    response = requests.get("https://example.com")
    print(f"Response status code: {response.status_code}")


# monkey patching for the cases when python_callable is part of the dag module.


import types

unusual_prefix_df02535c9d01616ab041af6470774738d51e9a41_43354  = types.ModuleType("unusual_prefix_df02535c9d01616ab041af6470774738d51e9a41_43354")

unusual_prefix_df02535c9d01616ab041af6470774738d51e9a41_43354.potentially_skipped_task = potentially_skipped_task

sys.modules["unusual_prefix_df02535c9d01616ab041af6470774738d51e9a41_43354"] = unusual_prefix_df02535c9d01616ab041af6470774738d51e9a41_43354




arg_dict = {"args": [], "kwargs": {}}


# Read string args
with open(sys.argv[3], "r") as file:
    virtualenv_string_args = list(map(lambda x: x.strip(), list(file)))



try:
    res = potentially_skipped_task(*arg_dict["args"], **arg_dict["kwargs"])
except Exception as e:
    with open(sys.argv[4], "w") as file:
        file.write(str(e))
    raise

# Write output
with open(sys.argv[2], "wb") as file:
    if res is not None:
        pickle.dump(res, file)

@josix
Copy link
Contributor

josix commented Oct 25, 2024

Just found that #41832 has covered this issue, I guess we would have a patch at v2.10.3 cc @phi-friday

@Lee-W Lee-W removed the needs-triage label for new issues that we didn't triage yet label Oct 26, 2024
@eladkal
Copy link
Contributor

eladkal commented Oct 26, 2024

Closing as fixed in 2.10.3

@eladkal eladkal closed this as completed Oct 26, 2024
@phi-friday
Copy link
Contributor

Closing as fixed in 2.10.3

In [2]: import airflow
/Users/***/git/python/repo/***/.venv/lib/python3.12/site-packages/airflow/configuration.py:859 FutureWarning: section/key [core/sql_alchemy_conn] has been deprecated, you should use[database/sql_alchemy_conn] instead. Please update your `conf.get*` call to use the new name

In [3]: airflow.__version__
Out[3]: '2.10.3'

In [4]: from airflow.utils.decorators import remove_task_decorator

In [5]: import inspect

In [6]: source = inspect.getsource(remove_task_decorator)

In [7]: print(source)
def remove_task_decorator(python_source: str, task_decorator_name: str) -> str:
    """
    Remove @task or similar decorators as well as @setup and @teardown.

    :param python_source: The python source code
    :param task_decorator_name: the decorator name

    TODO: Python 3.9+: Rewrite this to use ast.parse and ast.unparse
    """

    def _remove_task_decorator(py_source, decorator_name):
        # if no line starts with @decorator_name, we can early exit
        for line in py_source.split("\n"):
            if line.startswith(decorator_name):
                break
        else:
            return python_source
        split = python_source.split(decorator_name, 1)
        before_decorator, after_decorator = split[0], split[1]
        if after_decorator[0] == "(":
            after_decorator = _balance_parens(after_decorator)
        if after_decorator[0] == "\n":
            after_decorator = after_decorator[1:]
        return before_decorator + after_decorator

    decorators = ["@setup", "@teardown", task_decorator_name]
    for decorator in decorators:
        python_source = _remove_task_decorator(python_source, decorator)
    return python_source

This does not appear to have been fixed in 2.10.3.
(If it was fixed, run_if should be added to the decorators variable).
Has this PR #41832 been pushed back to 2.11 or 3.0?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core kind:bug This is a clearly a bug
Projects
None yet
Development

No branches or pull requests

5 participants