[airflow] Add lint rule to show error for removed context variables in airflow #15144

Open
wants to merge 9 commits into
base: main

@sunank200 sunank200 commented Dec 26, 2024

Summary

Airflow 3.0 removes a set of deprecated context variables that were phased out in 2.x. This PR introduces lint rules to detect usage of these removed variables in various patterns, helping identify incompatibilities. The removed context variables include:

conf
execution_date
next_ds
next_ds_nodash
next_execution_date
prev_ds
prev_ds_nodash
prev_execution_date
prev_execution_date_success
tomorrow_ds
yesterday_ds
yesterday_ds_nodash

Detected Patterns and Examples

The linter now flags the use of removed context variables in the following scenarios:

  1. Direct Subscript Access

    execution_date = context["execution_date"]  # Flagged
  2. .get("key") Method Calls

    print(context.get("execution_date"))  # Flagged
  3. Variables Assigned from get_current_context()
    If a variable is assigned from get_current_context() and then used to access a removed key:

    c = get_current_context()
    print(c.get("execution_date"))  # Flagged
  4. Function Parameters in @task-Decorated Functions
    Parameters named after removed context variables in functions decorated with @task are flagged:

    from airflow.decorators import task
    
    @task
    def my_task(execution_date, **kwargs):  # Parameter 'execution_date' flagged
        pass
  5. Removed Keys in Task Decorator kwargs and Other Scenarios
    Other similar patterns where removed context variables appear (e.g., as part of kwargs in a @task function) are also detected.

    from airflow.decorators import task

    @task
    def process_with_execution_date(**context):
        execution_date = lambda: context["execution_date"]  # Flagged
        print(execution_date)

    @task(kwargs={"execution_date": "2021-01-01"})  # Flagged
    def task_with_kwargs(**context):
        pass
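
To make patterns 1 and 2 concrete, here is a minimal sketch of the same idea using Python's standard `ast` module. It is not the Ruff implementation (which is written in Rust); the helper names `_string_constant` and `find_removed_key_accesses` are hypothetical, and the sketch deliberately ignores what object is being subscripted, so it over-reports compared with a rule that checks the receiver.

```python
import ast

# Removed keys as listed in this PR description.
REMOVED_CONTEXT_KEYS = {
    "conf", "execution_date", "next_ds", "next_ds_nodash",
    "next_execution_date", "prev_ds", "prev_ds_nodash",
    "prev_execution_date", "prev_execution_date_success",
    "tomorrow_ds", "yesterday_ds", "yesterday_ds_nodash",
}


def _string_constant(node):
    """Return the string value of a literal node, or None otherwise."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def find_removed_key_accesses(source):
    """Return sorted (line, key) pairs for x["key"] and x.get("key") uses."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        key = None
        if isinstance(node, ast.Subscript):
            # Pattern 1: direct subscript access (Python 3.9+ AST shape,
            # where `slice` is the index expression itself).
            key = _string_constant(node.slice)
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "get"
            and node.args
        ):
            # Pattern 2: .get("key") method call.
            key = _string_constant(node.args[0])
        if key in REMOVED_CONTEXT_KEYS:
            hits.append((node.lineno, key))
    return sorted(hits)
```

Calling `find_removed_key_accesses` on a module's source returns the line number and key for each match; keys such as `ds` that are not in the removed set are ignored.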

Test Plan

Test fixtures covering various patterns of deprecated context usage are included in this PR. For example:

from airflow.decorators import dag, task
from airflow.operators.python import get_current_context
from airflow.models import DAG
from airflow.operators.dummy import DummyOperator
import pendulum
from datetime import datetime

@task
def access_invalid_key_task(**context):
    print(context.get("conf"))  # 'conf' flagged

@task
def print_config(**context):
    execution_date = context["execution_date"]  # Flagged
    prev_ds = context["prev_ds"]                # Flagged

@task
def from_current_context():
    context = get_current_context()
    print(context["execution_date"])            # Flagged

# Usage outside of a task decorated function
c = get_current_context()
print(c.get("execution_date"))                 # Flagged

@task
def some_task(execution_date, **kwargs):
    print("execution date", execution_date)     # Parameter flagged

@dag(
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC")
)
def my_dag():
    task1 = DummyOperator(
        task_id="task1",
        params={
            "execution_date": "{{ execution_date }}",  # Flagged in template context
        },
    )

    access_invalid_key_task()
    print_config()
    from_current_context()
    
dag = my_dag()

Ruff will emit AIR302 diagnostics for each deprecated usage, with suggestions when applicable, aiding in code migration to Airflow 3.0.

related: apache/airflow#44409, apache/airflow#41641

github-actions bot commented Dec 26, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 5c96f89 to 5103ef7 Compare December 27, 2024 04:52
@sunank200 sunank200 requested review from Lee-W and uranusjr December 27, 2024 04:53
@dhruvmanila dhruvmanila added rule Implementing or modifying a lint rule preview Related to preview mode features labels Dec 30, 2024
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch 3 times, most recently from d580a4b to c0a34d3 Compare January 2, 2025 08:03
@dhruvmanila
Going to focus on reviewing this PR instead of #15240 for now as I think this one supersedes the other one but please correct me if I'm wrong.

@dhruvmanila dhruvmanila left a comment

Thank you for working on this. I have a couple of questions, which I've highlighted in the review comments and in #15240 (comment).

@dhruvmanila
Thank you for updating the PR, I plan on looking at it later today.

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from 4975421 to ffee139 Compare January 15, 2025 14:54
@dhruvmanila dhruvmanila left a comment

This is looking good. Can you update the PR description to include all the checks that are being done? I'm mainly looking for the structural matching that's being done here, not the specific symbols or variables that are being checked. I'm having a hard time keeping track of them :)

@dhruvmanila
Please re-request for review when it's ready :)

@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch 2 times, most recently from 2fec3ae to c82abcf Compare January 21, 2025 22:14
@sunank200 sunank200 force-pushed the deprecated_context_variable_airflow branch from c82abcf to 7825256 Compare January 21, 2025 22:36
@Lee-W Lee-W force-pushed the deprecated_context_variable_airflow branch from 7825256 to c2c37b8 Compare January 22, 2025 02:06
@Lee-W
Lee-W commented Jan 22, 2025

Please re-request for review when it's ready :)

I fixed the CI failure. I think it's ready for review now. Thanks!

@dhruvmanila dhruvmanila left a comment

Thank you for updating the PR description with the patterns that are being checked here, it's helpful as a reference when reviewing.

Regarding patterns 1, 2, and 3: do they need to be in a function or method that has the @task decorator, or can they be in any function or method?

I think we're pretty close to finishing this up, thank you for your patience!

Comment on lines +411 to +418
AIR302_context.py:109:5: AIR302 `execution_date` is removed in Airflow 3.0
    |
108 | @task
109 | def access_invalid_argument_task_out_of_dag(execution_date, **context):
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ AIR302
110 |     print("execution date", execution_date)
111 |     print("access invalid key", context.get("conf"))
    |

This seems to be highlighting the wrong range. I think you meant to highlight the parameter execution_date?

Can we add more test cases to check functions with multiple parameters that include both deprecated and non-deprecated parameters?
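
As an illustration of the range the comment above asks for, the following sketch uses Python's `ast` module to report the span of each offending parameter rather than the span of the function name. It is only an analogue of the idea, not the Ruff code: `REMOVED` is an abbreviated key set and `removed_parameters` is a hypothetical helper, and the `@task` decorator check is omitted for brevity.

```python
import ast

# Abbreviated set of removed keys, for the sketch only.
REMOVED = {"execution_date", "conf", "prev_ds"}


def removed_parameters(source):
    """Return (name, line, start_col, end_col) for each removed-key parameter."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        # Named parameters only; *args and **kwargs are not candidates.
        for param in (*args.posonlyargs, *args.args, *args.kwonlyargs):
            if param.arg in REMOVED:
                found.append(
                    (param.arg, param.lineno, param.col_offset, param.end_col_offset)
                )
    return found
```

For a signature that mixes deprecated and non-deprecated parameters, such as `def some_task(ds, execution_date, retries, conf, **kwargs)`, each removed name gets its own entry with its own column range, which is the shape a per-parameter diagnostic would need.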

Comment on lines +142 to +172
AIR302_context.py:58:5: AIR302 [*] `schedule_interval` is removed in Airflow 3.0
   |
56 | with DAG(
57 |     dag_id="example_dag",
58 |     schedule_interval="@daily",
   |     ^^^^^^^^^^^^^^^^^ AIR302
59 |     start_date=datetime(2023, 1, 1),
60 |     template_searchpath=["/templates"],
   |
   = help: Use `schedule` instead

Safe fix
55 55 |
56 56 | with DAG(
57 57 |     dag_id="example_dag",
58    |-    schedule_interval="@daily",
   58 |+    schedule="@daily",
59 59 |     start_date=datetime(2023, 1, 1),
60 60 |     template_searchpath=["/templates"],
61 61 | ) as dag:

AIR302_context.py:62:13: AIR302 `airflow.operators.dummy.DummyOperator` is removed in Airflow 3.0
   |
60 |     template_searchpath=["/templates"],
61 | ) as dag:
62 |     task1 = DummyOperator(
   |             ^^^^^^^^^^^^^ AIR302
63 |         task_id="task1",
64 |         params={
   |
   = help: Use `airflow.operators.empty.EmptyOperator` instead

Are these related to the changes made in this PR? This is fine, I just want to confirm.

    }

    for removed_key in REMOVED_CONTEXT_KEYS {
        if let Some(argument) = call_expr.arguments.find_argument_value(removed_key, 0) {

I don't think we can use 0 as the default position here. What would happen if it's a positional parameter that is expected to be at position 1 or 2? This would return the wrong argument.

I think we should also add test cases where there are multiple arguments that are deprecated in the same function intermixed with non-deprecated arguments.
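
The concern can be reproduced with a small Python analogue. The sketch below assumes a lookup that returns the keyword argument with the given name and otherwise falls back to the positional argument at the given index; both helpers and the `make_task` call are hypothetical and only mirror the shape of the Rust call quoted above.

```python
import ast


def find_argument_value(call, name, position):
    """Keyword `name` if present, else the positional argument at `position`."""
    for keyword in call.keywords:
        if keyword.arg == name:
            return keyword.value
    if position < len(call.args):
        return call.args[position]
    return None


def find_keyword_value(call, name):
    """Keyword-only lookup: never falls back to a positional argument."""
    for keyword in call.keywords:
        if keyword.arg == name:
            return keyword.value
    return None


# A call that does not pass `execution_date` at all.
call = ast.parse('make_task("my_task_id", retries=3)', mode="eval").body

# With position 0, the unrelated first positional argument is returned.
wrong = find_argument_value(call, "execution_date", 0)

# A keyword-only lookup correctly finds nothing.
right = find_keyword_value(call, "execution_date")
```

Here `wrong` ends up bound to the `"my_task_id"` literal even though the call never mentions `execution_date`, while `right` is `None`. Looping over every removed key with position 0 would therefore flag the first positional argument once per key.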

Comment on lines +398 to +400
let is_named_context = value.as_name_expr().is_some_and(|name| {
    matches!(name.id.as_str(), "context" | "kwargs") || name.id.as_str().starts_with("**")
});

What is the ** check for? I only see ** being used on parameters in the tests and the PR description, but this is checking the function call and not the function parameter. Did you mean to look at the definition of value and check that instead? If so, I think that will require adding functionality to get the parameter definition for the binding, which would be something like:

fn find_parameter<'a>(
    semantic: &'a SemanticModel,
    name: &'a ast::ExprName,
) -> Option<ast::AnyParameterRef<'a>> {
    // Resolve the name to its single binding, then to the enclosing function definition.
    let binding_id = semantic.only_binding(name)?;
    let binding = semantic.binding(binding_id);
    let ast::StmtFunctionDef { parameters, .. } = binding.statement(semantic)?.as_function_def_stmt()?;
    // The parameter whose name range matches the binding range is the definition.
    parameters
        .iter()
        .find(|parameter| parameter.name().range() == binding.range())
}

Comment on lines +7 to +9
11 | def access_invalid_key_in_context(**context):
12 |     print("access invalid key", context["conf"])
   |                                         ^^^^^^ AIR302

This function doesn't have the @task decorator, but a diagnostic is being raised here. Is this correct?
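
If the intended behavior is to restrict the check to decorated functions, the gating step could look roughly like the sketch below. It is a Python `ast` analogue with hypothetical helper names, and it matches the decorator by the bare name `task` only; it does not resolve imports, so a differently named alias or an unrelated `task` decorator would be misclassified.

```python
import ast


def is_task_decorated(func):
    """True if the function has a bare @task or a called @task(...) decorator."""
    for decorator in func.decorator_list:
        # Unwrap @task(...) to the callee so both forms are handled alike.
        target = decorator.func if isinstance(decorator, ast.Call) else decorator
        if isinstance(target, ast.Name) and target.id == "task":
            return True
    return False


def task_functions(source):
    """Return the names of functions decorated with @task, in source order."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef) and is_task_decorated(node)
    ]
```

With this gate in place, a plain `def access_invalid_key_in_context(**context)` would be skipped, while the `@task` and `@task(...)` variants would still be checked.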

Comment on lines +96 to +101
/// Checks for the use of deprecated Airflow context variables.
///
/// The function handles context keys accessed:
/// - Directly using `context["key"]` or `context.get("key")`.
/// - Using `.get()` with variables derived from `get_current_context()`.
/// - In custom operators, task decorators, and within templates or macros.

Do we need to update the documentation for this function as it's only checking subscript expressions?

@dhruvmanila dhruvmanila self-assigned this Jan 22, 2025
Labels
preview Related to preview mode features rule Implementing or modifying a lint rule
6 participants