Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Render YAML definitions in Airflow UI #272

Open
2 tasks
tatiana opened this issue Oct 22, 2024 · 13 comments
Open
2 tasks

[feature] Render YAML definitions in Airflow UI #272

tatiana opened this issue Oct 22, 2024 · 13 comments
Assignees
Labels

Comments

@tatiana
Copy link
Collaborator

tatiana commented Oct 22, 2024

Context

Even though the YAML is the source of truth for the DAG topology and the operators/tasks configuration, we cannot see those in the Airflow UI. This can make it harder to troubleshoot and confirm the behaviours are as expected or that the expected version was deployed.

Acceptance criteria

  • Ideally, we'd display the whole YML in the Airflow UI (plugin?)
  • If that's not possible, we should at least render as a templated field the YML representation for the task of interest
@tatiana tatiana added enhancement New feature or request triage-needed labels Oct 22, 2024
@tatiana
Copy link
Collaborator Author

tatiana commented Oct 22, 2024

cc @cmarteepants

@rory-data
Copy link

rory-data commented Oct 27, 2024

Thanks, @tatiana. We recently had dag-factory suggested to us, so I've been investigating it this weekend. Like you say, if we can't have those details surfaced, it makes it difficult to troubleshoot anything and that'll be a hard sell to our support engineers.

In a perfect world, the generated DAG code would be presented in the Code view (or custom tab), but even having the YAML accessible in some form would be great.

It's a workaround and could get messy at scale, but I've got the YAML going into doc_md and presenting in the DAG Docs..

@cmarteepants
Copy link
Collaborator

Thanks @rory-data, that makes sense for a workaround. What I'm hearing is that in a perfect world, you would see the serialized DAG in the code view versus the contents of the yaml file. Is that right?

@tatiana
Copy link
Collaborator Author

tatiana commented Oct 28, 2024

@rory-data I can fully relate. @cmarteepants is helping us prioritise this feature; it's high on our board, and we'll keep this thread up-to-date as it is picked up/developed.

@rory-data
Copy link

@cmarteepants I would think so from a consistent or predictable experience perspective for support engineers. I do understand that it's not catered for in Airflow (Ash mentioned in Slack that it could be in AIP-85), so it may be a case of having the YAML presented in a better place than DAG Docs and using tagging to indicate a given DAG has been generated. That's just my two cents though 😅

@tatiana
Copy link
Collaborator Author

tatiana commented Oct 29, 2024

Perhaps we can have a strategy in the short-term and one in the long-term, WDYT @cmarteepants ?

@rory-data how has been the experience with doc_md so far?
What were the main advantages and drawbacks of this approach..?

@cmarteepants, do you think doc_md is a better or worse solution than templated fields? Are they complementary? I just realised that since 2.9, we have also introduced a cap to templated fields in Airflow (which can be configured; still, some users may face issues without realising it).

I like the idea of tagging DAGs created by dag-factory - I think we should certainly do this.

@rory-data
Copy link

I admittedly only came up with the idea in the weekend just gone as part of our initial investigations into using DAG Factory, so it's not being used in any live processes.

However, I can see that a large YAML input could be potentially unwieldy in the DAG Docs field from a UX perspective with it being an expandable section high in the DAG UI. The way I've currently got it also means it would overwrite any user-supplied doc_md values, so I need to adjust it to append instead.

@tatiana tatiana added this to the DAG Factory 0.21.0 milestone Oct 30, 2024
@rory-data
Copy link

It needs polish, but FWIW, this is what I did for the example_dag_factory.py script in the examples directory.

"""
Airflow currently cannot serialise a DAG generated by DagFactory into the Code view of
the UI (this may be resolved in AIP-85).
As a workaround, the below change to the process of generating DAGs will render the YAML
into a markdown string and add it to the doc_md field so it is viewable in the UI DAG
Docs.
"""

import os
from pathlib import Path
from typing import Any, Dict

import yaml
from airflow import DAG  ## by default, this is needed for the dagbag to parse this file

import dagfactory  # type: ignore

DEFAULT_CONFIG_ROOT_DIR: str = "/usr/local/airflow/dags/configs"
CONFIG_ROOT_DIR: Path = Path(os.getenv("CONFIG_ROOT_DIR", DEFAULT_CONFIG_ROOT_DIR))


def serialise_config_md(key: str, value: Dict[str, Any]) -> str:
    """
    Serialises a given key and its corresponding dictionary value into a formatted
    YAML string enclosed in Markdown code block syntax.
    Args:
        key (str): The key to be serialized.
        value (Dict[str, Any]): The dictionary value associated with the key.
    Returns:
        str: A string containing the YAML representation of the key-value pair,
             formatted within a Markdown code block.
    """
    yaml_string = yaml.dump({key: value}, default_flow_style=False, sort_keys=False)
    return f"```yaml\n{yaml_string}\n```"


config_path: str = str(CONFIG_ROOT_DIR / "example_dag_factory.yml")

with open(config_path, encoding="utf8") as file:
    config_data: Dict[str, Any] = yaml.safe_load(file)
for dag_id, dag_details in config_data.items():
    # for each DAG in the config, add the YAML to the doc_md field
    if dag_details.get("doc_md"):
        # check if doc_md already exists and append to it
        dag_details["doc_md"] += "\n" + serialise_config_md(dag_id, dag_details)
    else:
        # if doc_md does not exist, create it
        dag_details["doc_md"] = serialise_config_md(dag_id, dag_details)

dag_factory = dagfactory.DagFactory(config=config_data)
dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())

@cmarteepants
Copy link
Collaborator

cmarteepants commented Oct 31, 2024

@cmarteepants, do you think doc_md is a better or worse solution than templated fields? Are they complementary? I just realised that since 2.9, we have also introduced a cap to apache/airflow#38094 in Airflow (which can be configured; still, some users may face issues without realising it).

Going off of memory, but cap for templated fields was introduced as we were seeing a lot of DB perf issues. One of the causes was because of the issue that you linked, but there were others. I don't love the idea of appending the yaml contents to dag docs either, but it's a fair short-term workaround and more robust solution compared to using templated fields.

It's a weird predicament: If we do add a view via a UI plugin, I'd rather we do it in react and in context of the new Airflow UI (and we will for AF3), not with FAB. I'm also not a fan of making a config for this and have DAG Factory handle appending this to docs if it's only a stop-gap, but I am open to it. @rory-data - is it reasonable to document your solution and have people add it to the dag generator and do this properly for Airflow 3? Or do you feel strongly that we should have a config to add the yaml to your dag docs?

@rory-data
Copy link

@cmarteepants I'd be fine with that. As you (and Ash) have pointed out, there are constraints in Airflow 2 that make this difficult, so documenting it as an optional workaround for now at least gives folk a choice.
Having a specific config built would make it more consistent and easier for users, but is it a tactical fix that would be obsoleted in AF3 so not worth the effort, or do you need to have something robust for those who'll stay on AF2?

@cmarteepants
Copy link
Collaborator

If we have DAG Factory itself append the yaml contents to DAG docs, it would be a tactical fix that realistically would become obsolete in AF3. I don't know if we'd go as far as officially deprecating it though; it's more that I can't see anybody wanting this behaviour if we're properly showing you the rendered yaml as a first class feature. Great point about having something more robust than an example for those staying on AF2 for awhile.

Ok, I'm convinced. I can't imagine it being a lot of effort for us to do this. @tatiana only callout I have that we should only display the yaml contents that are relevant to that DAG (many DAGs can be defined in a single yaml). We should also include anything found in the default section

@cmarteepants
Copy link
Collaborator

@rory-data we'll target the next release, which will be sometime near the end of December. In the meantime, are you ok with your current workaround?

@rory-data
Copy link

@cmarteepants yes, we'll be good with that, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants