-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feature] Render YAML definitions in Airflow UI #272
Comments
Thanks, @tatiana. We recently had dag-factory suggested to us, so I've been investigating it this weekend. Like you say, if we can't have those details surfaced, it makes it difficult to troubleshoot anything and that'll be a hard sell to our support engineers. In a perfect world, the generated DAG code would be presented in the Code view (or custom tab), but even having the YAML accessible in some form would be great. It's a workaround and could get messy at scale, but I've got the YAML going into doc_md and presenting in the DAG Docs.. |
Thanks @rory-data, that makes sense for a workaround. What I'm hearing is that in a perfect world, you would see the serialized DAG in the code view versus the contents of the yaml file. Is that right? |
@rory-data I can fully relate. @cmarteepants is helping us prioritise this feature; it's high on our board, and we'll keep this thread up-to-date as it is picked up/developed. |
@cmarteepants I would think so from a consistent or predictable experience perspective for support engineers. I do understand that it's not catered for in Airflow (Ash mentioned in Slack that it could be in AIP-85), so it may be a case of having the YAML presented in a better place than DAG Docs and using tagging to indicate a given DAG has been generated. That's just my two cents though 😅 |
Perhaps we can have a strategy in the short-term and one in the long-term, WDYT @cmarteepants ? @rory-data how has been the experience with doc_md so far? @cmarteepants, do you think I like the idea of tagging DAGs created by dag-factory - I think we should certainly do this. |
I admittedly only came up with the idea in the weekend just gone as part of our initial investigations into using DAG Factory, so it's not being used in any live processes. However, I can see that a large YAML input could be potentially unwieldy in the DAG Docs field from a UX perspective with it being an expandable section high in the DAG UI. The way I've currently got it also means it would overwrite any user-supplied doc_md values, so I need to adjust it to append instead. |
It needs polish, but FWIW, this is what I did for the example_dag_factory.py script in the examples directory. """
Airflow currently cannot serialise a DAG generated by DagFactory into the Code view of
the UI (this may be resolved in AIP-85).
As a workaround, the below change to the process of generating DAGs will render the YAML
into a markdown string and add it to the doc_md field so it is viewable in the UI DAG
Docs.
"""
import os
from pathlib import Path
from typing import Any, Dict
import yaml
from airflow import DAG ## by default, this is needed for the dagbag to parse this file
import dagfactory # type: ignore
DEFAULT_CONFIG_ROOT_DIR: str = "/usr/local/airflow/dags/configs"
CONFIG_ROOT_DIR: Path = Path(os.getenv("CONFIG_ROOT_DIR", DEFAULT_CONFIG_ROOT_DIR))
def serialise_config_md(key: str, value: Dict[str, Any]) -> str:
"""
Serialises a given key and its corresponding dictionary value into a formatted
YAML string enclosed in Markdown code block syntax.
Args:
key (str): The key to be serialized.
value (Dict[str, Any]): The dictionary value associated with the key.
Returns:
str: A string containing the YAML representation of the key-value pair,
formatted within a Markdown code block.
"""
yaml_string = yaml.dump({key: value}, default_flow_style=False, sort_keys=False)
return f"```yaml\n{yaml_string}\n```"
config_path: str = str(CONFIG_ROOT_DIR / "example_dag_factory.yml")
with open(config_path, encoding="utf8") as file:
config_data: Dict[str, Any] = yaml.safe_load(file)
for dag_id, dag_details in config_data.items():
# for each DAG in the config, add the YAML to the doc_md field
if dag_details.get("doc_md"):
# check if doc_md already exists and append to it
dag_details["doc_md"] += "\n" + serialise_config_md(dag_id, dag_details)
else:
# if doc_md does not exist, create it
dag_details["doc_md"] = serialise_config_md(dag_id, dag_details)
dag_factory = dagfactory.DagFactory(config=config_data)
dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals()) |
Going off of memory, but cap for templated fields was introduced as we were seeing a lot of DB perf issues. One of the causes was because of the issue that you linked, but there were others. I don't love the idea of appending the yaml contents to dag docs either, but it's a fair short-term workaround and more robust solution compared to using templated fields. It's a weird predicament: If we do add a view via a UI plugin, I'd rather we do it in react and in context of the new Airflow UI (and we will for AF3), not with FAB. I'm also not a fan of making a config for this and have DAG Factory handle appending this to docs if it's only a stop-gap, but I am open to it. @rory-data - is it reasonable to document your solution and have people add it to the dag generator and do this properly for Airflow 3? Or do you feel strongly that we should have a config to add the yaml to your dag docs? |
@cmarteepants I'd be fine with that. As you (and Ash) have pointed out, there are constraints in Airflow 2 that make this difficult, so documenting it as an optional workaround for now at least gives folk a choice. |
If we have DAG Factory itself append the yaml contents to DAG docs, it would be a tactical fix that realistically would become obsolete in AF3. I don't know if we'd go as far as officially deprecating it though; it's more that I can't see anybody wanting this behaviour if we're properly showing you the rendered yaml as a first class feature. Great point about having something more robust than an example for those staying on AF2 for awhile. Ok, I'm convinced. I can't imagine it being a lot of effort for us to do this. @tatiana only callout I have that we should only display the yaml contents that are relevant to that DAG (many DAGs can be defined in a single yaml). We should also include anything found in the default section |
@rory-data we'll target the next release, which will be sometime near the end of December. In the meantime, are you ok with your current workaround? |
@cmarteepants yes, we'll be good with that, thanks. |
Context
Even though the YAML is the source of truth for the DAG topology and the operators/tasks configuration, we cannot see those in the Airflow UI. This can make it harder to troubleshoot and confirm the behaviours are as expected or that the expected version was deployed.
Acceptance criteria
The text was updated successfully, but these errors were encountered: