Provide an alternative OpenTelemetry implementation for traces that follows standard otel practices #43941
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst).
I'm confirming, but OTel traces may still be experimental, and if that's the case we are free to change them for a good reason, and your description certainly sounds like one!
If I may, can you add some indication to the title that this is related to OTel Traces specifically? We also have OTel Metrics implemented and it would be nice to minimize confusion.
@ferruzzi I adjusted the title. The only change in this patch that is related to metrics is https://github.com/apache/airflow/pull/43941/files#diff-1cca954ec0be1aaf2c212e718c004cb0902a96ac60043bf0c97a782dee52cc32R85-R86 If you think that it's out of scope, I can remove it.
@@ -330,6 +337,30 @@ def trigger_tasks(self, open_slots: int) -> None:
        for _ in range(min((open_slots, len(self.queued_tasks)))):
            key, (command, _, queue, ti) = sorted_queue.pop(0)

            if self.otel_use_context_propagation:
                # If it's None, then the span for the current TaskInstanceKey hasn't been started.
                if self.active_spans.get(key) is None:
I've been chatting to @xBis7 about this, and we're not sure that this active_spans idea will work with Airflow's HA schedulers. We are exploring options.
# According to otel spec, max length should be 255. Change if the spec gets revised.
OTEL_NAME_MAX_LENGTH = 255
You can drop this for now. @ArshiaZr has a PR already going that is making this fix.
FWIW, long story short, that was there because when I initially implemented OTel Metrics there was a bug on their end that threw an exception if the name was longer than that, but it appears to have been fixed at some point.
This PR is a draft because this is my first time contributing to Airflow and I'm not sure if there should be an AIP or a mailing discussion for such changes. I'd appreciate any feedback.
Issue description
related: #40802
There are some OpenTelemetry standard practices that help keep the usage consistent across multiple projects. According to those
To explain more, this is what the flow of operations should be:
1. Start the `dagrun` span
   i. Get the `dagrun` span context
2. With the `dagrun` context, start the `ti` span
   i. Get the `ti` span context
3. With the `ti` span context, create a task-sub span
4. The `ti` span ends
5. The `dagrun` span ends

Airflow follows a different approach
`dag_run` and the `task_instance`
This is the flow of the current implementation
The current approach makes it impossible to create spans from under tasks while using the existing airflow code. To achieve that, you need to use https://github.com/howardyoo/airflow_otel_provider, which has to generate the same trace id and span id as airflow; otherwise the spans won't be properly associated. This isn't easily maintainable, and it also makes it hard for people who are familiar with otel but new to airflow to start using the feature.
These are some references to OpenTelemetry docs
https://opentelemetry.io/docs/concepts/context-propagation/
https://opentelemetry.io/docs/languages/python/propagation/
https://www.w3.org/TR/trace-context/
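As a rough illustration of the context-propagation flow described above, here is a minimal standard-library sketch. It is not code from this patch and does not use the OpenTelemetry SDK: the `Span` class and the `inject`, `start_span` and `run_flow` helpers are hypothetical names, and the carrier only mimics the W3C `traceparent` header format (`version-traceid-spanid-flags`).

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C traceparent: version "00", 32 hex trace id, 16 hex span id, 2 hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass
class Span:
    """Hypothetical stand-in for a tracing span (not the OpenTelemetry class)."""

    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str] = None
    ended: bool = False

    def end(self) -> None:
        self.ended = True


def inject(span: Span, carrier: dict) -> None:
    """Write the span's context into the carrier as a traceparent value."""
    carrier["traceparent"] = f"00-{span.trace_id}-{span.span_id}-01"


def start_span(name: str, carrier: Optional[dict] = None) -> Span:
    """Start a root span, or a child span if the carrier holds a parent context."""
    match = TRACEPARENT_RE.match((carrier or {}).get("traceparent", ""))
    if match is None:
        # Root span: trace id and span id are both random.
        return Span(name, secrets.token_hex(16), secrets.token_hex(8))
    trace_id, parent_span_id, _flags = match.groups()
    # Child span: inherits the trace id and gets a fresh span id.
    return Span(name, trace_id, secrets.token_hex(8), parent_span_id)


def run_flow() -> list:
    """Walk through the dagrun -> ti -> task-sub flow from the description."""
    dagrun_span = start_span("dagrun")
    dagrun_carrier: dict = {}
    inject(dagrun_span, dagrun_carrier)  # get the dagrun span context

    ti_span = start_span("ti", dagrun_carrier)  # child of the dagrun span
    ti_carrier: dict = {}
    inject(ti_span, ti_carrier)  # get the ti span context

    sub_span = start_span("task-sub", ti_carrier)  # child of the ti span
    sub_span.end()
    ti_span.end()
    dagrun_span.end()
    return [dagrun_span, ti_span, sub_span]
```

Because each child reads its parent's ids from the carrier, nothing has to be regenerated deterministically on the task side.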
Implementation description
A lot of people might already be using airflow with the existing otel implementation. To avoid any inconvenience, the changes are hidden behind a config flag.
This patch is extending the existing implementation and not changing it. Once the flag is turned on, a new set of spans gets generated and exported. The new spans have the suffix `_ctx_prop` on their name.

I've reused the attributes from the original implementation. In addition, the timings are the same.
To be able to propagate the context of a span, the span must be active.
For example, for a dag run, the span can't be created at the end but
Same goes for a task and any sub-spans.
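To make the "span must be active" requirement concrete, here is a small sketch loosely modelled on the `active_spans.get(key) is None` check in this patch's diff. The `ActiveSpans` class, its methods and the plain-dict span records are hypothetical and only illustrate the idea: a span is registered when work starts, stays available while children attach to it, and is removed when it ends.

```python
from typing import Dict, Optional


class ActiveSpans:
    """Hypothetical registry of spans that have started but not yet ended."""

    def __init__(self) -> None:
        self._spans: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._spans.get(key)

    def start(self, key: str, parent_key: Optional[str] = None) -> dict:
        # A child can only attach to a parent that is still active.
        if parent_key is not None and parent_key not in self._spans:
            raise RuntimeError(f"parent span {parent_key!r} is not active")
        # Start the span only if it hasn't been started already.
        if key not in self._spans:
            self._spans[key] = {"name": key, "parent": parent_key}
        return self._spans[key]

    def end(self, key: str) -> dict:
        # Ending a span removes it, so no new children can attach to it.
        return self._spans.pop(key)
```

Note that this sketch keeps its state in a single process only.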
With this approach, we can use the new otel methods for creating spans directly from under a task without the need of the `airflow_otel_provider`. These spans will be children of the task span.

Check `test_otel_dag.py` for an example of usage.

Testing
I've added a unit test for the `otel_tracer` methods and an integration test that runs a dag and then asserts the parent child relationships for the dag span, the task spans and the task sub spans.

I've also tested the changes manually with a `PythonVirtualenvOperator` without issues.
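The kind of parent-child assertion described above could look roughly like the following. This is only a sketch, not the test added in this patch: the `assert_parent_child` helper, the span names and the dict layout of the exported spans are all assumptions made for illustration.

```python
from typing import Dict, List


def assert_parent_child(spans: List[dict], parent_name: str, child_name: str) -> None:
    """Assert that the span named child_name is a direct child of parent_name."""
    by_name: Dict[str, dict] = {span["name"]: span for span in spans}
    parent, child = by_name[parent_name], by_name[child_name]
    assert child["trace_id"] == parent["trace_id"], "spans belong to different traces"
    assert child["parent_span_id"] == parent["span_id"], (
        f"{child_name} is not a child of {parent_name}"
    )


# Hypothetical exported spans: a dag span, a task span and a task sub span.
exported = [
    {"name": "dag", "trace_id": "t1", "span_id": "a", "parent_span_id": None},
    {"name": "task1", "trace_id": "t1", "span_id": "b", "parent_span_id": "a"},
    {"name": "task1_sub", "trace_id": "t1", "span_id": "c", "parent_span_id": "b"},
]

assert_parent_child(exported, "dag", "task1")
assert_parent_child(exported, "task1", "task1_sub")
```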