-
Notifications
You must be signed in to change notification settings - Fork 444
feat: poc implementation of the OTel Metrics API #13780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…gStatsD to the local agent
… of the module imported checks. This is bad and will need to be deleted, but I'm just trying to get stuff working right now
…in-progress system-tests for the OTel Metrics API feature at DataDog/system-tests#4790
…h is enough validation for the time being.
…config _otel_metrics_enabled
|
elif other.schema_url == "": | ||
schema_url = self.schema_url | ||
elif self.schema_url == other.schema_url: | ||
schema_url = other.schema_url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bootstrap import analysisComparison of import times between this PR and base. SummaryThe average import time from this PR is: 276 ± 2 ms. The average import time from base is: 277 ± 2 ms. The import time difference between this PR and base is: -0.36 ± 0.1 ms. Import time breakdownThe following import paths have grown:
|
@@ -626,7 +626,8 @@ def __init__(self): | |||
"DD_CIVISIBILITY_EARLY_FLAKE_DETECTION_ENABLED", True, asbool | |||
) | |||
self._otel_enabled = _get_config("DD_TRACE_OTEL_ENABLED", False, asbool, "OTEL_SDK_DISABLED") | |||
if self._otel_enabled: | |||
self._otel_metrics_enabled = _get_config("DD_TRACE_OTEL_METRICS_ENABLED", False, asbool, "OTEL_SDK_DISABLED") | |||
if self._otel_enabled or self._otel_metrics_enabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little unrelated but OTEL_METRICS_EXPORTER is only used to enable/disable runtime metrics. Do we need to do anything with this config now that we support OTLP metrics or will this be handled by otel packages?
@@ -33,6 +33,7 @@ dependencies = [ | |||
"importlib_metadata<=6.5.0; python_version<'3.8'", | |||
"legacy-cgi>=2.0.0; python_version>='3.13.0'", | |||
"opentelemetry-api>=1", | |||
"opentelemetry-exporter-otlp>=1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc: @brettlangdon @emmettbutler How does this change fit in to nextgen api efforts.
These are all the transitive dependencies of the opentelemetry-exporter-otlp package. It's important to note that OpenTelemetry packages are only compatible with matching versions of the SDK, API, and exporters. For example, you cannot install version 1.20.0 of the API and use it with version 1.20.1 of the SDK or exporter:
certifi==2025.6.15
charset-normalizer==3.4.2
googleapis-common-protos==1.70.0
grpcio==1.73.1
idna==3.10
importlib_metadata==8.7.0
opentelemetry-api==1.34.1
opentelemetry-exporter-otlp==1.34.1
opentelemetry-exporter-otlp-proto-common==1.34.1
opentelemetry-exporter-otlp-proto-grpc==1.34.1
opentelemetry-exporter-otlp-proto-http==1.34.1
opentelemetry-proto==1.34.1
opentelemetry-sdk==1.34.1
opentelemetry-semantic-conventions==0.55b1
protobuf==5.29.5
requests==2.32.4
typing_extensions==4.14.0
urllib3==2.5.0
zipp==3.23.0
Another to note is that opentelemetry exporters require a specific major version of the protobuf library: https://github.com/open-telemetry/opentelemetry-python/blob/698f9a521482d6ab3ec75721ff7ed61a207fa110/opentelemetry-proto/pyproject.toml#L28.
# This is the implementation of the "Any" type as specified by the specifications of OpenTelemetry data model for logs. | ||
# For more details, refer to the OTel specification: | ||
# https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/data-model.md#type-any | ||
AnyValue = Union[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can import some of these types from the OpenTelemetry API: https://github.com/open-telemetry/opentelemetry-python/blob/v1.0.0/opentelemetry-api/src/opentelemetry/util/types.py. If possible we should avoid defining our own implementations.
We can use an approach similar to this to support changes to the API across versions: https://github.com/DataDog/dd-trace-py/blob/v3.10.0rc1/ddtrace/internal/opentelemetry/trace.py#L53 (only re-define objects/classes when we absolutely have to).
from json import dumps | ||
from typing import Optional | ||
|
||
from ddtrace.internal.opentelemetry.types import Attributes, BoundedAttributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attributes is defined in the opentelemetry-api for all supported versions. BoundedAttributes was introduced in v1.4.0: open-telemetry/opentelemetry-python@01c6954. We should use the definitions provided by the OpenTelemetry API where possible. This will minimize the maintenance burden on our part and hopefully "future" proof the library.
from ddtrace.internal.opentelemetry.types import Attributes, BoundedAttributes | ||
|
||
|
||
class InstrumentationScope: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instrumentation scope was added to the SDK in v1.11.0: open-telemetry/opentelemetry-python@7647a11. Since the opentelemetry exporter already installs the SDK we can just re-use this component. If someone is using opentelemetry-api<1.12.0
this component won't exist so we won't need this type.
@@ -75,6 +75,20 @@ def _(_): | |||
|
|||
set_tracer_provider(TracerProvider()) | |||
|
|||
if config._otel_metrics_enabled: | |||
# This POC uses the OpenTelemetry OTLP exporter. | |||
# If we implement our own exporter, we can remove this import, which seems much safer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this in my earlier comments. It looks like this PR is attempting to vendor as much code as possible, with the eventual goal of replacing the opentelemetry-exporter with our own implementation.
However, this approach puts us in the worst of both worlds. We're still installing the official opentelemetry-exporter (which brings in the OpenTelemetry SDK, Open Telemetry Protos, OpenTelemetry Semantics package, and versions of the grpc and protobuf libraries with heavy restrictions), while also hoping our vendored implementations don’t conflict. To move forward, we need to fully commit to one direction: either vendor all OpenTelemetry components or none. A partial solution will likely lead to version conflicts and unnecessary complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But ya if we vendor nothing this PR will be 10-20 line change. If we vendor everything (including protobuf and grpc libraries this change will likely more than double the size of the ddtrace library)
Motivation
We want to implement support for the OTel Metrics API across our libraries. There is an RFC in development as well as system-tests to define that behavior, and this POC aims to deliver the necessary code to implement that.
For API usage, see the
utils/build/docker/python/flask/app.py
file in the referenced system-tests PR.Summary
This PR is a minimum implementation of the OTel Metrics API. This API covers the minimum code that application / library developers need to:
MeterProvider
instance (which provides access toMeter
s)Meter
with a given name+versionAll of this functionality is controlled by the environment variable
DD_TRACE_OTEL_METRICS_ENABLED
which is disabled by default.At a high-level, the OTel interfaces are implemented by new implementation types. To be able to emit these data types into OTLP as a quick POC, the following approaches were taken:
Vendored Files
Many files are directly vendored from the open-telemetry/opentelemetry-python repository, including:
ddtrace/internal/opentelemetry/types.py
- Defines key values likeAttributes
,AnyValue
ddtrace/internal/opentelemetry/resources.py
- Defines types for the Resource concept in OpenTelemetry, which is included in the OTLP metrics payload. A Resource defines the entity producing data, like a service name, env, version.ddtrace/internal/opentelemetry/metrics.py
- The part of this file that is vendored is a minimal implementation of thePeriodicExportingMetricReader
from the SDK, whose purpose is to collect metrics before exporting them..ddtrace/internal/opentelemetry/metric_points.py
- This class defines the dataclasses that map 1:1 with the protobuf format of OTLP metricsddtrace/internal/opentelemetry/instrumentation.py
- Contains the definition of theInstrumentationScope
type, which is included in the OTLP metrics payload. This identifies the smallest identifiable scope of instrumentation, like an instrumentation library name and version.Remaining Files
The rest of the net-new code is:
ddtrace/internal/opentelemetry/metrics.py
- The main entrypoints ofMeterProvider
andMeter
are defined here.ddtrace/internal/opentelemetry/instrument.py
- The classes that implement the various instrument interfaces are defined here. All of the instrument types are as follows:Counter
UpDownCounter
Gauge
Histogram
ObservableCounter
ObservableUpDownCounter
ObservableGauge
Testing
This PR doesn't add any unit or integration tests to the repo. At this time, I'd prefer to defer that work and use the referenced system-tests as the testing strategy.
Improvements
As a POC, there are some implementation details that are not long-term ideal but I'd like to defer them to a follow-up PR. If you feel they must be included in this PR, I can go ahead and change them.This includes:
Counter.Add
) blocks on the generation and HTTP submission of this one metric to the OTLP endpoint. This should be batched alongside the other measurements from the asynchronous instruments that are collected and submitted on a configured interval.Checklist
Reviewer Checklist