-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Various tracing fixes #137908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various tracing fixes #137908
Conversation
Only continue tracing in TaskManager if a parent APM trace context exists. Trace headers might be set by external transactions as well. If present, it caused APMTracer to report a transaction for every span when in fact being sampled out. Correctly report the duration of transactions if not recording by not discarding the root span immediately. These transactions might still get reported, but without spans. Discard transient trace start time in `newTraceContext` when a parent APM trace context already exists. If propagated, all spans would start at the same time. Relates to ES-13389
|
Pinging @elastic/es-core-infra (Team:Core/Infra) |
Turns out this might be unnecessary (for serverless at least), the APM agent behavior depends on the apm-server version. For recent apm-server versions such transactions are not reported of unsampled. |
jdconrad
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple of questions.
modules/apm/src/test/java/org/elasticsearch/telemetry/apm/internal/tracing/APMTracerTests.java
Outdated
Show resolved
Hide resolved
| final Object previousTraceContext = newTransientHeaders.remove(Task.APM_TRACE_CONTEXT); | ||
| if (previousTraceContext != null) { | ||
| newTransientHeaders.put(Task.PARENT_APM_TRACE_CONTEXT, previousTraceContext); | ||
| // Remove the trace start time override for a previous context if such a context already exists. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naive question - it's guaranteed there will never be overlap in this case? (If we remove we are sure the prior trace is completed?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TRACE_START_TIME is a rather unfortunate hack. It should only be used for the root span (and previously it wasn't possible to set this again, but it could be set to late). Overlapping shouldn't ever be an issue. But I'll add an assert to make sure this is never set after a trace context already exists catching the too late case.
modules/apm/src/test/java/org/elasticsearch/telemetry/apm/internal/tracing/APMTracerTests.java
Outdated
Show resolved
Hide resolved
…rnal/tracing/APMTracerTests.java
| assert key != Task.TRACE_START_TIME || (hasApmTraceContext() || hasParentApmTraceContext()) == false | ||
| : "trace.starttime cannot be set after a trace context is present"; | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdconrad this will prevent wrong usage of trace.starttime
- Only continue tracing in TaskManager if a parent APM trace context exists. Trace headers might be set by external transactions as well. If present, it caused APMTracer to report a transaction for every span when in fact being sampled out. - Correctly report the duration of transactions if not recording by not discarding the root span immediately. These transactions might still get reported, but without spans. - Discard transient trace start time in `newTraceContext` when a parent APM trace context already exists. If propagated, all spans would start at the same time. Relates to ES-13389
💔 Backport failed
You can use sqren/backport to manually backport by running |
- Only continue tracing in TaskManager if a parent APM trace context exists. Trace headers might be set by external transactions as well. If present, it caused APMTracer to report a transaction for every span when in fact being sampled out. - Correctly report the duration of transactions if not recording by not discarding the root span immediately. These transactions might still get reported, but without spans. - Discard transient trace start time in `newTraceContext` when a parent APM trace context already exists. If propagated, all spans would start at the same time. Relates to ES-13389
This PR fixes various issues causing an excessive volume of APM events and incorrectly reported transaction & span durations as described in further detail:
Only continue tracing in TaskManager if a parent APM trace context exists. Trace headers might
be set by external transactions as well. If present, it caused APMTracer to report a transaction
for every span when in fact being sampled out.
Correctly report the duration of transactions if not recording by not discarding the root span
immediately. These transactions might still get reported, but without spans.
Discard transient trace start time in
newTraceContextwhen a parent APM trace contextalready exists. If propagated, all spans would start at the same time.
Relates to ES-13389