-
Notifications
You must be signed in to change notification settings - Fork 164
Automatic propagation of peer.service #247
Changes from 5 commits
43079cd
10ad05d
c8b500a
5b8cc6a
21ca221
7915481
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,176 @@ | ||
# Automatic Peer Service Name propagation | ||
|
||
Automatic propagation of `peer.service` through `TraceState``. | ||
|
||
## Motivation | ||
|
||
Knowing the service name on the other side of a remote call is valuable | ||
troubleshooting information. The semantic conventions represent this via | ||
`peer.service`, which needs to be manually populated. In a deployment scenario, | ||
when a new service is added, all the existing services interacting with it | ||
need to update `peer.service`, which is error-prone and may | ||
become unreliable, making it eventually obsolete. | ||
|
||
This information can be effectively derived in the backend using the | ||
`Resource` of the parent `Span`, but is otherwise not available | ||
at Collector processing time, where it could be used for transformation | ||
purposes or sampling (e.g. adaptive sampling based on the calling service). | ||
|
||
As metrics and logs do not have defined a parent-child relationship, using | ||
`peer.service` could help gaining insight into the remote service as well. | ||
|
||
Defining (optional) automated population of `peer.service` will greatly help | ||
adoption of this attribute by users and vendors explicitly interested in this | ||
scenario. | ||
|
||
## Explanation | ||
|
||
SDKs will define an optional feature, disabled by default, | ||
to read the `service.name` attribute of the related `Resource` and set it | ||
in the spans' `TraceState` as described in | ||
[trace state handling](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-handling.md) | ||
specifically using the `us` subkey (denoting **upstream service**): | ||
|
||
``` | ||
ot=us:MyService | ||
``` | ||
|
||
Instrumentation and processors are then free to use this information to set | ||
`peer.service` and perform other related processing. | ||
|
||
## Internal details | ||
|
||
SDKs will disable by default this option, maintaining the current behavior. | ||
When the feature is explicitly enabled by the user, spans will include | ||
an additional entry in `TraceState` as described above. By doing this, | ||
the user acknowledges the additional cost in memory and bandwidth. | ||
|
||
Span creation will be updated like this: | ||
|
||
```java | ||
// | ||
// SpanBuilder.startSpan() | ||
// | ||
if (tracerSharedState.propagateServiceName) { | ||
String serviceName = tracerSharedState.getResource().getAttribute(PEER_SERVICE); | ||
traceState = addServiceNameToTracerState(traceState, serviceName); | ||
} | ||
// Use the updated `traceState` to create the new SpanContext. | ||
``` | ||
|
||
Server-side instrumentation (e.g. http servers, gRPC on the receiver side) | ||
can then be updated to use the propagated context to look for the `us` subkey | ||
in `TraceState`, and if it exists, use it to set `peer.service` on the local `Span`: | ||
|
||
```java | ||
// | ||
// Incoming request handling. | ||
// | ||
try (Scope scope = extractRemoteContext(headers).makeCurrent()) { | ||
SpanBuilder spanBuilder = tracer.spanBuilder("server-span"); | ||
|
||
TraceState remoteTraceState = Span.current() | ||
.getSpanContext() | ||
.getTraceState(); | ||
String peerServiceName = getUpstreamServiceName(remoteTraceState); | ||
if (peerServiceName != null) { | ||
spanBuilder.setAttribute(PEER_SERVICE, peerServiceName); | ||
} | ||
} | ||
``` | ||
|
||
With `peer.service` present in server spans, further processing, filtering and sampling can | ||
then be accomplished in the Collector, e.g. a preview of the dependency map of a service, | ||
similar in spirit to zPages could be created. | ||
|
||
### Use scenarios | ||
|
||
Sampling can benefit from knowing the calling service, specifically: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As sampling is mentioned as a primary usage scenario for this feature, I wonder if would make sense to bundle this together with other sampling related values for which there's a proposal to propagate them via trace state: #235 All those values could then be consistently used and populated by samplers, one wouldn't need to invent a new configurable mechanism in SDKs (at least for the client side). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This value makes sense to propagate with the sampling-related attributes in the |
||
|
||
* An adaptive sampler may decide to sample or not based on the calling service, e.g. | ||
given Service A amounting to 98% of requests, and Service B amounting to 2% only, | ||
with a default sampling rate of 50%, more traces of the latter service could be preserved, | ||
as opposed to running the risk of no traces at all. | ||
* In cases where a parent `Span ` is **not** sampled **but** its child (or linked-to `Span`) | ||
wants to sample, knowing the calling service **may** help with the sampling decision. | ||
Right now only a parent span id is available in such case. | ||
|
||
In deployment scenarios where context is properly propagated through **all** the services, | ||
but not all of them are traced (i.e. use the no-op implementation), it would be helpful to know what services | ||
were actually part of the request, even if not observed. This could help with confusion and false | ||
positives. Observe this cannot be currently | ||
computed at the backend with OTel, as non-traced systems will simply send no telemetry | ||
whatsoever. See https://github.com/w3c/trace-context/issues/550 | ||
|
||
## Trade-offs and mitigations | ||
|
||
Given the `TraceState` [lenght contrains](https://www.w3.org/TR/trace-context/#tracestate-header) | ||
we may decide to trim the service name up to a given length. | ||
|
||
In case propagating `peer.service` ever represents a privacy or security concern, | ||
consider hashing the `peer.service` values, and provide a dictionary to interpret them | ||
by the Collector and backends. | ||
|
||
## Prior art and alternatives | ||
|
||
Using `Baggage` to **automatically** propagate `service.name` was explored. | ||
It would consist of two parts: | ||
|
||
* Explicit `Resource`'s `service.name` propagation using `Baggage` | ||
at **request** time. Instrumentation libraries would need to include | ||
an option to perform such propagation (potentially false by default, | ||
in order to keep the current behavior). Caveat is that `Resource` is an SDK item, while | ||
instrumentation is expected to solely rely on the API. | ||
* Either explicit handling on the server-side instrumentation (similar to how | ||
it's proposed using `TraceState` above, but relying on `Baggage ` instead), or | ||
specialized processors that automatically enrich `Spans` with `Baggage` values, | ||
as shown below: | ||
|
||
```java | ||
public class BaggageDecoratingSpanProcessor implements SpanProcessor { | ||
public BaggageDecoratingSpanProcessor(SpanProcessor processor, Predicate<Span> predicate, Set<String> keys) { | ||
this.processor = processor; | ||
this.predicate = predicate; | ||
this.keys = keys; | ||
} | ||
|
||
public void onStart(Context context, ReadWriteSpan span) { | ||
this.processor.onStart(context, span); | ||
if (predicate.test(span)) { | ||
Baggage baggage = Baggage.current(); | ||
keys.forEach(key -> { | ||
String value = baggage.getEntryValue(key); | ||
if (value != null) { | ||
span.setAttribute(key, baggage.getEntryValue(key)) | ||
} | ||
}) | ||
} | ||
} | ||
} | ||
``` | ||
|
||
The `TraceState` alternative was preferred as `Baggage` has general, | ||
application-level propagation purposes, whereas `TraceState` can be used | ||
by observability purposes, along with the fact that accessing `Resource` | ||
from instrumentation is not feasible. | ||
|
||
## Open questions | ||
|
||
* At the moment of writing this OTEP only `peer.service` is defined (which | ||
relies on `service.name`). However, semantic conventions also define | ||
`service.version`, `service.instance.id` and `service.namespace`, | ||
which may provide additional details. Given the contraints of memory | ||
and bandwidth (for both `TraceState` and `Baggage`) we will decide | ||
in the future whether to propagate these additional values or not. | ||
|
||
## Future possibilities | ||
|
||
Logging and metrics can be augmented by using automatic `peer.service` propagation, | ||
in order to hint at a parent-child (or client-server) relationship, given they do not | ||
include such information as part of their data models: | ||
|
||
* Logs can optionally be converted to traces if a hierarchy or dependency map is desired, | ||
but augmenting them with `peer.service` could be done as an intermediate step. | ||
* Metrics could use `peer.service` as an additional dimension that helps performing filtering | ||
based on related services, for example. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation does not read convincing to me. Why does a service need to know who called it? If this is part of a transitive root cause isolation workflow, then you just use distributed traces for that. If this is about some business-specific behavior depending on who called you, e.g. multi-tenant behavior. then I think this mechanism is quite inappropriate - relying on a deployment/infra name of the caller is a pretty narrow use case, not suitable for general purpose multi-tenancy. So please describe Users and Jobs To Be Done of this feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see this being helpful in two scenarios in my work: