
Automatic propagation of peer.service #247

text/trace/0247-peer-service-propagation.md (176 additions, 0 deletions)

# Automatic Peer Service Name propagation

Automatic propagation of `peer.service` through `TraceState`.

## Motivation
> **Member** commented:
>
> The motivation does not read as convincing to me. Why does a service need to know who called it? If this is part of a transitive root-cause isolation workflow, then you just use distributed traces for that. If this is about some business-specific behavior depending on who called you, e.g. multi-tenant behavior, then I think this mechanism is quite inappropriate: relying on a deployment/infra name of the caller is a pretty narrow use case, not suitable for general-purpose multi-tenancy. So please describe the Users and Jobs To Be Done of this feature.

> A reply:
>
> I could see this being helpful in two scenarios in my work:
>
> 1. A sampling rule where you look at a given event and can see that its caller is X; for example, filtering out noisy branches in a trace except when the caller is something you want to keep.
> 2. Folks who want to get this information eventually via tracing, but can't today; an easier way to "add OTel" without fully adopting tracing that still provides this caller info would be helpful for them.


Knowing the service name on the other side of a remote call is valuable
troubleshooting information. The semantic conventions represent this via
`peer.service`, which needs to be populated manually. In a deployment scenario,
when a new service is added, all the existing services interacting with it
need to update their `peer.service` values, which is error-prone and may
become unreliable, eventually making the attribute obsolete.

This information can be effectively derived in the backend using the
`Resource` of the parent `Span`, but is otherwise not available
at Collector processing time, where it could be used for transformation
purposes or sampling (e.g. adaptive sampling based on the calling service).

As metrics and logs do not define a parent-child relationship, propagating
`peer.service` could help gain insight into the remote service for them as well.

Defining (optional) automated population of `peer.service` will greatly help
adoption of this attribute by users and vendors explicitly interested in this
scenario.

## Explanation

SDKs will define an optional feature, disabled by default,
to read the `service.name` attribute of the related `Resource` and set it
in the spans' `TraceState` as described in
[trace state handling](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/tracestate-handling.md)
specifically using the `us` subkey (denoting **upstream service**):

```
ot=us:MyService
```

Instrumentation and processors are then free to use this information to set
`peer.service` and perform other related processing.
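
For illustration, the headers propagated with such a span could look like this (the `traceparent` value is the example from the W3C Trace Context specification, and `MyService` is a placeholder name):

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: ot=us:MyService
```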

## Internal details

SDKs will disable this option by default, maintaining the current behavior.
When the feature is explicitly enabled by the user, spans will include
an additional entry in `TraceState`, as described above. By doing this,
the user acknowledges the additional cost in memory and bandwidth.

Span creation will be updated like this:

```java
//
// SpanBuilder.startSpan()
//
if (tracerSharedState.propagateServiceName) {
  // Read the local service name from the Resource (its `service.name` attribute).
  String serviceName = tracerSharedState.getResource().getAttribute(SERVICE_NAME);
  traceState = addServiceNameToTraceState(traceState, serviceName);
}
// Use the updated `traceState` to create the new SpanContext.
```
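
A minimal sketch of the hypothetical `addServiceNameToTraceState` helper used above, assuming the OpenTelemetry Java `TraceState` API; a complete implementation would also have to merge with any pre-existing `us` sub-key and respect the size limits described in the trace state handling document:

```java
// Hypothetical helper: stores the local service name under the "us" sub-key of
// the OpenTelemetry "ot" entry, as described in the tracestate-handling document.
static TraceState addServiceNameToTraceState(TraceState traceState, String serviceName) {
  if (serviceName == null || serviceName.isEmpty()) {
    return traceState;
  }
  String existing = traceState.get("ot");
  // Sub-keys of the "ot" entry are key:value pairs separated by ";".
  String otValue = existing == null ? "us:" + serviceName : existing + ";us:" + serviceName;
  return traceState.toBuilder().put("ot", otValue).build();
}
```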

Server-side instrumentation (e.g. HTTP servers, gRPC on the receiver side)
can then be updated to look for the `us` subkey in the propagated `TraceState`
and, if it exists, use it to set `peer.service` on the local `Span`:

```java
//
// Incoming request handling.
//
try (Scope scope = extractRemoteContext(headers).makeCurrent()) {
  SpanBuilder spanBuilder = tracer.spanBuilder("server-span");

  // The remote TraceState travels with the extracted parent SpanContext.
  TraceState remoteTraceState = Span.current()
      .getSpanContext()
      .getTraceState();
  String peerServiceName = getUpstreamServiceName(remoteTraceState);
  if (peerServiceName != null) {
    spanBuilder.setAttribute(PEER_SERVICE, peerServiceName);
  }
  Span span = spanBuilder.startSpan();
  // ... handle the request and end the span.
}
```
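
A possible shape for the hypothetical `getUpstreamServiceName` helper, assuming the sub-keys of the `ot` entry are `key:value` pairs separated by `;`, as described in the trace state handling document:

```java
// Hypothetical helper: extracts the "us" sub-key from the OpenTelemetry "ot"
// trace state entry, returning null when it is absent.
static String getUpstreamServiceName(TraceState traceState) {
  String otValue = traceState.get("ot");
  if (otValue == null) {
    return null;
  }
  for (String member : otValue.split(";")) {
    if (member.startsWith("us:")) {
      return member.substring("us:".length());
    }
  }
  return null;
}
```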

With `peer.service` present in server spans, further processing, filtering and sampling can
then be accomplished in the Collector; for example, a preview of a service's dependency map,
similar in spirit to zPages, could be created.

### Use scenarios

Sampling can benefit from knowing the calling service, specifically:
> **Contributor** commented:
>
> As sampling is mentioned as a primary usage scenario for this feature, I wonder if it would make sense to bundle this together with other sampling-related values for which there's a proposal to propagate them via trace state: #235
>
> All those values could then be consistently used and populated by samplers; one wouldn't need to invent a new configurable mechanism in SDKs (at least for the client side).

> **Contributor** commented:
>
> This value makes sense to propagate with the sampling-related attributes in the tracestate, which are now covered in open-telemetry/semantic-conventions#793. Still, I see it as an independent property.


* An adaptive sampler may decide whether to sample based on the calling service: e.g.
  given Service A accounting for 98% of requests and Service B for only 2%,
  with a default sampling rate of 50%, more traces from the latter service could be
  preserved, as opposed to running the risk of keeping no traces for it at all
  (see the sketch after this list).
* In cases where a parent `Span` is **not** sampled **but** its child (or a linked-to `Span`)
  wants to sample, knowing the calling service **may** help with the sampling decision.
  Right now only the parent span id is available in such a case.
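
To illustrate the first point, a sampler could delegate to different probability samplers depending on the propagated upstream service. This is a minimal sketch against the OpenTelemetry Java SDK `Sampler` interface; the class name, the per-service ratios and the parsing helper are illustrative assumptions, not part of this proposal:

```java
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.TraceState;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.data.LinkData;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import io.opentelemetry.sdk.trace.samplers.SamplingResult;
import java.util.List;
import java.util.Map;

// Hypothetical sampler: keeps more traces from low-volume upstream services by
// delegating to a higher-probability sampler when the "us" sub-key matches.
public final class UpstreamServiceSampler implements Sampler {
  private final Sampler defaultSampler = Sampler.traceIdRatioBased(0.5);
  private final Map<String, Sampler> perUpstreamService =
      Map.of("ServiceB", Sampler.traceIdRatioBased(0.9)); // low-volume caller, keep more

  @Override
  public SamplingResult shouldSample(Context parentContext, String traceId, String name,
      SpanKind spanKind, Attributes attributes, List<LinkData> parentLinks) {
    TraceState traceState = Span.fromContext(parentContext).getSpanContext().getTraceState();
    String upstream = getUpstreamServiceName(traceState);
    Sampler delegate =
        upstream == null ? defaultSampler : perUpstreamService.getOrDefault(upstream, defaultSampler);
    return delegate.shouldSample(parentContext, traceId, name, spanKind, attributes, parentLinks);
  }

  @Override
  public String getDescription() {
    return "UpstreamServiceSampler";
  }

  // Same parsing as the getUpstreamServiceName helper sketched earlier.
  private static String getUpstreamServiceName(TraceState traceState) {
    String otValue = traceState.get("ot");
    if (otValue == null) {
      return null;
    }
    for (String member : otValue.split(";")) {
      if (member.startsWith("us:")) {
        return member.substring("us:".length());
      }
    }
    return null;
  }
}
```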

In deployment scenarios where context is properly propagated through **all** the services,
but not all of them are traced (i.e. some use the no-op implementation), it would be helpful
to know which services were actually part of the request, even if not observed. This could
help reduce confusion and false positives. Note that this cannot currently
be computed at the backend with OTel, as non-traced systems simply send no telemetry
whatsoever. See https://github.com/w3c/trace-context/issues/550

## Trade-offs and mitigations

Given the `TraceState` [length constraints](https://www.w3.org/TR/trace-context/#tracestate-header),
we may decide to trim the service name to a given maximum length.

In case propagating `peer.service` ever represents a privacy or security concern,
consider hashing the `peer.service` values and providing a dictionary that the
Collector and backends can use to interpret them, as sketched below.
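
A minimal sketch of such hashing, assuming a truncated SHA-256 digest is acceptable; the 8-character truncation and the helper name are arbitrary choices for illustration:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hashes the service name before it is written to TraceState; the Collector and
// backends would hold the digest-to-name dictionary mentioned above.
static String hashServiceName(String serviceName) {
  try {
    MessageDigest digest = MessageDigest.getInstance("SHA-256");
    byte[] bytes = digest.digest(serviceName.getBytes(StandardCharsets.UTF_8));
    StringBuilder hex = new StringBuilder();
    for (byte b : bytes) {
      hex.append(String.format("%02x", b));
    }
    // Truncating also helps with the TraceState length constraints mentioned above.
    return hex.substring(0, 8);
  } catch (NoSuchAlgorithmException e) {
    throw new IllegalStateException(e);
  }
}
```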

## Prior art and alternatives

Using `Baggage` to **automatically** propagate `service.name` was explored.
It would consist of two parts:

* Explicit propagation of the `Resource`'s `service.name` using `Baggage`
  at **request** time. Instrumentation libraries would need to include
  an option to perform such propagation (potentially disabled by default,
  in order to keep the current behavior). The caveat is that `Resource` is an SDK concept,
  while instrumentation is expected to rely solely on the API.
* Either explicit handling in the server-side instrumentation (similar to how
  it's proposed using `TraceState` above, but relying on `Baggage` instead), or
  specialized processors that automatically enrich `Spans` with `Baggage` values,
  as shown below:

```java
public class BaggageDecoratingSpanProcessor implements SpanProcessor {
  private final SpanProcessor processor;
  private final Predicate<Span> predicate;
  private final Set<String> keys;

  public BaggageDecoratingSpanProcessor(SpanProcessor processor, Predicate<Span> predicate, Set<String> keys) {
    this.processor = processor;
    this.predicate = predicate;
    this.keys = keys;
  }

  @Override
  public void onStart(Context context, ReadWriteSpan span) {
    this.processor.onStart(context, span);
    if (predicate.test(span)) {
      // Copy the selected baggage entries onto the span as attributes.
      Baggage baggage = Baggage.current();
      keys.forEach(key -> {
        String value = baggage.getEntryValue(key);
        if (value != null) {
          span.setAttribute(key, value);
        }
      });
    }
  }

  @Override
  public boolean isStartRequired() {
    return true;
  }

  @Override
  public void onEnd(ReadableSpan span) {
    this.processor.onEnd(span);
  }

  @Override
  public boolean isEndRequired() {
    return this.processor.isEndRequired();
  }
}
```
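
For completeness, such a processor would be registered when building the tracer provider. In this sketch, `exporter` and the chosen baggage key are placeholders:

```java
SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
    .addSpanProcessor(new BaggageDecoratingSpanProcessor(
        BatchSpanProcessor.builder(exporter).build(), // the wrapped processor
        span -> true,                                 // decorate every span
        Set.of("service.name")))                      // baggage keys to copy
    .build();
```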

The `TraceState` alternative was preferred because `Baggage` serves general,
application-level propagation purposes, whereas `TraceState` is intended for
observability purposes, and because accessing `Resource` from instrumentation
is not feasible.

## Open questions

* At the moment of writing this OTEP, only `peer.service` is defined (which
  relies on `service.name`). However, semantic conventions also define
  `service.version`, `service.instance.id` and `service.namespace`,
  which may provide additional details. Given the constraints on memory
  and bandwidth (for both `TraceState` and `Baggage`), we will decide
  in the future whether or not to propagate these additional values.

## Future possibilities

Logging and metrics can be augmented by using automatic `peer.service` propagation,
in order to hint at a parent-child (or client-server) relationship, given they do not
include such information as part of their data models:

* Logs can optionally be converted to traces if a hierarchy or dependency map is desired,
but augmenting them with `peer.service` could be done as an intermediate step.
* Metrics could use `peer.service` as an additional dimension that helps filter
  based on related services, for example (see the sketch below).
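
As a rough sketch of the metrics case, a server could attach the propagated value as a metric attribute. Here `meter`, the counter name and the reuse of the `getUpstreamServiceName` helper sketched earlier are illustrative assumptions:

```java
// Count server requests, using the propagated upstream service name as a dimension.
LongCounter requestCounter = meter.counterBuilder("server.requests").build();

String peerService =
    getUpstreamServiceName(Span.current().getSpanContext().getTraceState());
Attributes attributes = peerService == null
    ? Attributes.empty()
    : Attributes.of(AttributeKey.stringKey("peer.service"), peerService);
requestCounter.add(1, attributes);
```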