Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify usage of Distributed Tracing Extension by OpenTelemetry #912

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
70 changes: 32 additions & 38 deletions extensions/distributed-tracing.md
Original file line number Diff line number Diff line change
@@ -1,55 +1,49 @@
# Distributed Tracing extension

This extension embeds context from
[Distributed Tracing](https://w3c.github.io/trace-context/) so that distributed
systems can include traces that span an event-driven system. This extension is
meant to contain historical data of the parent trace, in order to diagnose
eventual failures of the system through tracing platforms like Jaeger, Zipkin,
etc.
[W3C TraceContext](https://www.w3.org/TR/trace-context/) into a CloudEvent.
The goal of this extension is to offer means to carry context when instrumenting
CloudEvents based systems with OpenTelemetry.

The [OpenTelemetry](https://opentelemetry.io/) project is a collection
of tools, APIs and SDKs that can be used to instrument, generate, collect,
and export telemetry data (metrics, logs, and traces) to help you
analyze your software’s performance and behavior.

The OpenTelemetry specification defines both
[Context](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.8.0/specification/context/context.md#overview)
and
[Distributed Tracing](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.8.0/specification/overview.md#tracing-signal)
as:

> A `Context` is a propagation mechanism which carries execution-scoped values across
API boundaries and between logically associated execution units. Cross-cutting
concerns access their data in-process using the same shared `Context` object.
>
> A `Distributed Trace` is a set of events, triggered as a result of a single
logical operation, consolidated across various components of an application.
A distributed trace contains events that cross process, network and security boundaries.

## Using the Distributed Tracing Extension

The
[OpenTelemetry Semantic Conventions for CloudEvents](https://github.com/open-telemetry/opentelemetry-specification/blob/v1.9.0/specification/trace/semantic_conventions/cloudevents.md)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get a 404 from this link - is this is placeholder for a new doc that's in-the-works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's a placeholder for when it's merged open-telemetry/opentelemetry-specification#1978

define in which scenarios this extension should be used and how to use it.

## Attributes

#### traceparent
### traceparent

- Type: `String`
- Description: Contains a version, trace ID, span ID, and trace options as
defined in [section 3.2](https://w3c.github.io/trace-context/#traceparent-header)
defined in [section 3.2](https://www.w3.org/TR/trace-context/#traceparent-header)
- Constraints
- REQUIRED

#### tracestate
### tracestate

- Type: `String`
- Description: a comma-delimited list of key-value pairs, defined by
[section 3.3](https://w3c.github.io/trace-context/#tracestate-header).
[section 3.3](https://www.w3.org/TR/trace-context/#tracestate-header).
- Constraints
- OPTIONAL

## Using the Distributed Tracing Extension

The Distributed Tracing Extension is not intended to replace the protocol specific headers for tracing,
like the ones described in [W3C Trace Context](https://w3c.github.io/trace-context/) for HTTP.

Given a single hop event transmission (from sink to source directly), the Distributed Tracing Extension,
if used, MUST carry the same trace information contained in protocol specific tracing headers.

Given a multi hop event transmission, the Distributed Tracing Extension, if used, MUST
carry the trace information of the starting trace of the transmission.
In other words, it MUST NOT carry trace information of each individual hop, since this information is usually
carried using protocol specific headers, understood by tools like [OpenTelemetry](https://opentelemetry.io/).
Comment on lines -33 to -39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reading this part and I don't think the goal of this extension and OTEL are the same.

I think the goal of this extension is to do Event Lineage.
In lineage, we care only about event processing hopes, while OTEL traces every remote call, even middlewares that don't impact the original event.

I think we are changing the meaning of the extension more than clarifying it.

Copy link
Contributor Author

@joaopgrassi joaopgrassi Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pierDipi thanks for the input.

The semantic conventions we are trying to define in OTel open-telemetry/opentelemetry-specification#1978 are not defining tracing of the transport layer (middlewares, http transport and etc). We are rather focused on tracing the "application-layer", meaning what happened after I sent this event. This seems to align well with the link you mentioned:

An Event lineage is a path in the event processing network that enables the tracing of the consequences of an event, or finding the reasons for a certain action.

The way we are planning to achieve this is by: Every time an event is created, a Span is created. Then, we get this Span and add it to the event's Distributed Tracing Extension.

Then, on every place this event arrives, a Processing span is created. This processing span is then connected to the create span by adding the context from the Distributed Tracing Extension to it as a Link

With this, you would be able to see all operations that happened because of the initial create operation.

Regarding the old text on this spec :

Given a single hop event transmission (from sink to source directly), the Distributed Tracing Extension,
if used, MUST carry the same trace information contained in protocol specific tracing headers.

This can't be true, because if we create a span to track the "event creation" we'll have context A. Then when sending the event, say via HTTP, if the app is auto-instrumented, then there will be another span created for the physical send operation, so context B. The physical send operation is "outside" of the apps domain and we can't really modify the event at that point, to inject the same context (unless the auto-instrumentation understands CloudEvents and does this).

Given a multi hop event transmission, the Distributed Tracing Extension, if used, MUST
carry the trace information of the starting trace of the transmission.

"starting trace of the transmission" is what the OpenTelemetry conventions is modeling with the "Create Span"

Once the trace context is set on the event, it MUST not be modified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a use-case with an app:
image

And this is what a trace will look like in Jaeger

Producer
image

Consumer
image

Note the consumer link. If you click that you go to the Create span. Links are not yet well supported by many back-ends, but we hope they will catch up and present this in a more linear way. @lmolkova had a similar example using Azure monitor where links are showed below the create, as you would expect.


Middleware between the source and the sink of the event could eventually add a Distributed Tracing Extension
if the source didn't include any, in order to provide to the sink the starting trace of the transmission.

An example with HTTP:

```bash
CURL -X POST example/webhook.json \
-H 'ce-id: 1' \
-H 'ce-specversion: 1.0' \
-H 'ce-type: example' \
-H 'ce-source: http://localhost' \
-H 'ce-traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01' \
-H 'traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01' \
-H 'tracestate: rojo=00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01,congo=lZWRzIHRoNhcm5hbCBwbGVhc3VyZS4`
```