Skip to content

Add new tracing properties for .NET E2E traces. #44

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

philliphoff
Copy link
Member

@philliphoff philliphoff commented May 9, 2025

Using the azure managed backend (i.e. Durable Task Scheduler) with .NET, in both the case of DTFx and Portable SDKs, the gRPC API is missing properties needed to generate spans for a complete orchestration execution. This change adds those missing properties.

The DTFx SDK expects the a (native) ExecutionStartedEvent to contain a parent trace context that exposes a tuple with the orchestration tracing activity <ID, span ID, start time>. This tuple is generated on initial execution of the orchestration, set on the ExecutionStartedEvent, and then expected to be passed back on subsequent executions. These properties enable the SDK to reuse a .NET tracing Activity that represents then entire orchestration, even though they are, in fact, separate instances. The gRPC ExecutionStarted event today contains only the orchestrationSpanID which, AFAIK, is not sufficient to duplicate a span.

The gRPC TaskScheduled event given to workers contains a parentTraceContext property which enables workers to parent traces related to task execution. However, the gRPC ScheduleTaskAction returned by the worker that generates subsequent TaskScheduled events does not have a parent trace context property. This prevents tasks traces from being parented to orchestration-level traces.

The gRPC API does not return the ExecutionStarted event to the backend (as in the case of DTFx). This means that event cannot be used to return orchestration-related tracing properties to the backend to be returned on subsequent executions. Instead, I propose those properties be returned via the OrchestratorResponse object directly. The backend could then inject them into the ExecutionStarted event on subsequent execution in order to be consistent with and reuse properties used by DTFx. (Alternatively, those properties could be added to OrchestratorRequest.)

@philliphoff philliphoff requested a review from cgillum May 28, 2025 17:37
@cgillum
Copy link
Member

cgillum commented May 28, 2025

Thanks for the PR. I have a few questions about the problem being solved and whether we really need this PR.

The DTFx SDK expects the a (native) ExecutionStartedEvent to contain a parent trace context that exposes a tuple with the orchestration tracing activity <ID, span ID, start time>. This tuple is generated on initial execution of the orchestration, set on the ExecutionStartedEvent, and then expected to be passed back on subsequent executions. These properties enable the SDK to reuse a .NET tracing Activity that represents then entire orchestration, even though they are, in fact, separate instances. The gRPC ExecutionStarted event today contains only the orchestrationSpanID which, AFAIK, is not sufficient to duplicate a span.

I'd like to better understand why orchestrationSpanID is not sufficient to duplicate the span. The parent trace context already has the trace ID and the start time. The only thing needed to reconstruct the span should be the span ID, which we capture.

The gRPC TaskScheduled event given to workers contains a parentTraceContext property which enables workers to parent traces related to task execution. However, the gRPC ScheduleTaskAction returned by the worker that generates subsequent TaskScheduled events does not have a parent trace context property. This prevents tasks traces from being parented to orchestration-level traces.

It sounds like you're suggesting that we'd parent subsequent tasks to an activity's trace context. But TaskScheduled events should always be parented by the orchestration's trace context. I don't think there's a scenario where we need to save the trace context of an activity execution. Let me know if I'm misunderstanding.

The gRPC API does not return the ExecutionStarted event to the backend (as in the case of DTFx). This means that event cannot be used to return orchestration-related tracing properties to the backend to be returned on subsequent executions. Instead, I propose those properties be returned via the OrchestratorResponse object directly. The backend could then inject them into the ExecutionStarted event on subsequent execution in order to be consistent with and reuse properties used by DTFx. (Alternatively, those properties could be added to OrchestratorRequest.)

The cases where DTFx would send an ExecutionStarted event are covered by CreateInstanceRequest for new orchestrations and CreateSubOrchestrationAction for sub-orchestrations. We already provide parent trace context for CreateInstanceRequest but I see that we're missing this information for CreateSubOrchestrationAction, so my expectation is that we'd add parent trace context information just to this action. You didn't mention either of these scenarios explicitly, though, so let me know if you see this differently.

Feel free to schedule some time for us to discuss directly if more context-sharing is needed.

Copy link
Member

@cgillum cgillum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make sure we're on the same page regarding the requirements before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants