Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed tracing support #5078

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

CodeBlanch
Copy link

This PR adds a new pipeline which will listen to DiagnosticSourceEventSource in order to receive Activity instances (aka spans) from a target process.

The goal is to use this in dotnet-monitor to export distributed traces (along with logs & metrics) using OpenTelemetry Protocol (OTLP).

/cc @noahfalk @samsp-msft @rajkumar-rangaraj @wiktork

@CodeBlanch CodeBlanch requested a review from a team as a code owner December 3, 2024 21:51
_ActivitySourceNames = activitySourceNames?.ToArray() ?? Array.Empty<string>();

if (_SamplingRatio < 1D)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be doing something this heavyweight and async in the ctor. If we want to validate this only works in version 9, we can fail the pipeline later or have the validation done upfront in a helper async method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to a spot that felt more natural. Check it out, LMK.


namespace Microsoft.Diagnostics.Monitoring.EventPipe
{
internal readonly struct ActivityData
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very large struct object. Perhaps a class or record is more suitable? I have not looked at usage yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a super big struct! But that doesn't concern me. It is handled correctly, passed by reference (out or in in our cases). What this code is trying to do is not create unneeded GC pressure in the monitoring app. It doesn't have to do that, but that was my thinking 😄

internal static partial class TraceEventExtensions
{
[ThreadStatic]
private static KeyValuePair<string, object?>[]? s_TagStorage;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be string/string?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be. Today everything pumped through DiagnosticSourceEventSource is in fact a string. But I went this direction because that is sort of an issue. When it comes to OTel distributed tracing, there are semantic conventions. Something like http.response.status_code should be an int to be compliant with the spec. I went with object so it could be fixed in the future if we add some typing info to the events or perhaps some conversion in the pipeline 🤷

status,
statusDescription);

payload.Tags = new(s_TagStorage, 0, tagCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am understanding correctly, the tags are cumulative? So payload n contains the tags from 0 to n? Perhaps it would be easier to just associate the tags with the event and then aggregate them as needed. The other consideration here is multiple sessions. If you start reading distributed tracing, fail and then start again the tag list will never reset.

Copy link
Author

@CodeBlanch CodeBlanch Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no aggregation of tags. Each span/Activity will just have whatever tags were added to it. What is going on here is I have some [ThreadStatic] storage for those tags. They are read off the EventSource and added to that storage. Then we hand out a ReadOnlySpan to that storage. So the consumer can get them, but can't move it off the stack. Same kind of idea as the big ActivityData struct. The goal here is to avoid having to allocate a list or array to store the tags for each span/Activity we receive.


namespace Microsoft.Diagnostics.Monitoring.EventPipe
{
internal class TracesPipeline : EventSourcePipeline<TracesPipelineSettings>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think DistributedTracePipeline would work here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it "DistributedTracesPipeline". Plural "Traces" because it felt better and we had plural in "Logs" pipeline. But happy to drop the "s" if you feel strongly about it.


if (!s_Sources.TryGetValue(sourceName, out source))
{
source = new(sourceName, sourceVersion);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happens if there are two events that have the same source name but different versions?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll just report the version of the first one we saw. It is kind of an edge case thing. I doubt multiple sources with different versions really happens in practice (it could) but even if it did I don't think there would be big impact. We could make the key a tuple if you are concerned. Or we could do it later if anyone raises an issue?

@samsp-msft
Copy link
Member

I am pretty sure @noahfalk is OOF now for the holidays. @tarekgh may be able to review, or @tommcdon may have somebody else who can review?

@tommcdon
Copy link
Member

I am pretty sure @noahfalk is OOF now for the holidays. @tarekgh may be able to review, or @tommcdon may have somebody else who can review?

@wiktor can review the change and if the change is ready, we will merge into main. Then @noahfalk can optionally perform a post-checkin review in January.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants