Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open Telemetry Support #2909

Closed
jeremydmiller opened this issue Jan 15, 2024 · 17 comments
Closed

Open Telemetry Support #2909

jeremydmiller opened this issue Jan 15, 2024 · 17 comments

Comments

@jeremydmiller
Copy link
Member

This is a placeholder for a conversation about what this really means for Marten.

Some initial thoughts about what this could be:

  • It'd be low hanging fruit to automatically detect and set a new DocumentSessionBase's CorrelationId and CausationId from just detecting the current Activity with System.Diagnostics.Activity.Current. That'd be good enough to record metadata on documents and events being added/modified by Marten to tag them to the Otel activities for whatever is doing the work. Wolverine already does this with its Marten integration, but it's done by Wolverine. We could make this baked into Marten itself.
  • Export metrics on the number of events being appended? That could be done by opt in IDocumentSessionListener
  • Export metrics on the number of documents being stored/inserted/updated? Again, that can be done with an IDocumentSessionListener, and maybe that's just a documentation effort to show how to do that
  • Add additional context directly to the current Activity. We could track Marten events as Activity events. We could do separate spans or activities for some queries or all queries. It's heavyweight to do so, but occasionally super helpful for troubleshooting to have the queries correlated to HTTP requests or incoming messages. I'd also opt to make that "opt in". That might require some changes to our execution model rather than trying to do that through the IDocumentSessionListener approach.
@SeanFarrow
Copy link
Contributor

I wonder whether it makes sense to have a separate package Marten.OpenTelemetry that contains the IDocumentSessionListener implementations for points 2 and 3 above. This could also deal with point 1 as well.

In terms of queries, I agree that two options should be made available, I can either specify I want metrics for all queries or just some of them. If the latter is the case, I would think specifying individual query types would be the best option. Whether this makes the cut for version 7 would depend on how many internal changes would be needed to support this, at least that's how I'm reading it.

I'm happy to help where I can as I have a project that uses Open Telemetry and am seriously looking at martin as an option for data storage.

@jeremydmiller
Copy link
Member Author

@SeanFarrow The only reference you need is System.Diagnostics, and that's pretty lightweight. I don't think this justifies a separate package. But also, I don't think it's a breaking change at all, so might slide

@ElanHasson
Copy link
Contributor

ElanHasson commented Jan 19, 2024

While looking up how to do this with Marten, I found this: #2358
Also this: https://event-driven.io/en/set_up_opentelemetry_wtih_event_sourcing_and_marten/

Could serve as a great starting point.

@jeremydmiller you will need references to OpenTelemtry nugets if you're to add extension methods on the types to make integration easier: MeterProviderBuilder, TracerProviderBuilder.

This could be a small package with the extension methods.

@SeanFarrow
Copy link
Contributor

I'd definitely like to take this on.

@jeremydmiller
Copy link
Member Author

@ElanHasson No, you don't. All you need is a reference to System.Diagnostics.DiagnosticSource

@SeanFarrow If you're up for that, we'd take you up on that.

@jeremydmiller
Copy link
Member Author

Ideas

Tracking correlation and causation on all sessions

In the constructor for QuerySession, we could automatically set CausationId and CorrelationId by looking for any current Activity. See this code from Wolverine that does something similar for its MessageBus entry point: https://github.com/JasperFx/wolverine/blob/main/src/Wolverine/Runtime/MessageBus.cs#L16

Doing that gives you the tracing all the way to events or documents saved automatically.

Activity or Span for Executing Queries

    int Execute(NpgsqlCommand cmd);
    Task<int> ExecuteAsync(NpgsqlCommand command, CancellationToken token = new());

    DbDataReader ExecuteReader(NpgsqlCommand command);

    Task<DbDataReader> ExecuteReaderAsync(NpgsqlCommand command,
        CancellationToken token = default);

    DbDataReader ExecuteReader(NpgsqlBatch batch);

    Task<DbDataReader> ExecuteReaderAsync(NpgsqlBatch batch,
        CancellationToken token = default);

    void ExecuteBatchPages(IReadOnlyList<OperationPage> pages, List<Exception> exceptions);
    Task ExecuteBatchPagesAsync(IReadOnlyList<OperationPage> pages,
        List<Exception> exceptions, CancellationToken token);

That'd give you performance numbers based on the span. You can make this opt in so folks that aren't taking advantage of it don't get the little bit of performance hit.

We can catch exceptions and tag activity's as being failed. That might be valuable. Not a replacement for logging, but gives you some metrics

Event Metrics

  • Use System.Diagnostics.Metrics for everything
  • A counter on the number of events being appended? Can make that by event type, tag it by tenant id, aggregate
  • Do that by opt in IDocumentSessionListener?
  • In the async daemon, use a Histogram to track the delta between a projection's progress and the high water mark
  • Counter on just the high water mark
  • Counter on the progress of each projection running

@SeanFarrow
Copy link
Contributor

SeanFarrow commented Feb 12, 2024 via email

@jeremydmiller
Copy link
Member Author

Cool @SeanFarrow!

How do we want to expose the ability to opt-in to IConnectionLifetime activity tracking? Perhaps something on one of the options classes or the DocumentStore?

The configuration on StoreOptions, then the branching code is actually in SessionOptions where it normally creates the IConnectionLifetime

Do we want to provide an easy way for a user to add the opt-in event tagging support that automatically adds the IDocumentSessionListener implementation?

Could definitely add helper methods on StoreOptions that just register the right IDocumentSessionListener

Thanks!

@SeanFarrow
Copy link
Contributor

I'm just starting to work on this now. We can use the current activities root id if it is set for the corelation id, but I was wondering What property of the current activity we want to use as the causation id?

@jeremydmiller
Copy link
Member Author

@SeanFarrow Sorry for the slow response. I say you take the correlation id & causation id from any kind of current activity. Marten is completely passive, so I'd expect us to just use the parent activity, whatever that happens to be.

@SeanFarrow
Copy link
Contributor

OK, I'll start work on this in the next few days.

This may take a bit of time, as I'm dealing with a death in the family.

I'll open a working PR so we can all keep track of what's going on.

@jeremydmiller
Copy link
Member Author

@SeanFarrow Hey Sean, sorry to hear that. Family first. Independent of this, @oskardudycz & I think this won't be a breaking change and can slide to 7.1. Might be the banner change when that hits.

@jeremydmiller jeremydmiller modified the milestones: 7.0.0, 7.1.0 Feb 24, 2024
@jeremydmiller
Copy link
Member Author

@SeanFarrow Any update on this one?

@SeanFarrow
Copy link
Contributor

SeanFarrow commented Mar 22, 2024 via email

@SeanFarrow
Copy link
Contributor

@jeremydmiller I'll open a PR early with the connection lifetime stuff in the next few days. I'm then away for the next week, so will pick this back up around the 8th.
Do we want to merge things in gradually?

@ElanHasson
Copy link
Contributor

HI @SeanFarrow any updates on this?

@SeanFarrow
Copy link
Contributor

@jeremydmiller I am just looking at the document session listener piece of this, for both appending the number of each event type processed, the number of total events and the number of documents being stored.

Do we want to split this in to two separate listeners? It probably makes sense too do this as people may just be using the document storage piece, the event store piece or both together.

Thanks,
Sean.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants