-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crons: Linking monitors to errors #43647
Comments
Routing to @getsentry/crons for triage, due by Thu, January 26th at 13:08 (sfo). ⏲️ |
This adds `SENTRY_MONITOR_ID` to the environ of the executing command. This is in support of getsentry/sentry#43647 and will allow SDKs embedded in the executing command to associate the monitor context to any events produced by the SDK.
This is a change to this proposal:
For the combination of dynamic sampling and trace propagation to work we need three pieces of information:
We have defined ways how this information is carried in HTTP headers, Celery tasks and I believe kafka headers. I do not believe we have defined a format for how this is to be transmitted in environment variables but that should be trivial enough. I believe it's probably a good idea to just define environment variables and to use the http serialization. This would imply: SENTRY_BAGGAGE="other-vendor-value-1=foo;bar;baz, sentry-trace_id=771a43a4192642f0b136d5159a501700, sentry-public_key=49d0f7386ad645858ae85020e393bef3, sentry-sample_rate=0.01337, sentry-user_id=Am%C3%A9lie, other-vendor-value-2=foo;bar;"
SENTRY_TRACE="771a43a4192642f0b136d5159a501700-6b945b814e6c2434-1" The rest of the system would work the same way as documented. The monitor id itself could become part of the baggage with |
Had some discussion with @cleptric and @AbhiPrasad for the SDK side of this. It's looking like the trace propagation part of this plan is a bit more work than expected. Right now the trace identifier needs to be associated to a span in order to propagate in the SDK, there is no mechanism to hold the trace ID in any sort of context without a span. SDKs that implement tracing will always either start a trace from a new transaction/root-span OR will continue a trace and immediately start a new span (such as when continuing from a frontend request) Because a monitor run should not start a new span, there's currently no way to propagate a Trace ID from the CLI. I see two paths forward for associating the checkin with errors:
I am leaning towards implementing number two while we work towards number one. We're already using the |
@mitsuhiko do we have ETA on when we could solve the trace ID propagation? goal is to push weekly-level progress w/ Crons given its stage, so if thats going to take us ages we need to find a faster path to the cron linking (so maybe in near term we just do this monitor thing.. but even an SDK release still takes time so hoping we could accelerate) |
The RFC for the trace propagation was merged. Maybe @antonpirker can give some updates here on how long it takes to implement this for Python. https://github.com/getsentry/rfcs/blob/main/text/0071-continue-trace-over-process-boundaries.md |
We'll also need to implement some type of trace generation within Sentry CLI. I assume the rust SDK doesn't already expose anything to do that right? |
Going to move this back into planned. We will be focusing on an improved ingestion experience which will allow this work to be done in a simpler way without requiring each SDK to be updated to send monitor contexts. |
Routing to @getsentry/product-owners-crons for triage ⏲️ |
Sorry for not answering your request @mitsuhiko Documentation is still missing, I think (at least I was not able to find it right now...) |
Problem Statment
When using the sentry Crons feature errors produced during execution of a monitored task should be associated to the monitor ID. We should also associate errors and transactions to the specific check-in that is reported.
What exists now?
Currently monitors support linking to errors that occur during execution via the
monitor.id
tag.This tag is promoted from the
MonitorContext
.Right now we are manually setting up this context in a few different places.
When a checkin fails it sets the context on the created “checkin failed” error event
🤔 Note that it also manually sets the
monitor.id
in the tags. I believe this may be a mistake since the context promotes theid
to a tagThe sentry monitor utils that are used in our celery tasks set the monitor context to associate errors to the monitor
So what’s the problem
This works but it has a few issues
monitor.id
context is manually set, there is no association of error to checkin.Proposed checkin association strategy
We should use our Trace ID to associate checkins to errors and transactions, realistically this means doing the following
A new
trace_id
UUID column should be added to theMonitorCheckin
table. This should be indexed as we will use it to lookup associated checkinsThe
monitor
context should continue to exist on errors so that we can easily query for errors that we know are part of a monitorThe
trace_id
should be provided as part of the checkin API request. It will not be required🤔 Open Question: Should we generate a trace ID if they no not pass it, and if so should we return that as part of the result API response.
Proposed integration with
sentry-cli
and SDKsBoth the checkin and SDKs producing errors and traces must be aware of the monitor context and the Trace ID being used for the monitor run. Here’s what needs to happen
sentry-cli
should generate a Trace ID and send that with the checkins.sentry-cli monitor run <monitor_id> -- <command>
should set two environment variables during execution of<command>
SENTRY_TRACE_ID
: The Trace ID generated by the CLIfeat(monitors): Pass SENTRY_TRACE_ID down execution path sentry-cli#1441
SENTRY_MONITOR_ID
: The Monitor UUIDfeat(monitors): Pass SENTRY_MONITOR_ID to executing process sentry-cli#1438
SENTRY_TRACE_ID
should be used in place of the SDK generating it’s own trace IDSENTRY_MONITOR_ID
should be used to setup the monitor context of any events that occur during executionfeat(crons): Add CronMonitorIntegration sentry-python#1866
Updates
SENTRY_TRACE_ID
is more involved than the scope of this ticket, see Crons: Linking monitors to errors #43647 (comment)The text was updated successfully, but these errors were encountered: