Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for recording an error in spans without throwables #939

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ideskov
Copy link

@ideskov ideskov commented Jan 7, 2025

Hey! It would be really nice if we could attach error events to spans without having to pass Throwables around. This is an attempt to cater for a more functional style of programming where exceptions are avoided and errors are part of methods' signatures.

I'm unsure whether this.delegate.tag("error", message); is enough for the Brave bridge. I didn't find many options in the native API.

Thanks for considering this proposal!

@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Jan 7, 2025

Thank you for the PR! Do you have a real use-case for this? If so, could you please show us what you are trying to do?

What concerns me that both OTel and Brave have their own mechanisms to signal an error which is possible through passing a Throwable and this change bypasses those mechanisms so it could be an endless source of issues since we don't know what the methods that are not called are doing.

If you want the "normal" behavior of these libraries I think you can do something like this:

span.error(new MyCustomError(message));

You don't need to throw the exception, it can be used as a wrapper around the error message and the stacktrace (and other details you want to attach).

If you want just an event (won't fail the span):

span.event("error: oops!");

@shakuzen @marcingrzejszczak What do you think? Has something like this requested before?

@ideskov
Copy link
Author

ideskov commented Jan 8, 2025

Do you have a real use-case for this?

Sure, our code base is mainly in Kotlin utilising this functional companion library arrow-kt. In essence, we don't work with exceptions at all. Instead, we define possible errors as part of function signatures utilising the Either type, e.g.:

fun createPayment(request: PaymentRequest): Either<PaymentError, Payment> { ... }

So what this means is if we want to record an error in a span, we need to artificially create an exception, like so:

span.error(RuntimeException("invalid payment"))

leaving us with a stacktrace in the span which is completely useless and only creates noise in Honeycomb.

Ideally, what we want is to attach an event to the span with a helpful attribute indicating what the error was and set the span status to indicate that there's been an error. Setting the span status is significant as we don't want these to be sampled.

Edit: PaymentError in the example above isn't a Throwable. Usually that would be a sealed type extended by either objects or data classes.

@marcingrzejszczak
Copy link
Contributor

I think that setting an event + customizing OTel components to mark the status as error when an event with a given text was found would be the way to go for now.

I don't recall anyone asking for this really.

@ideskov
Copy link
Author

ideskov commented Jan 8, 2025

I think that setting an event + customizing OTel components to mark the status as error when an event with a given text was found would be the way to go for now.

Sorry, I'm confused. Recording an event + marking the span status as error is what the proposed change does. Does your statement mean that you're opposing the change and clients of micrometer should rather switch to using the native otel API?

@marcingrzejszczak
Copy link
Contributor

I think that setting an event + customizing OTel components to mark the status as error when an event with a given text was found would be the way to go for now.

I'm saying that this would be the way to go without this change. With this change we're adding new API and we're always trying to be cautious about that

Copy link

If you would like us to look at this PR, please provide the requested information. If the information is not provided within the next 7 days this PR will be closed.

@ideskov
Copy link
Author

ideskov commented Jan 21, 2025

@jonatan-ivanov have you got any additional thoughts/concerns around this change?

@marcingrzejszczak
Copy link
Contributor

I mean I do have concerns. You're the only person asking for this change and we're not keen on adding new API if that's not necessary. There are means to achieve sth similar to what you need with the existing API

@jonatan-ivanov
Copy link
Member

jonatan-ivanov commented Jan 21, 2025

What still concerns me that both OTel and Brave have their own mechanisms to signal an error which is only possible through passing a Throwable and this change bypasses those mechanisms so it could be an endless source of issues since we don't know what the methods that are not called are doing. I think if OTel and Brave would add a separate mechanism to signal an error (some kind of error object instead of Throwable) we can add support for it.

If your main concern is having the stacktrace in Honeycomb, I would check how is it ending up there in the first place since Micrometer Tracing does not add it to the span. If somehow your config/app adds it, can you either disable it or remove it later using a SpanFilter?

@ideskov
Copy link
Author

ideskov commented Jan 22, 2025

Our main concern is we're creating a new exception just to signal an error in the span. The code which handles business logic doesn't actually throw, so we consider creating an exception redundant.

All I'm suggesting is that it is micrometer's Span.error() which calls opentelemetry's Span.recordException() and that's not necessarily the only way to signal an error in an otel span.

What still concerns me that both OTel and Brave have their own mechanisms to signal an error which is only possible through passing a Throwable

As far as I'm concerned otel's recordException() is just syntactic sugar as it's says in the documentation. The real signal is attaching an event to the span which describes the error and setting the span's status to ERROR.

Anyhow, if you still feel like the change doesn't fit micrometer's API and the concepts behind it, I will completely respect that and will think of other ways of resolving the concern outlined above. Thanks for giving this a thought.

This comment was marked as outdated.

@jonatan-ivanov
Copy link
Member

Our main concern is we're creating a new exception just to signal an error in the span. The code which handles business logic doesn't actually throw, so we consider creating an exception redundant.

I totally get that, I consider that redundant too but right now the tracing libraries we support need an exception.

All I'm suggesting is that it is micrometer's Span.error() which calls opentelemetry's Span.recordException() and that's not necessarily the only way to signal an error in an otel span.

The only ways I can see to properly signal an error on the OTel API is through a Throwable.

As far as I'm concerned otel's recordException() is just syntactic sugar as it's says in the documentation. The real signal is attaching an event to the span which describes the error and setting the span's status to ERROR.

I think that syntactic sugar is more like the proper and official way to signal an error in my opinion, I'm not very keen on copying the behavior of that method and then trying to keep it in sync with the OTel SDK. As far as I understand, there is no guarantee that it will not change.

Anyhow, if you still feel like the change doesn't fit micrometer's API and the concepts behind it, I will completely respect that and will think of other ways of resolving the concern outlined above. Thanks for giving this a thought.

I think it would if the underlying API would have the same mechanism. Please feel free to open an issue for OTel and Brave and add a link here, maybe they can add method with a string/custom error object. I would feel much comfortable supporting that than copying the code and then being in an endless battle of keeping it in sync.

As a workaround, other than the two options I mentioned above you can also do something like this:

span.tag(ERROR, "oops!");

and in a SpanFilter (Micrometer Tracing) or SpanHandler (Brave) or SpanProcessor (OTel) you can modify the span the way you want (copy what recordException does).
This can work with span.event("error: oops!");, not just with tag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants