Skip to content

Mobile crash support #191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 54 commits into
base: main
Choose a base branch
from

Conversation

LikeTheSalad
Copy link
Contributor

@LikeTheSalad LikeTheSalad commented May 23, 2025

Fixes https://github.com/elastic/opentelemetry-dev/issues/831

Adds log crash event support by adding the following attributes to the logs with the event.name attribute or eventName field value set to device.crash:

  • agent.name (as a resource attribute)
  • processor.event
  • timestamp.us
  • error.grouping_key
  • error.type

The above attributes, along with the exception.* ones that come from the OTel conventions and should already be present on these kinds of logs, are what Kibana queries to properly display mobile crash events.

NOTE: Some of the logic needed for this to work was already present for traces, such as setting the agent.name resource attribute for example. To avoid duplicating code, I moved the common tools to a "common" dir.

@LikeTheSalad LikeTheSalad marked this pull request as ready for review May 23, 2025 10:03
@LikeTheSalad LikeTheSalad requested review from a team as code owners May 23, 2025 10:03
"regexp"
)

func CreateGroupingKey(stacktrace string) string {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same grouping algorithm being used in the legacy collector? I'm concerned the iOS version will always be unique, because the stacktrace string in iOS contains unique keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same grouping algorithm being used in the legacy collector?

It's not. There is a precondition in the legacy collector around stacktraces, which is that they're parsed before creating a hash for them. Since we aren't parsing stacktraces here, I didn't follow the same logic.

I'm concerned the iOS version will always be unique, because the stacktrace string in iOS contains unique keys.

Certainly this logic only works for Java stacktraces, so I think we should apply it based on the telemetry.sdk.language value when it's set to java. If you can show me what logic is needed for iOS stacktraces, I think I can add it into this PR for when the language attr is set to swift.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just updated the logic to calculate a grouping key only for Java stacktraces, that way it won't affect Swift errors.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably have to add a parser to pull out the crashing thread out of the iOS stacktrace to generate a grouping key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll probably have to add a parser to pull out the crashing thread out of the iOS stacktrace to generate a grouping key.

I'm not exactly sure how much work that would take, and/or if it all needs to be done in the EDOT Collector. For some reason it sounds like it's not straightforward, though please correct me if I'm wrong.

In any case, I think the only requirement for groupingkey is to be unique per crash. For java stacktraces, it means removing some parts that can change even for the same crash, such as the error message. The reason I mention this is to see if there might be another approach for iOS that might be simpler (for example, if crashes there don't have messages blended with the stacktrace, then maybe we could just hash the stacktrace as is). Is there something like that that comes to your mind which we could take advantage of for creating a unique key, @bryce-b ?

}

func curateStacktrace(stacktrace string) string {
unwantedPattern := regexp.MustCompile(`(:\s.+)|[\r\n\s]+`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe something like this so that it turns Caused by: java.lang.IllegalArgumentException: The arg 'abc' is not valid into Caused by: java.lang.IllegalArgumentException

Suggested change
unwantedPattern := regexp.MustCompile(`(:\s.+)|[\r\n\s]+`)
unwantedPattern := regexp.MustCompile(`(:\s.+$)|[\r\n\s]+`)

Also, the original algorithm in APM Server strips line numbers so that the grouping key stays consistent if there are minor changes to a file in a new version. Maybe replicate that as well by turning at org.example.SomeClass.callingOtherMethod(SomeClass.kt:16) into at org.example.SomeClass.callingOtherMethod(SomeClass.kt)

Copy link
Contributor Author

@LikeTheSalad LikeTheSalad May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've updated it to remove line numbers and also to leave the root cause exception on the Caused by... lines. I tried to do it all with a single regex pattern, but the logic to remove the line number forced me to use multiple patterns 😅 in any case, it should be all good now.

)

func CreateGroupingKey(stacktrace string) string {
hashBytes := sha256.Sum256([]byte(curateStacktrace(stacktrace)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't need cryptographic properties here we should use a non-cryptographic hashing function (e.g. xxhash).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. No need for cryptographic properties, just a low collision hashing function. I'll add the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added the changes to replace sha256 with xxhash.

}

func curateStacktrace(stacktrace string) string {
errorOrCausePattern := regexp.MustCompile(`^((?:Caused\sby:\s[^:]+)|(?:[^\s][^:]+))(:\s.+)?$`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably move these out of the method to avoid compiling them on each call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is done now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants