Add device.crash event semantic convention #1576

Draft: wants to merge 4 commits into main
22 changes: 22 additions & 0 deletions .chloggen/crash-semconv.yaml
@@ -0,0 +1,22 @@
# Use this changelog template to create an entry for release notes.
#
# If your change doesn't affect end users you should instead start
# your pull request title with [chore] or use the "Skip Changelog" label.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the area of concern in the attributes-registry, (e.g. http, cloud, db)
component: device

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Adding semantic convention for mobile app crash event under `device.crash`

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
# The values here must be integers.
issues: [1576]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
132 changes: 130 additions & 2 deletions model/device/events.yaml
@@ -30,7 +30,7 @@ groups:
and from which the `OS terminology` column values are derived.
brief: >
This attribute represents the state the application has transitioned into at the occurrence of the event.
examples: ["active"]
examples: [ "active" ]
type: enum
members:
- id: active
@@ -64,7 +64,7 @@ groups:
note: >
The Android lifecycle states are defined in [Activity lifecycle callbacks](https://developer.android.com/guide/components/activities/activity-lifecycle#lc),
and from which the `OS identifiers` are derived.
examples: ["created"]
examples: [ "created" ]
type: enum
members:
- id: created
@@ -82,3 +82,131 @@ groups:
brief: >
Any time after Activity.onResume() or, if the app has no Activity,
Context.startService() has been called when the app was in either the created or background states.

- id: event.device.crash
stability: experimental
type: event
name: device.crash
brief: >
A crash event represents the termination of an application instance due to an unhandled error or exception. It can be detected and
recorded as it is happening (e.g. through an UncaughtExceptionHandler), or after the fact, when a tombstone is found that contains
information about a previously terminated app instance whose termination was caused by an unhandled error or exception.
note: >
The body fields of this event contain data and metadata about the crash that can be used to classify and aggregate it with similar
crashes on other devices. The crash event may not contain all of the data necessary for it to be properly aggregated
because some of it is not available on the crashing device. In those cases, the data contained in the body fields of the
event SHOULD be specific enough for the rest to be looked up (e.g. the ID of a ProGuard file uploaded at build time).
The resource attributes, event attributes, and body fields should together contain enough information for reasonable deduplication
to take place, so that the same crash instance isn't counted twice even if the same data causes more than one event to be emitted
(e.g. device ID + process ID).

This event is meant to be used in conjunction with `os.name` [resource semantic convention](/docs/resource/os.md) to identify the
mobile operating system (e.g. Android, iOS, etc.) on which the crash occurred, which could be useful to determine how the data
in the event can be interpreted.

The event body fields MUST be used to describe the state of the application at the time of the crash, not when the event was actually
emitted, which could happen at a much later time (e.g. when the app next starts up).
body:
id: device_crash_state

Review comment: nit: is it needed to wrap all the body fields within this map? Since this map seems to be the only field at the root level of the body, I was wondering if we could instead define all of its children as independent fields.

type: map
requirement_level: required
fields:
- id: id
stability: experimental
requirement_level: required
brief: >
An ID that uniquely identifies the crash instance obtained from a specific `source`.

Review comment: How would this be generated? From, say, an Android device crash?

examples: [ "0d48510589c0426b43f01a5fa060a333" ]
type: string
- id: source
stability: experimental
requirement_level: recommended
brief: >
A value from a fixed set of values that uniquely identifies the source of the crash data, which determines what the `data` field contains.
note: >
This field, combined with `source_version`, will uniquely identify the structure of the `data` field.
examples: [ "jvm_exception" ]
type: enum
members:
- id: jvm_exception
value: 'jvm_exception'
brief: >
Throwable in the JVM layer, usually caught by an UncaughtExceptionHandler.
- id: sig_handler
value: 'sig_handler'
brief: >
Crash in the native layer caught by a signal handler.
- id: aei
value: 'aei'
Review comment: I'm curious whether the full name (application_exit_info) might be a better approach?

brief: >
[Application Exit Info](https://developer.android.com/reference/android/app/ApplicationExitInfo) written by Android after a process death.
- id: source_version
stability: experimental
requirement_level: recommended
brief: >
Supplements the `source` field by identifying the specific variation of it.
note: >
This version is specifically for the `source` field. It can be a well-defined version of some external format (e.g. Android 15
Application Exit Info), or some custom version number associated with the usage in this event (e.g. some custom JSON schema).

Review comment: Based on this note, it seems like the source_version data might come from different places depending on the case. Maybe there's something I'm missing, though since we're defining source as an enum, I think it would be better (if possible) to define the source_version format for each type of source to avoid potential ambiguity issues.

examples: [ "1.0.0" ]
type: string
- id: data
Review comment: Shouldn't this rather be in the body (formerly payload) of the Event (https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-body)? This would also imply the type `any` rather than `string`.

stability: experimental
requirement_level: recommended
brief: >
A blob field containing the details of the crash that is obtained from the `source`. Combined with `data_content_type`, the
data in this field SHOULD be parseable.
note: >
This is considered a blob because it is not expected to be programmatically understandable without additional information
not represented in the semantic conventions. The value of this blob is typically obfuscated (or contains obfuscated values),
encoded in a binary format, and not useful unless paired with data that is not available on-device. As such, it is best
transmitted as a blob and processed further at the Collector level.
type: string
examples: [ "{
\"exceptions\": [
{
\"type\": \"a.b.c\",
\"message\": \"An error has occurred\",
\"stacktrace\": \"a.b.c: An error has occurred\nat a.b.d.e.p(unknown source)\nat a.b.d.e.g(unknown source)\nat a.b.d.e.z(unknown source)\nat a.b.d.y.r(unknown source)\"
}
],
\"threads\": [
{
\"id\": 74,
\"state\": \"RUNNABLE\",
\"name\": \"main\",
\"callstack\": [
\"x.y.z\",
\"x.y.aa\"
]
}
],
\"proguard_file_id\": \"<UUID>\"
}" ]
- id: data_content_type
stability: experimental
requirement_level: recommended
brief: >
The media type of the `data` field, as defined by [RFC 2046](https://datatracker.ietf.org/doc/html/rfc2046).
note: >
This, combined with a priori knowledge of the structure of the blob, will allow Collectors to parse and process the `data` field.

Review comment: I'd like to check this with an example just to make sure I understood it properly.

Consider an event with source = jvm_exception, and say that this data_content_type field has application/json as its value. In this case the collector will be able to parse it as JSON; however, how can it know which fields to look for in that JSON? My question is mostly about ensuring that the UI can later display this data properly.

Reply from jzwc (Nov 19, 2024): The current formats for crash report bodies seem insufficiently defined, particularly for crashes originating from various sources or those captured by different tools. Leaving the data opaque while concentrating on defining the common metadata in the other attributes strikes the right balance IMHO.

In many instances, the crash report cannot be effectively presented without backend post-processing and supplementary data from the application provider, which falls outside the scope of OTel.

examples: [ "application/json" ]
type: string
- id: crashed_service_version
Review comment: I understand the intention to emphasize, but the crashed_ prefix appears unnecessary in this context, as the attribute is clearly defined within device.crash, isn't it? Ditto crashed_os_version.

stability: experimental
requirement_level: recommended
Review comment: The note reads "This is required"...

brief: >
The version of the app when the crash happened, which may be different from the `service.version` resource attribute.
note: >
This is required so crashes can be aggregated by the version in which it occurred, not the one that emitted the event.
examples: [ "7.5.0" ]
type: string
- id: crashed_os_version
stability: experimental
requirement_level: recommended
Review comment: The note reads "This is required"...

brief: >
The version of the OS when the crash happened, which may be different from the `os.version` resource attribute.
note: >
This is required so crashes can be aggregated by the version of the OS on which it occurred, not the one that emitted the event.
examples: [ "15.0" ]
type: string
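For illustration, the following non-normative sketch shows how a single `device.crash` event might look once the body fields defined above are populated. The body values are taken from the `examples` entries in this diff; the `resource` values, the abbreviated `data` payload, and this YAML layout (which is not the OTLP wire format) are assumptions made for the example. It depicts a crash that happened on app version 7.5.0 but was only emitted on the next start-up, after a hypothetical upgrade to 7.6.0, which is why `crashed_service_version` and `service.version` differ.

```yaml
# Non-normative sketch of one device.crash event (not the OTLP wire format).
# Resource values are hypothetical and describe the app instance that EMITTED the event.
resource:
  os.name: Android
  os.version: "15.0"
  service.version: "7.6.0"            # hypothetical: version running when the event was emitted
event:
  name: device.crash
  body:
    id: "0d48510589c0426b43f01a5fa060a333"
    source: jvm_exception
    source_version: "1.0.0"
    data_content_type: application/json
    # Opaque blob, abbreviated here; interpreting it depends on source, source_version
    # and data_content_type, plus off-device data such as the ProGuard file.
    data: '{"exceptions": [{"type": "a.b.c", "message": "An error has occurred"}], "proguard_file_id": "<UUID>"}'
    crashed_service_version: "7.5.0"  # version that actually crashed; aggregate on this
    crashed_os_version: "15.0"
```

Aggregating on `crashed_service_version` rather than `service.version` attributes this crash to 7.5.0, the version in which it occurred.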