Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc(decision): Revamping the SDK Performance API #101

Merged
merged 13 commits into from
Aug 2, 2023

Conversation

AbhiPrasad
Copy link
Member

@AbhiPrasad AbhiPrasad commented Jun 12, 2023

Rendered RFC

closes #28

This RFC proposes to revamp the performance API in the SDKs. The new API aims to accomplish the following:

  • Align both the internal schemas and top level public API with OpenTelemetry and their SDKs.
  • De-emphasize the concept of transactions from users using a Sentry performance monitoring SDK.
  • Optimize for making sure performance data is always sent to Sentry.
  • Open the SDK up for future work where we support batch span ingestion instead of relying on a transaction solely.

The RFC proposal comes in two steps.

  1. Introduce three new top level methods, Sentry.startActiveSpan and Sentry.startSpan and their language specific variants as well as Sentry.setMeasurement.

  2. Introduce a new span schema that is aligned with OpenTelemetry. Out of scope for this RFC, will be addressed later.

Comment on lines 139 to 147
1. Get the active span from the current scope
2. If the active span is defined, create a child span of that active span based on the provided `spanContext`, otherwise create a transaction based on the provided `spanContext`.
3. Run the provided callback
4. Finish the child span/transaction created in step 2
5. Remove the child span/transaction from the current scope and if it exists, set the previous active span as the active span in the current scope.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this solves (or not) the use case that you have an active span (span A), and spawn multiple threads to download something, and each of those threads will start a new span (span B and C), but then the relationship of the spans are messed up because they "race each other", for example, span B and C should be a child of span A, but what I see is that span C is a child of span B.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the caveat here is that the hub/scope needs to be properly forked and correctly follow the execution context.

I guess in single scope instances, this logic breaks down re: parent-child relationships. Let me amend the RFC to reference this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Mobile is mostly since scope instance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining a root span for mobile/single-scope SDKs sounds good. It doesn't solve the parent-child relationships, but keeps the behaviour the same as with Transactions and Span.


| Span Field | Description | Type | Notes |
| ----------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | The name of the span | String | Should be low cardinality. Replacing span `description` |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe is important to keep description, how would I set the SQL statement of a DB query if it has to be low cardinality? One can say you can do it under the data field, so the name then would be similar to the former operation? some operations just don't have a unique name, is the grouping of transactions still per name?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the OpenTelemetry conventions, it's recommended db spans be named <db.operation> <db.name>.<db.sql.table> - like SELECT bestDb.orders, but the only thing they require is that something is low cardinality, so the db query also works here. In most cases, we can just add whatever we were setting in span.description to span.name (as long as it's low cardinality).

Grouping of transactions/spans would still be by name, but there is more freedom about what that should be. The product instead relies on span attributes to make this work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, DB queries are high cardinality though, at least for bigger databases.
Most of the time interceptors/hooks just give you a String, so to build SELECT bestDb.orders requires parsing the String on the client which is also not ideal.
Alright, this is more an implementation detail anyway, thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK removing op and adding a new low cardinality format of OpenTelemetry as you described @AbhiPrasad, but I think the description we currently have for db queries, HTTP requests, and such is highly useful. Where would that end up now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipphofmann the span attributes, aka span.data


Having both the `name` and `op` fields is redundant, but we choose to keep both for backwards compatibility. There are many features in the product that are powered by having a meaningful operation, more details about this can be found in the documentation around [Span operations](https://develop.sentry.dev/sdk/performance/span-operations/). In the future, we can choose to deprecate `op` and remove it from the schema.

The most notable change here is to formally introduce the `attributes` field, and remove the `span.data` field. This is a breaking change, but worth it in the long-term. If we start accepting `attributes` on transactions as well, we more closely align with the OpenTelemetry schema, and can use the same conventions for both spans and transactions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the data that is gonna be deprecated and forwarded to attributes for compatibility?
eg
span.setData(x, y) just calls span.setAttribute(x, y) internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this is a good idea! I will also put more details into migrating between the different span schemas.

Copy link
Member

@philipphofmann philipphofmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent that you try to address some critical issues of our performance API @AbhiPrasad 👏. I gave this a first quick pass, but I'll have to take some more time after my vacation next week and think about how the new API will play out on mobile.


This allows us to de-emphasize the concept of hubs, scopes and transactions from users, and instead have them think about just spans. Under the hood, `Sentry.startActiveSpan` and `Sentry.startSpan` should create transactions/spans as appropriate. `Sentry.setMeasurement` is used to abstract away `transaction.setMeasurement` and similar.

1. Introduce a new span schema that is aligned with OpenTelemetry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that should be step 2.

Suggested change
1. Introduce a new span schema that is aligned with OpenTelemetry.
2. Introduce a new span schema that is aligned with OpenTelemetry.


We change the data model that is referenced and used internally inside the SDK to better reflect OpenTelemetry. This involves adding shims for backwards compatibility and removing redundant fields.

# Background
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice summary of the biggest problems 👍

1. Newly created spans must have the correct trace and parent/child relationship
2. Users shouldn’t be burdened with knowing if something is a span/transaction
3. Spans only need a name to identify themselves, everything else is optional.
4. The new top level APIs should be as similar to the OpenTelemetry SDK public API as possible.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this already an official decision that Sentry decided to align more with OpenTelemetry? If yes, please link the decision or elaborate somewhere in the document on why we want to achieve this, as I think it might not be 100% clear to everyone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested in that as well, what are the pro x cons of moving closer to opentelemetry rather than our own model?
I know a few of them but it'd be nice to be written down.

Comment on lines +132 to +138
// declare function startActiveSpanAsync<T>(
// spanContext: SpanContext,
// callback: (span: Span) => Promise<T>,
// ): Promise<T>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that should not be commented out

Suggested change
// declare function startActiveSpanAsync<T>(
// spanContext: SpanContext,
// callback: (span: Span) => Promise<T>,
// ): Promise<T>;
declare function startActiveSpanAsync<T>(
spanContext: SpanContext,
callback: (span: Span) => Promise<T>,
): Promise<T>;

3. Spans only need a name to identify themselves, everything else is optional.
4. The new top level APIs should be as similar to the OpenTelemetry SDK public API as possible.

There are two top level methods we'll be introducing to achieve this: `Sentry.startActiveSpan` and `Sentry.startSpan`. `Sentry.startActiveSpan` will take a callback and start/stop a span automatically. In addition, it'll also set the span on the current scope. Under the hood, the SDK will create a transaction or span based on if there is already an existing span on the scope.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit confused by the difference of startActiveSpan and startSpan when reading it for the first time. Even after reading the explanation, I'm still unhappy with the name. The name states to start a span, but the method takes a lamba and starts and finishes the span. I think the name should reflect that, so it should be something like executeSpan, createSpan, etc.

Furthermore, I'd like to have the flexibility of finishing the span manually and achieving the same behavior you described with startActiveSpan or will you still be able to do startSpan(bindtoScope: true)? If yes, then it's a bit confusing to me to achieve the same behavior with either startActiveSpan plus a lambda and startSpan(bindtoScope: true).

I'm sorry I didn't come up with proposals for solutions yet. That is not an easy problem to solve and I have to take some time to think about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@philipphofmann If I got it right, startActiveSpan is the equivalent of bindtoScope: true, and startSpan the equivalent of bindtoScope: false, hence the active naming, because it is the one active in the scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Active for active in the scope in the scope make sense, but since we try to shield users from the internal scope and hub.

startActiveSpan looks to me as active for the automatic finish.
startSpan as a manual variant of the startAtiveSpan


To simplify how performance data is consumed and understood, we are proposing a new span schema that the SDKs send to Sentry. The new span schema aims to be a superset of the [OpenTelemetry span schema](https://github.com/open-telemetry/opentelemetry-proto/blob/4967b51b5cb29f725978362b9d413bae1dbb641c/opentelemetry/proto/trace/v1/trace.proto) and have a minimal top level API surface. This also means that spans can be easily converted to OpenTelemetry spans and vice versa.

The new span schema is as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tags for spans?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today spans have tags, but they are not indexed, so they can just be span attributes.
But I'd love to see indexed fields from spans as well, I believe this is important.


| Span Field | Description | Type | Notes |
| ----------------- | ------------------------------------------------------------ | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | The name of the span | String | Should be low cardinality. Replacing span `description` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK removing op and adding a new low cardinality format of OpenTelemetry as you described @AbhiPrasad, but I think the description we currently have for db queries, HTTP requests, and such is highly useful. Where would that end up now?


To simplify how performance data is consumed and understood, we are proposing a new span schema that the SDKs send to Sentry. The new span schema aims to be a superset of the [OpenTelemetry span schema](https://github.com/open-telemetry/opentelemetry-proto/blob/4967b51b5cb29f725978362b9d413bae1dbb641c/opentelemetry/proto/trace/v1/trace.proto) and have a minimal top level API surface. This also means that spans can be easily converted to OpenTelemetry spans and vice versa.

The new span schema is as follows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also keep origin https://develop.sentry.dev/sdk/performance/trace-origin/? It's handy to know what precisely created a span.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

origin can also be part of the span attributes if needed, but good point.


# Background

Right now every SDK has both the concept of transactions and spans - and to a user they both exist as vehicles of performance data. In addition, the transaction exists as the carrier of distributed tracing information ([dynamic sampling context](https://develop.sentry.dev/sdk/performance/dynamic-sampling-context/) and [sentry-trace info](https://develop.sentry.dev/sdk/performance/#header-sentry-trace)), although this is going to change with the advent of tracing without performance support in the SDKs.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan with transaction-specific information that doesn't belong to dynamic sampling or tracing? Is it going to be duplicated in every span? Tags, for example.

Comment on lines +250 to +278
| `attributes` | A set of attributes on the span. | Object | This maps to `span.data` in the current schema for spans. There is no existing mapping for this in the current transaction schema. The keys of attributes are well known values, and defined by a combination of OpenTelemtry's and Sentry's semantic conventions. |
| `measurements` | Measurements which holds observed values such as web vitals. | Object | |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a good time to structure the data further and define what Object looks like for each field? The more we structure the data, the better product experience we can provide on the server side.


1. Introduce three new top level methods, `Sentry.startActiveSpan` and `Sentry.startSpan` and their language specific variants as well as `Sentry.setMeasurement`.

This allows us to de-emphasize the concept of hubs, scopes and transactions from users, and instead have them think about just spans. Under the hood, `Sentry.startActiveSpan` and `Sentry.startSpan` should create transactions/spans as appropriate. `Sentry.setMeasurement` is used to abstract away `transaction.setMeasurement` and similar.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sentry.startActiveSpan should be the default, hence called Sentry.startSpan.
Introducing more top-level functions (even undocumented) increases the API surface unnecessarily and adds more confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we turn it around - instead of having startActiveSpan we just have startSpan that does by default what startActiveSpan does (meaning putting it on the scope) - I suppose this is what people would expect 99% of the time. We could add an argument that prevents setting it on the scope if someone sets it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what we do on Mobile, startTransaction with a bindToScope param, but it's bindToScope=false by default.

If this flag is enabled by default, because of single Hub/Scope on Mobile, might be overwriting each other transactions as a side-effect, not sure if it's a good idea, at least on the Mobile world, maybe keeping bindToScope=false by default makes more sense in this case.

The other option is that if it's enabled by default, but there's a living transaction in the scope, you just create a span, but maybe this is not the desired behavior and we'd need a way to opt-out, in this case, people would need to force bindToScope=false.

Ergonomics isn't great again because they will not know if there's a living transaction in the scope.


`startActiveSpan` only provides the correct parent-child relationship if your platform has proper support for forking scopes. For platforms have a single hub/scope (like the mobile SDKs), this method will not lead to the correct parent-child relationship. The SDK will have to provide a different method for these platforms. One option is for `startActiveSpan` to always attach the span it creates to the root span (the transaction), which means users don't get exact parent-child relationships, but they do get relative relationships between spans using relative durations.

`Sentry.startSpan` will create a span, but not set it as the active span in the current scope.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any use-cases come to mind why this function is necessary?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mobile has single Hub/Scope and this is needed in case you want your own transaction instead of the auto-instrumented ones, measuring something side by side.
For example, a auto-instrumented transaction during screen loading, but also a service doing the X job, that has nothing to do with the screen loading transaction.


The most notable change here is to formally introduce the `attributes` field, and remove the `span.data` field. This is a breaking change, but worth it in the long-term. If we start accepting `attributes` on transactions as well, we more closely align with the OpenTelemetry schema, and can use the same conventions for both spans and transactions.

### Shimming the old schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are we going to do with the extra stuff?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@HazAT HazAT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

@AbhiPrasad AbhiPrasad force-pushed the rfc/revamping-the-sdk-performance-api branch from b760c5d to 136d5d8 Compare August 2, 2023 02:10
@AbhiPrasad AbhiPrasad force-pushed the rfc/revamping-the-sdk-performance-api branch from 136d5d8 to db96510 Compare August 2, 2023 02:26
@AbhiPrasad
Copy link
Member Author

After a lot of deliberation finally pushed updates to this RFC.

We've moved the span schema changes out of scope of this RFC as this requires further discussion in ingest/product. As a result the RFC now only focuses on introducing the new top level methods for the performance API.

The RFC has also been amended to give more freedom to SDK authors to decide what methods to best introduce, although the recommendation is to use startActiveSpan and startSpan. This was done because certain languages/runtimes might have more idiomatic patterns for span creation, and we should prefer that over forcing a single unified API everywhere. We see this in OpenTelemetry where each otel SDK has it's own set of span creation methods.

// golang
ctx, span := tracer.Start
// .NET
using var myActivity = MyActivitySource.StartActivity
# ruby
MyAppTracer.in_span("do_work") do |span|
  # do some work that the 'do_work' span tracks!
end
# Java
Span childSpan = tracer.spanBuilder("child")
    .setParent(Context.current().with(parentSpan))
    .startSpan();

@AbhiPrasad AbhiPrasad merged commit 3d7e6c6 into main Aug 2, 2023
1 check passed
@AbhiPrasad AbhiPrasad deleted the rfc/revamping-the-sdk-performance-api branch August 2, 2023 02:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve DX on Sentry.trace [DECISION] Add RFC for top level span creation method
7 participants