feat(eap): use eap_items in timeseries endpoint #6934
Conversation
Force-pushed from b409c28 to 955fb24
❌ 1 test failed (1 flaky test).
Please add a PR description that describes what the PR is doing and the major changes.
@@ -38,6 +40,160 @@
    "sentry.end_timestamp",
}

COLLUMN_PREFIX: str = "sentry."
misspelled
Great job!
The only real change I'm requesting is in PROTO_TYPE_TO_ATTRIBUTE_COLUMN, plus a question about the virtual columns.
Everything else looks good and pretty clean.
"sentry.organization_id": [AttributeKey.Type.TYPE_INT],
"sentry.project_id": [AttributeKey.Type.TYPE_INT],
"sentry.timestamp": [
    AttributeKey.Type.TYPE_FLOAT,
    AttributeKey.Type.TYPE_DOUBLE,
    AttributeKey.Type.TYPE_INT,
    AttributeKey.Type.TYPE_STRING,
],
"sentry.trace_id": [
    AttributeKey.Type.TYPE_STRING
],  # this gets converted from a uuid to a string in a storage processor
"sentry.item_id": [AttributeKey.Type.TYPE_STRING],
why define a prefix if you won't use it?
it's being used in the attribute_key_to_expression_eap_items function
I meant that it's used here too, just not explicitly, i.e.
s/"sentry.organization_id"/f"{PREFIX}organization_id"/
Just a nit.
}

PROTO_TYPE_TO_ATTRIBUTE_COLUMN: Final[Mapping[AttributeKey.Type.ValueType, str]] = {
    AttributeKey.Type.TYPE_INT: "attributes_int",
use floats for TYPE_INT and TYPE_BOOLEAN (cast them in the query)
The attributes_int and attributes_bool columns are there to reconstruct the original event, but we double-write them to the float column for aggregations, since those are bucketed.
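A minimal sketch of what this review suggestion amounts to: route TYPE_INT and TYPE_BOOLEAN reads to the float-backed column and cast back in the generated query expression. The column names, the `CAST` string form, and the `attribute_expr` helper are illustrative assumptions, not the actual Snuba code.

```python
# Hypothetical mapping per the review: int/bool attributes are double-written
# to the float column, so read them from there and cast in the query.
PROTO_TYPE_TO_ATTRIBUTE_COLUMN = {
    "TYPE_INT": "attributes_float",
    "TYPE_BOOLEAN": "attributes_float",
    "TYPE_FLOAT": "attributes_float",
    "TYPE_DOUBLE": "attributes_float",
    "TYPE_STRING": "attributes_string",
}

# Logical types that need a cast back from the float storage column.
CAST_TARGET = {"TYPE_INT": "Int64", "TYPE_BOOLEAN": "Boolean"}

def attribute_expr(proto_type: str, key: str) -> str:
    """Build a ClickHouse-style expression string for reading an attribute."""
    column = PROTO_TYPE_TO_ATTRIBUTE_COLUMN[proto_type]
    expr = f"{column}['{key}']"
    if proto_type in CAST_TARGET:
        expr = f"CAST({expr}, '{CAST_TARGET[proto_type]}')"
    return expr

print(attribute_expr("TYPE_INT", "span.duration"))
# CAST(attributes_float['span.duration'], 'Int64')
```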
if expression.column.column_name != "attributes_string":
    return expression
nice
@@ -240,25 +267,37 @@ def _proto_expression_to_ast_expression(expr: ProtoExpression) -> Expression:

def _build_query(request: TimeSeriesRequest) -> Query:
    # TODO: This is hardcoded still
you can remove this comment I think
@@ -303,13 +304,14 @@ def _get_count_column_alias(

def get_count_column(
    aggregation: AttributeAggregation | AttributeConditionalAggregation,
    attribute_key_to_expression: Callable[[Any], Expression],
nit: is the typing for this right? It's not really [Any] -> Expression, is it?
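For illustration, a tighter annotation along the lines the reviewer is hinting at. The `AttributeKey` and `Expression` classes here are stand-ins for the real protobuf message and AST node types, and this `get_count_column` body is a placeholder, not the PR's implementation:

```python
from typing import Callable

class AttributeKey:  # stand-in for the protobuf AttributeKey message
    pass

class Expression:    # stand-in for the query AST expression node
    pass

def attribute_key_to_expression(key: AttributeKey) -> Expression:
    return Expression()

# The callable parameter can be annotated with the concrete input type
# instead of the looser Callable[[Any], Expression]:
def get_count_column(
    attribute_key_to_expression: Callable[[AttributeKey], Expression],
) -> Expression:
    return attribute_key_to_expression(AttributeKey())
```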
spans_storage = get_storage(StorageKey("eap_spans"))
items_storage = get_storage(StorageKey("eap_items"))
start = BASE_TIME
messages = [gen_message(start - timedelta(minutes=i)) for i in range(120)]
write_raw_unprocessed_events(spans_storage, messages)  # type: ignore
write_raw_unprocessed_events(items_storage, messages)  # type: ignore
I would extract the double writing to its own function and change that across the board rather than writing the double write everywhere
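A sketch of the suggested extraction. `get_storage` and `write_raw_unprocessed_events` are stubbed here so the example is self-contained; in the tests they would be the real Snuba helpers, and `write_eap_test_messages` is a hypothetical name:

```python
# Stubs standing in for snuba's storage helpers.
written = {}

def get_storage(storage_key):
    return storage_key

def write_raw_unprocessed_events(storage, messages):
    written.setdefault(storage, []).extend(messages)

def write_eap_test_messages(messages):
    """Double-write test messages to both EAP storages in one place,
    so individual tests don't repeat the eap_spans/eap_items pair."""
    for storage_key in ("eap_spans", "eap_items"):
        write_raw_unprocessed_events(get_storage(storage_key), messages)

write_eap_test_messages(["m1", "m2"])
print(sorted(written))  # ['eap_items', 'eap_spans']
```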
)

def apply_virtual_columns_eap_items(
where does this function get called?
it doesn't get called yet but when I update the trace item table endpoint, it will get used
Oh, one more thing so we can compare performance: can you add something that lets us force using the eap_spans table from the client side? Maybe a special prefix in the referrer or something.
@@ -77,6 +77,9 @@

def use_eap_items_table(request_meta: RequestMeta) -> bool:
    if request_meta.referrer.startswith("force_use_eap_items_table"):
I want to force using the OLD table.
Think about this: I run a timeseries query for some ingested timeframe; if use_eap_items_table is turned on, there's no way for me to compare that query against the old table.
I see, I thought you wanted to test it before turning it on
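The escape hatch the reviewer is asking for could be sketched like this, with the referrer prefix forcing the old eap_spans table even when the rollout flag is on. The prefix string and the simplified function signature are assumptions for illustration:

```python
# Hypothetical referrer prefix that forces the OLD eap_spans table so a
# query can be compared against both tables during the rollout.
FORCE_OLD_TABLE_PREFIX = "force_use_eap_spans_table"

def use_eap_items_table(referrer: str, rollout_enabled: bool) -> bool:
    # Client-side override wins over the rollout flag.
    if referrer.startswith(FORCE_OLD_TABLE_PREFIX):
        return False
    return rollout_enabled

print(use_eap_items_table("force_use_eap_spans_table.timeseries", True))  # False
print(use_eap_items_table("api.timeseries", True))  # True
```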
* master:
  feat(eap): use eap_items in timeseries endpoint (#6934)
  chore(tests): Clean up which sentry tests are run on snuba PRs (#6938)
  Revert "bug: cant use group by with formulas (#6933)"
  bug: cant use group by with formulas (#6933)
  fix(test): run more relevant sentry CI tests (#6937)
  fix(ourlogs): Logs trace item table should support offset (#6936)
  Revert "fix(eap): Remove column processor preventing the use of an index (#6932)"
  fix(eap): Remove column processor preventing the use of an index (#6932)
  feat(eap): Sampled storage materialized views (#6930)
  feat(capman): Instead of flatly rejecting heavy projects, limit their bytes scanned on the query level (#6929)
  feat(ourlogs): Add a trace item details endpoint (#6921)
  Run rustup toolchain install explicitly (#6928)
  fix(ci): Auth protoc to avoid rate limits (#6927)
  fix(eap): Set a max execution time for queries (#6925)
  feat(smart-autocomplete): add migration for smart autocomplete view on eap_items (#6912)
This PR updates the timeseries endpoint for the eap_spans resolver to use the new eap_items table when the following conditions are met:

- the use_eap_items_table runtime config is set to 1
- use_eap_items_table_start_timestamp_seconds (measured in UTC seconds)

The tests have also been updated to run once on the eap_spans table and once on the eap_items table.
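Reading the two config names above together, the gating logic might look something like the sketch below. The config accessor, the exact comparison against the timestamp cutoff, and the function name are assumptions, not the PR's actual code:

```python
# Hedged sketch: route to eap_items only when the flag is on AND the
# request's start timestamp is past the configured cutoff (UTC seconds).
def should_use_eap_items_table(config: dict, request_start_seconds: int) -> bool:
    if config.get("use_eap_items_table") != 1:
        return False
    cutoff = config.get("use_eap_items_table_start_timestamp_seconds")
    return cutoff is not None and request_start_seconds >= cutoff

print(should_use_eap_items_table(
    {"use_eap_items_table": 1,
     "use_eap_items_table_start_timestamp_seconds": 100},
    200,
))  # True
```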