feat: str_attr and num_attr materialized views for new eap_items table #6907
base: master
Conversation
This PR has a migration; here is the generated SQL:
-- start migrations
-- forward migration events_analytics_platform : 0033_items_attribute_table_v1
Local op: CREATE TABLE IF NOT EXISTS items_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_type LowCardinality(String), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attr_value String CODEC (ZSTD(1))) ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/items_attrs_1_local', '{replica}') PRIMARY KEY (organization_id, project_id, timestamp, item_type, attr_key) ORDER BY (organization_id, project_id, timestamp, item_type, attr_key, attr_type, attr_value, retention_days) PARTITION BY (retention_days, toMonday(timestamp)) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS items_attrs_1_dist (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_type LowCardinality(String), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attr_value String CODEC (ZSTD(1))) ENGINE Distributed(`cluster_one_sh`, default, items_attrs_1_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS items_attrs_1_mv TO items_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_type LowCardinality(String), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attr_value String CODEC (ZSTD(1))) AS
SELECT
organization_id,
project_id,
item_type,
attrs.1 as attr_key,
attrs.2 as attr_value,
attrs.3 as attr_type,
toStartOfWeek(timestamp) AS timestamp,
retention_days
FROM eap_items_1_local
LEFT ARRAY JOIN
arrayConcat(
arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_0, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_1, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_2, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_3, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_4, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_5, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_6, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_7, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_8, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_9, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_10, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_11, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_12, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_13, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_14, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_15, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_16, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_17, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_18, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_19, 
'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_20, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_21, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_22, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_23, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_24, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_25, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_26, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_27, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_28, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_29, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_30, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_31, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_32, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_33, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_34, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_35, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_36, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_37, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_38, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, 
x.2, 'string'), CAST(attributes_string_39, 'Array(Tuple(String, String))')),
arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_0)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_1)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_2)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_3)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_4)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_5)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_6)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_7)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_8)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_9)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_10)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_11)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_12)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_13)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_14)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_15)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_16)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_17)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_18)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_19)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_20)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_21)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_22)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_23)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_24)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_25)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_26)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_27)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_28)),arrayMap(x -> tuple(x, '', 'float'), 
mapKeys(attributes_float_29)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_30)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_31)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_32)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_33)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_34)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_35)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_36)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_37)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_38)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_39))
) AS attrs
;
-- end forward migration events_analytics_platform : 0033_items_attribute_table_v1
-- backward migration events_analytics_platform : 0033_items_attribute_table_v1
Local op: DROP TABLE IF EXISTS items_attrs_1_mv;
Local op: DROP TABLE IF EXISTS items_attrs_1_local;
Distributed op: DROP TABLE IF EXISTS items_attrs_1_dist;
-- end backward migration events_analytics_platform : 0033_items_attribute_table_v1
❌ 1 Tests Failed:
View the top 1 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard
LEFT ARRAY JOIN
arrayConcat(
{", ".join(f"CAST(attributes_string_{n}, 'Array(Tuple(String, String))')" for n in range(ITEM_ATTRIBUTE_BUCKETS))}
) AS attrs
At this part the old MVs also had
array(
tuple('sentry.service', `service`),
tuple('sentry.segment_name', `segment_name`),
tuple('sentry.name', `name`)
)
I removed that since those columns don't exist in eap_items.
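For readers unfamiliar with the comprehension in the diff above, here is a minimal sketch of the string it generates (using a hypothetical bucket count of 3 instead of the real 40, i.e. buckets 0..39, for brevity):

```python
# Sketch of how the migration's f-string expands the bucketed string-attribute
# columns into the argument list of arrayConcat(...).
# The real migration uses ITEM_ATTRIBUTE_BUCKETS = 40; 3 here for brevity.
ITEM_ATTRIBUTE_BUCKETS = 3

casts = ", ".join(
    f"CAST(attributes_string_{n}, 'Array(Tuple(String, String))')"
    for n in range(ITEM_ATTRIBUTE_BUCKETS)
)
print(casts)
# CAST(attributes_string_0, ...), CAST(attributes_string_1, ...), CAST(attributes_string_2, ...)
```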
LEFT ARRAY JOIN
arrayConcat(
{",".join(f"CAST(attributes_float_{n}, 'Array(Tuple(String, Float64))')" for n in range(ITEM_ATTRIBUTE_BUCKETS))}
) AS attrs
At this part the old MVs also had
array(
tuple('sentry.duration_ms', duration_micro / 1000)
)
I removed that since the column doesn't exist anymore.
engine=table_engines.AggregatingMergeTree(
storage_set=self.storage_set_key,
primary_key="(organization_id, project_id, timestamp, item_type, attr_key)",
order_by="(organization_id, project_id, timestamp, item_type, attr_key, retention_days)",
I didn't include attr_value in this ORDER BY, but I did include it for str. This is because the num_attrs table has the columns min_value and max_value, which made me think it was intended to be collapsed into just key, not (key, value).
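A toy Python sketch of what this choice implies (hypothetical rows, with a dict standing in for ReplacingMergeTree's merge-time deduplication): leaving attr_value out of the sorting key collapses rows that differ only in value, while including it keeps each (key, value) pair.

```python
# Stand-in for ReplacingMergeTree dedup: one surviving row per sorting-key
# tuple. Omitting attr_value from the key (num attrs) collapses values;
# including it (str attrs) keeps each distinct (key, value) pair.
def collapse(rows, key_cols):
    seen = {}
    for row in rows:
        seen[tuple(row[c] for c in key_cols)] = row  # later row wins
    return list(seen.values())

rows = [
    {"attr_key": "duration", "attr_value": "1.5"},
    {"attr_key": "duration", "attr_value": "2.5"},
]
print(len(collapse(rows, ("attr_key",))))               # 1: collapsed to key
print(len(collapse(rows, ("attr_key", "attr_value"))))  # 2: both survive
```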
- organization_id
- referrer
query_processors:
There used to be a processor
- processor: UUIDColumnProcessor
  args:
    columns: [trace_id]
but since there's no trace_id column I thought it's safe to remove.
columns=self.columns,
destination_table_name=self.local_table,
target=OperationTarget.LOCAL,
query=f"""
I verified on my local machine that this SELECT statement works as I expect.
operations.CreateTable(
storage_set=self.storage_set_key,
table_name=self.local_table,
engine=table_engines.ReplacingMergeTree(
I chose ReplacingMergeTree to save space by deleting duplicate rows
just 2 questions, looks good otherwise
Column("organization_id", UInt(64)),
Column("project_id", UInt(64)),
Column("item_type", UInt(8)),
Column("attr_key", String(modifiers=Modifiers(codecs=["ZSTD(1)"]))),
why the compression here?
this was stolen from how the spans_str_attrs table did it https://github.com/getsentry/snuba/blob/master/snuba/snuba_migrations/events_analytics_platform/0017_span_attribute_table_v3.py#L39
GROUP BY
organization_id,
project_id,
item_type,
attr_key,
attr_value,
attr_type,
timestamp,
retention_days
What is the point of the GROUP BY here?
It was originally there in the old MV: https://github.com/getsentry/snuba/blob/master/snuba/snuba_migrations/events_analytics_platform/0017_span_attribute_table_v3.py#L122-L127
I think it's to remove duplicates, so that's why I kept it.
The group by is for aggregations (in the aggregating merge tree).
You don't need to group by here. The replacing merge tree will remove duplicates by itself
Yeah, good point lol, I will get rid of this GROUP BY. But even when the MV was an AggregatingMergeTree, this SELECT query still didn't have an aggregate in it, so it seems just as wrong before as it is now.
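To illustrate the point being conceded here (plain Python, with tuples standing in for rows): a GROUP BY over every selected column with no aggregate function is just SELECT DISTINCT, and ReplacingMergeTree already provides that deduplication at merge time.

```python
# GROUP BY all columns with no aggregates only deduplicates rows, i.e. it is
# equivalent to SELECT DISTINCT -- which ReplacingMergeTree does during merges.
rows = [("org1", "k", "v"), ("org1", "k", "v"), ("org1", "k2", "v")]

grouped = list(dict.fromkeys(rows))  # "GROUP BY" every column, no aggregates
distinct = sorted(set(rows))         # "SELECT DISTINCT"

print(sorted(grouped) == distinct)   # True: identical result sets
```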
The purpose of this PR is to recreate the existing spans_num_attrs_3_mv and spans_str_attrs_3_mv so they read from eap_items_1_local (they used to read from eap_spans_2_local). It addresses this ticket: https://github.com/getsentry/eap-planning/issues/194.

Major changes:
- Creates the new items_attrs_1 tables and materialized views via migration.
- The old tables had count, min_value, and max_value columns, but we never used them.

Design decisions:
- A single table holds both string and float attributes, distinguished by attr_type.
- Float attributes are written with attr_type='float', but this made it so I can't use attr_value in the ORDER BY. So now attr_value just gets set to empty string for floats.
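To make that last design decision concrete, here is a hedged sketch of the tuple shapes the MV emits (helper names are hypothetical; the mapping mirrors the generated SQL above):

```python
# Strings keep their value; floats keep only the key, with attr_value set to
# the empty string -- mirroring arrayMap(x -> tuple(x, '', 'float'), mapKeys(...))
# in the generated materialized-view SQL.
def string_attr(key, value):
    return (key, value, "string")

def float_attr(key):
    return (key, "", "float")

print(string_attr("http.method", "GET"))  # ('http.method', 'GET', 'string')
print(float_attr("duration"))             # ('duration', '', 'float')
```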