feat: str_attr and num_attr materialized views for new eap_items table #6907

Open · wants to merge 8 commits into master
Conversation


@kylemumma kylemumma commented Feb 21, 2025

The purpose of this PR is to recreate the existing spans_num_attrs_3_mv and spans_str_attrs_3_mv so that they read from eap_items_1_local (they previously read from eap_spans_2_local). It addresses this ticket: https://github.com/getsentry/eap-planning/issues/194.

Major changes

  • creates a materialized view, items_attrs_1, via migration
  • creates storage definitions for it
  • drops the aggregations from the MV; the old views computed count, min_value, and max_value, but we never used them

Design decisions

  • attr_type indicates whether the attribute is a float or a string; I made it a LowCardinality(String) with the values string and float
  • originally I made attr_value Nullable because it is not present when attr_type='float', but that prevented using attr_value in the ORDER BY key, so attr_value now gets set to the empty string for floats
  • I chose ReplacingMergeTree for the MV's target table to save space by deduplicating rows
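The ReplacingMergeTree choice can be sketched in a few lines: rows that share the full ORDER BY key collapse to a single row at merge time, and because attr_value is the empty string (not NULL) for floats, float attributes still form a valid key. The Python below is an illustrative simulation, not Snuba code; the column names mirror the migration, and the sample values are made up.

```python
def replacing_merge(rows, key_columns):
    """Keep one row per unique ORDER BY key, as a background merge would.
    A later insert with the same key replaces the earlier one, which is
    roughly what ReplacingMergeTree does (eventually, at merge time)."""
    deduped = {}
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        deduped[key] = row
    return list(deduped.values())

# The full ORDER BY key from the migration
ORDER_BY = (
    "organization_id", "project_id", "timestamp", "item_type",
    "attr_key", "attr_type", "attr_value", "retention_days",
)

rows = [
    {"organization_id": 1, "project_id": 1, "timestamp": "2025-02-17",
     "item_type": 1, "attr_key": "db.system", "attr_type": "string",
     "attr_value": "postgres", "retention_days": 90},
    # exact duplicate -- collapses away at merge time
    {"organization_id": 1, "project_id": 1, "timestamp": "2025-02-17",
     "item_type": 1, "attr_key": "db.system", "attr_type": "string",
     "attr_value": "postgres", "retention_days": 90},
    # float attrs store '' for attr_value, so they still form a valid key
    {"organization_id": 1, "project_id": 1, "timestamp": "2025-02-17",
     "item_type": 1, "attr_key": "duration_ms", "attr_type": "float",
     "attr_value": "", "retention_days": 90},
]

merged = replacing_merge(rows, ORDER_BY)
print(len(merged))  # 2
```

Note this also shows why a Nullable attr_value would have been a problem: every ORDER BY column has to be comparable as part of the sort key, so an always-present empty string is the simpler encoding.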


github-actions bot commented Feb 21, 2025

This PR has a migration; here is the generated SQL for ./snuba/migrations/groups.py ()

-- start migrations

-- forward migration events_analytics_platform : 0033_items_attribute_table_v1
Local op: CREATE TABLE IF NOT EXISTS items_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_type LowCardinality(String), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attr_value String CODEC (ZSTD(1))) ENGINE ReplicatedReplacingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/items_attrs_1_local', '{replica}') PRIMARY KEY (organization_id, project_id, timestamp, item_type, attr_key) ORDER BY (organization_id, project_id, timestamp, item_type, attr_key, attr_type, attr_value, retention_days) PARTITION BY (retention_days, toMonday(timestamp)) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192;
Distributed op: CREATE TABLE IF NOT EXISTS items_attrs_1_dist (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_type LowCardinality(String), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attr_value String CODEC (ZSTD(1))) ENGINE Distributed(`cluster_one_sh`, default, items_attrs_1_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS items_attrs_1_mv TO items_attrs_1_local (organization_id UInt64, project_id UInt64, item_type UInt8, attr_key String CODEC (ZSTD(1)), attr_type LowCardinality(String), timestamp DateTime CODEC (DoubleDelta, ZSTD(1)), retention_days UInt16, attr_value String CODEC (ZSTD(1))) AS 
SELECT
    organization_id,
    project_id,
    item_type,
    attrs.1 as attr_key,
    attrs.2 as attr_value,
    attrs.3 as attr_type,
    toStartOfWeek(timestamp) AS timestamp,
    retention_days,
FROM eap_items_1_local
LEFT ARRAY JOIN
    arrayConcat(
        arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_0, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_1, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_2, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_3, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_4, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_5, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_6, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_7, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_8, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_9, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_10, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_11, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_12, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_13, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_14, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_15, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_16, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_17, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_18, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), 
CAST(attributes_string_19, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_20, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_21, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_22, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_23, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_24, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_25, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_26, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_27, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_28, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_29, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_30, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_31, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_32, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_33, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_34, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_35, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_36, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_37, 'Array(Tuple(String, String))')), arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_38, 'Array(Tuple(String, String))')), 
arrayMap(x -> tuple(x.1, x.2, 'string'), CAST(attributes_string_39, 'Array(Tuple(String, String))')),
        arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_0)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_1)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_2)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_3)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_4)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_5)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_6)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_7)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_8)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_9)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_10)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_11)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_12)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_13)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_14)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_15)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_16)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_17)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_18)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_19)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_20)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_21)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_22)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_23)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_24)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_25)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_26)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_27)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_28)),arrayMap(x -> tuple(x, '', 'float'), 
mapKeys(attributes_float_29)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_30)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_31)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_32)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_33)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_34)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_35)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_36)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_37)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_38)),arrayMap(x -> tuple(x, '', 'float'), mapKeys(attributes_float_39))
    ) AS attrs
;
-- end forward migration events_analytics_platform : 0033_items_attribute_table_v1
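The MV's SELECT boils down to one flattening step: every attributes_string_N map contributes (key, value, 'string') tuples, every attributes_float_N map contributes (key, '', 'float') tuples via mapKeys (the float values themselves are dropped), and arrayConcat plus LEFT ARRAY JOIN turns the combined list into one output row per attribute. A minimal Python sketch of that shape (illustrative only, with made-up sample data):

```python
def flatten_attrs(string_buckets, float_buckets):
    """Mimic the MV's LEFT ARRAY JOIN source: one (attr_key, attr_value,
    attr_type) tuple per attribute. Float attributes keep only their key,
    with '' as attr_value, matching the migration's mapKeys() arm."""
    attrs = []
    for bucket in string_buckets:      # attributes_string_0 .. _39
        for key, value in bucket.items():
            attrs.append((key, value, "string"))
    for bucket in float_buckets:       # attributes_float_0 .. _39
        for key in bucket:             # mapKeys(): the float value is dropped
            attrs.append((key, "", "float"))
    return attrs

# One item with one string attribute and one float attribute
rows = flatten_attrs([{"db.system": "postgres"}], [{"duration_ms": 12.5}])
print(rows)  # [('db.system', 'postgres', 'string'), ('duration_ms', '', 'float')]
```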




-- backward migration events_analytics_platform : 0033_items_attribute_table_v1
Local op: DROP TABLE IF EXISTS items_attrs_1_mv;
Local op: DROP TABLE IF EXISTS items_attrs_1_local;
Distributed op: DROP TABLE IF EXISTS items_attrs_1_dist;
-- end backward migration events_analytics_platform : 0033_items_attribute_table_v1


codecov bot commented Feb 21, 2025

❌ 1 Tests Failed:

Tests completed: 1 | Failed: 1 | Passed: 0 | Skipped: 0
View the top 1 failed test(s) by shortest run time
tests.test_api.TestApi::test_count
Stack Traces | 8.38s run time
Traceback (most recent call last):
  File ".../snuba/clickhouse/native.py", line 206, in execute
    result_data = query_execute()
                  ^^^^^^^^^^^^^^^
  File ".../snuba/clickhouse/native.py", line 189, in query_execute
    return conn.execute(  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 382, in execute
    rv = self.process_ordinary_query(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 580, in process_ordinary_query
    return self.receive_result(with_column_types=with_column_types,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../sentry_sdk/integrations/clickhouse_driver.py", line 112, in _inner_end
    res = f(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 212, in receive_result
    return result.get_result()
           ^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.../site-packages/clickhouse_driver/result.py", line 50, in get_result
    for packet in self.packet_generator:
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 228, in packet_generator
    packet = self.receive_packet()
             ^^^^^^^^^^^^^^^^^^^^^
  File ".../local/lib/python3.11.............../site-packages/clickhouse_driver/client.py", line 245, in receive_packet
    raise packet.exception
clickhouse_driver.errors.ServerException: Code: 47.
DB::Exception: Missing columns: 'attr_str_19' 'service' 'attr_str_38' 'attr_str_25' 'attr_str_35' 'attr_str_33' 'attr_str_32' '_sort_timestamp' 'attr_str_17' 'attr_str_30' 'attr_str_2' 'attr_str_34' 'attr_str_29' 'attr_str_0' 'attr_str_28' 'attr_str_27' 'segment_name' 'attr_str_39' 'attr_str_26' 'attr_str_37' 'attr_str_36' 'attr_str_24' 'attr_str_23' 'attr_str_16' 'attr_str_21' 'attr_str_22' 'name' 'attr_str_15' 'attr_str_7' 'attr_str_14' 'attr_str_12' 'attr_str_20' 'attr_str_10' 'attr_str_18' 'attr_str_9' 'attr_str_8' 'attr_str_6' 'attr_str_11' 'attr_str_4' 'attr_str_3' 'attr_str_5' 'attr_str_1' 'attr_str_31' 'attr_str_13' while processing query: 'SELECT organization_id, project_id, item_type, attrs.1 AS attr_key, attrs.2 AS attr_value, toStartOfDay(_sort_timestamp) AS timestamp, retention_days, 1 AS count FROM default.eap_items_1_local LEFT ARRAY JOIN arrayConcat(CAST(attr_str_0, 'Array(Tuple(String, String))'), CAST(attr_str_1, 'Array(Tuple(String, String))'), CAST(attr_str_2, 'Array(Tuple(String, String))'), CAST(attr_str_3, 'Array(Tuple(String, String))'), CAST(attr_str_4, 'Array(Tuple(String, String))'), CAST(attr_str_5, 'Array(Tuple(String, String))'), CAST(attr_str_6, 'Array(Tuple(String, String))'), CAST(attr_str_7, 'Array(Tuple(String, String))'), CAST(attr_str_8, 'Array(Tuple(String, String))'), CAST(attr_str_9, 'Array(Tuple(String, String))'), CAST(attr_str_10, 'Array(Tuple(String, String))'), CAST(attr_str_11, 'Array(Tuple(String, String))'), CAST(attr_str_12, 'Array(Tuple(String, String))'), CAST(attr_str_13, 'Array(Tuple(String, String))'), CAST(attr_str_14, 'Array(Tuple(String, String))'), CAST(attr_str_15, 'Array(Tuple(String, String))'), CAST(attr_str_16, 'Array(Tuple(String, String))'), CAST(attr_str_17, 'Array(Tuple(String, String))'), CAST(attr_str_18, 'Array(Tuple(String, String))'), CAST(attr_str_19, 'Array(Tuple(String, String))'), CAST(attr_str_20, 'Array(Tuple(String, String))'), CAST(attr_str_21, 'Array(Tuple(String, String))'), 
CAST(attr_str_22, 'Array(Tuple(String, String))'), CAST(attr_str_23, 'Array(Tuple(String, String))'), CAST(attr_str_24, 'Array(Tuple(String, String))'), CAST(attr_str_25, 'Array(Tuple(String, String))'), CAST(attr_str_26, 'Array(Tuple(String, String))'), CAST(attr_str_27, 'Array(Tuple(String, String))'), CAST(attr_str_28, 'Array(Tuple(String, String))'), CAST(attr_str_29, 'Array(Tuple(String, String))'), CAST(attr_str_30, 'Array(Tuple(String, String))'), CAST(attr_str_31, 'Array(Tuple(String, String))'), CAST(attr_str_32, 'Array(Tuple(String, String))'), CAST(attr_str_33, 'Array(Tuple(String, String))'), CAST(attr_str_34, 'Array(Tuple(String, String))'), CAST(attr_str_35, 'Array(Tuple(String, String))'), CAST(attr_str_36, 'Array(Tuple(String, String))'), CAST(attr_str_37, 'Array(Tuple(String, String))'), CAST(attr_str_38, 'Array(Tuple(String, String))'), CAST(attr_str_39, 'Array(Tuple(String, String))'), [('sentry.service', service), ('sentry.segment_name', segment_name), ('sentry.name', name)]) AS attrs GROUP BY organization_id, project_id, item_type, attr_key, attr_value, timestamp, retention_days', required columns: 'retention_days' 'organization_id' 'attr_str_13' 'attr_str_31' 'attr_str_1' 'attr_str_5' 'project_id' 'attr_str_3' 'attr_str_4' 'attr_str_11' 'attr_str_6' 'attr_str_8' 'attr_str_9' 'attr_str_18' 'attr_str_10' 'attr_str_20' 'attr_str_12' 'item_type' 'attr_str_14' 'attr_str_7' 'attr_str_15' 'name' 'attr_str_22' 'attr_str_21' 'attr_str_16' 'attr_str_23' 'attr_str_24' 'attr_str_36' 'attr_str_37' 'attr_str_26' 'attr_str_39' 'segment_name' 'attr_str_27' 'attr_str_28' 'attr_str_0' 'attr_str_29' 'attr_str_34' 'attr_str_2' 'attr_str_30' 'attr_str_17' '_sort_timestamp' 'attr_str_32' 'attr_str_33' 'attr_str_35' 'attr_str_25' 'attr_str_38' 'service' 'attr_str_19', maybe you meant: 'retention_days', 'organization_id', 'project_id' or 'item_type', arrayJoin columns: 'arrayConcat(CAST(attr_str_0, 'Array(Tuple(String, String))'), CAST(attr_str_1, 
'Array(Tuple(String, String))'), CAST(attr_str_2, 'Array(Tuple(String, String))'), CAST(attr_str_3, 'Array(Tuple(String, String))'), CAST(attr_str_4, 'Array(Tuple(String, String))'), CAST(attr_str_5, 'Array(Tuple(String, String))'), CAST(attr_str_6, 'Array(Tuple(String, String))'), CAST(attr_str_7, 'Array(Tuple(String, String))'), CAST(attr_str_8, 'Array(Tuple(String, String))'), CAST(attr_str_9, 'Array(Tuple(String, String))'), CAST(attr_str_10, 'Array(Tuple(String, String))'), CAST(attr_str_11, 'Array(Tuple(String, String))'), CAST(attr_str_12, 'Array(Tuple(String, String))'), CAST(attr_str_13, 'Array(Tuple(String, String))'), CAST(attr_str_14, 'Array(Tuple(String, String))'), CAST(attr_str_15, 'Array(Tuple(String, String))'), CAST(attr_str_16, 'Array(Tuple(String, String))'), CAST(attr_str_17, 'Array(Tuple(String, String))'), CAST(attr_str_18, 'Array(Tuple(String, String))'), CAST(attr_str_19, 'Array(Tuple(String, String))'), CAST(attr_str_20, 'Array(Tuple(String, String))'), CAST(attr_str_21, 'Array(Tuple(String, String))'), CAST(attr_str_22, 'Array(Tuple(String, String))'), CAST(attr_str_23, 'Array(Tuple(String, String))'), CAST(attr_str_24, 'Array(Tuple(String, String))'), CAST(attr_str_25, 'Array(Tuple(String, String))'), CAST(attr_str_26, 'Array(Tuple(String, String))'), CAST(attr_str_27, 'Array(Tuple(String, String))'), CAST(attr_str_28, 'Array(Tuple(String, String))'), CAST(attr_str_29, 'Array(Tuple(String, String))'), CAST(attr_str_30, 'Array(Tuple(String, String))'), CAST(attr_str_31, 'Array(Tuple(String, String))'), CAST(attr_str_32, 'Array(Tuple(String, String))'), CAST(attr_str_33, 'Array(Tuple(String, String))'), CAST(attr_str_34, 'Array(Tuple(String, String))'), CAST(attr_str_35, 'Array(Tuple(String, String))'), CAST(attr_str_36, 'Array(Tuple(String, String))'), CAST(attr_str_37, 'Array(Tuple(String, String))'), CAST(attr_str_38, 'Array(Tuple(String, String))'), CAST(attr_str_39, 'Array(Tuple(String, String))'), array(tuple('sentry.service', 
service), tuple('sentry.segment_name', segment_name), tuple('sentry.name', name)))'. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c61ff37 in ....................................................................................................../usr/bin/clickhouse
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x0000000007156ef1 in ....................................................................................................../usr/bin/clickhouse
2. DB::TreeRewriterResult::collectUsedColumns(std::shared_ptr<DB::IAST> const&, bool, bool) @ 0x000000001221ab99 in ....................................................................................................../usr/bin/clickhouse
3. DB::TreeRewriter::analyzeSelect(std::shared_ptr<DB::IAST>&, DB::TreeRewriterResult&&, DB::SelectQueryOptions const&, std::vector<DB::TableWithColumnNamesAndTypes, std::allocator<DB::TableWithColumnNamesAndTypes>> const&, std::vector<String, std::allocator<String>> const&, std::shared_ptr<DB::TableJoin>) const @ 0x000000001221f801 in ....................................................................................................../usr/bin/clickhouse
4. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context> const&, std::optional<DB::Pipe>, std::shared_ptr<DB::IStorage> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::PreparedSets>)::$_0::operator()(bool) const @ 0x0000000011ed191c in ....................................................................................................../usr/bin/clickhouse
5. DB::InterpreterSelectQuery::InterpreterSelectQuery(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context> const&, std::optional<DB::Pipe>, std::shared_ptr<DB::IStorage> const&, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&, std::shared_ptr<DB::StorageInMemoryMetadata const> const&, std::shared_ptr<DB::PreparedSets>) @ 0x0000000011ec5975 in ....................................................................................................../usr/bin/clickhouse
6. DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context>, DB::SelectQueryOptions const&, std::vector<String, std::allocator<String>> const&) @ 0x0000000011f74948 in ....................................................................................................../usr/bin/clickhouse
7. DB::InterpreterCreateQuery::createTable(DB::ASTCreateQuery&) @ 0x0000000011ced71c in ....................................................................................................../usr/bin/clickhouse
8. DB::InterpreterCreateQuery::execute() @ 0x0000000011cfd920 in ....................................................................................................../usr/bin/clickhouse
9. DB::executeQueryImpl(char const*, char const*, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum, DB::ReadBuffer*) @ 0x00000000122bfe15 in ....................................................................................................../usr/bin/clickhouse
10. DB::executeQuery(String const&, std::shared_ptr<DB::Context>, bool, DB::QueryProcessingStage::Enum) @ 0x00000000122bb5b5 in ....................................................................................................../usr/bin/clickhouse
11. DB::TCPHandler::runImpl() @ 0x0000000013137519 in ....................................................................................................../usr/bin/clickhouse
12. DB::TCPHandler::run() @ 0x00000000131498f9 in ....................................................................................................../usr/bin/clickhouse
13. Poco::Net::TCPServerConnection::start() @ 0x0000000015b42834 in ....................................................................................................../usr/bin/clickhouse
14. Poco::Net::TCPServerDispatcher::run() @ 0x0000000015b43a31 in ....................................................................................................../usr/bin/clickhouse
15. Poco::PooledThread::run() @ 0x0000000015c7a667 in ....................................................................................................../usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in ....................................................................................................../usr/bin/clickhouse
17. ? @ 0x00007ffaf5b8f609 in ?
18. ? @ 0x00007ffaf5ab4353 in ?


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File ".../snuba/tests/conftest.py", line 214, in clickhouse_db
    Runner().run_all(force=True)
  File ".../snuba/migrations/runner.py", line 261, in run_all
    self._run_migration_impl(
  File ".../snuba/migrations/runner.py", line 344, in _run_migration_impl
    migration.forwards(context, dry_run, columns_states)
  File ".../snuba/migrations/migration.py", line 170, in forwards
    op.execute()
  File ".../snuba/migrations/operations.py", line 81, in execute
    connection.execute(self.format_sql(), settings=self._settings)
  File ".../snuba/clickhouse/native.py", line 291, in execute
    raise ClickhouseError(e.message, code=e.code) from e
snuba.clickhouse.errors.ClickhouseError: DB::Exception: Missing columns: 'attr_str_19' 'service' 'attr_str_38' 'attr_str_25' 'attr_str_35' 'attr_str_33' 'attr_str_32' '_sort_timestamp' 'attr_str_17' 'attr_str_30' 'attr_str_2' 'attr_str_34' 'attr_str_29' 'attr_str_0' 'attr_str_28' 'attr_str_27' 'segment_name' 'attr_str_39' 'attr_str_26' 'attr_str_37' 'attr_str_36' 'attr_str_24' 'attr_str_23' 'attr_str_16' 'attr_str_21' 'attr_str_22' 'name' 'attr_str_15' 'attr_str_7' 'attr_str_14' 'attr_str_12' 'attr_str_20' 'attr_str_10' 'attr_str_18' 'attr_str_9' 'attr_str_8' 'attr_str_6' 'attr_str_11' 'attr_str_4' 'attr_str_3' 'attr_str_5' 'attr_str_1' 'attr_str_31' 'attr_str_13' while processing query: 'SELECT organization_id, project_id, item_type, attrs.1 AS attr_key, attrs.2 AS attr_value, toStartOfDay(_sort_timestamp) AS timestamp, retention_days, 1 AS count FROM default.eap_items_1_local LEFT ARRAY JOIN arrayConcat(CAST(attr_str_0, 'Array(Tuple(String, String))'), CAST(attr_str_1, 'Array(Tuple(String, String))'), CAST(attr_str_2, 'Array(Tuple(String, String))'), CAST(attr_str_3, 'Array(Tuple(String, String))'), CAST(attr_str_4, 'Array(Tuple(String, String))'), CAST(attr_str_5, 'Array(Tuple(String, String))'), CAST(attr_str_6, 'Array(Tuple(String, String))'), CAST(attr_str_7, 'Array(Tuple(String, String))'), CAST(attr_str_8, 'Array(Tuple(String, String))'), CAST(attr_str_9, 'Array(Tuple(String, String))'), CAST(attr_str_10, 'Array(Tuple(String, String))'), CAST(attr_str_11, 'Array(Tuple(String, String))'), CAST(attr_str_12, 'Array(Tuple(String, String))'), CAST(attr_str_13, 'Array(Tuple(String, String))'), CAST(attr_str_14, 'Array(Tuple(String, String))'), CAST(attr_str_15, 'Array(Tuple(String, String))'), CAST(attr_str_16, 'Array(Tuple(String, String))'), CAST(attr_str_17, 'Array(Tuple(String, String))'), CAST(attr_str_18, 'Array(Tuple(String, String))'), CAST(attr_str_19, 'Array(Tuple(String, String))'), CAST(attr_str_20, 'Array(Tuple(String, String))'), CAST(attr_str_21, 
'Array(Tuple(String, String))'), CAST(attr_str_22, 'Array(Tuple(String, String))'), CAST(attr_str_23, 'Array(Tuple(String, String))'), CAST(attr_str_24, 'Array(Tuple(String, String))'), CAST(attr_str_25, 'Array(Tuple(String, String))'), CAST(attr_str_26, 'Array(Tuple(String, String))'), CAST(attr_str_27, 'Array(Tuple(String, String))'), CAST(attr_str_28, 'Array(Tuple(String, String))'), CAST(attr_str_29, 'Array(Tuple(String, String))'), CAST(attr_str_30, 'Array(Tuple(String, String))'), CAST(attr_str_31, 'Array(Tuple(String, String))'), CAST(attr_str_32, 'Array(Tuple(String, String))'), CAST(attr_str_33, 'Array(Tuple(String, String))'), CAST(attr_str_34, 'Array(Tuple(String, String))'), CAST(attr_str_35, 'Array(Tuple(String, String))'), CAST(attr_str_36, 'Array(Tuple(String, String))'), CAST(attr_str_37, 'Array(Tuple(String, String))'), CAST(attr_str_38, 'Array(Tuple(String, String))'), CAST(attr_str_39, 'Array(Tuple(String, String))'), [('sentry.service', service), ('sentry.segment_name', segment_name), ('sentry.name', name)]) AS attrs GROUP BY organization_id, project_id, item_type, attr_key, attr_value, timestamp, retention_days', required columns: 'retention_days' 'organization_id' 'attr_str_13' 'attr_str_31' 'attr_str_1' 'attr_str_5' 'project_id' 'attr_str_3' 'attr_str_4' 'attr_str_11' 'attr_str_6' 'attr_str_8' 'attr_str_9' 'attr_str_18' 'attr_str_10' 'attr_str_20' 'attr_str_12' 'item_type' 'attr_str_14' 'attr_str_7' 'attr_str_15' 'name' 'attr_str_22' 'attr_str_21' 'attr_str_16' 'attr_str_23' 'attr_str_24' 'attr_str_36' 'attr_str_37' 'attr_str_26' 'attr_str_39' 'segment_name' 'attr_str_27' 'attr_str_28' 'attr_str_0' 'attr_str_29' 'attr_str_34' 'attr_str_2' 'attr_str_30' 'attr_str_17' '_sort_timestamp' 'attr_str_32' 'attr_str_33' 'attr_str_35' 'attr_str_25' 'attr_str_38' 'service' 'attr_str_19', maybe you meant: 'retention_days', 'organization_id', 'project_id' or 'item_type', arrayJoin columns: 'arrayConcat(CAST(attr_str_0, 'Array(Tuple(String, String))'), 
CAST(attr_str_1, 'Array(Tuple(String, String))'), CAST(attr_str_2, 'Array(Tuple(String, String))'), CAST(attr_str_3, 'Array(Tuple(String, String))'), CAST(attr_str_4, 'Array(Tuple(String, String))'), CAST(attr_str_5, 'Array(Tuple(String, String))'), CAST(attr_str_6, 'Array(Tuple(String, String))'), CAST(attr_str_7, 'Array(Tuple(String, String))'), CAST(attr_str_8, 'Array(Tuple(String, String))'), CAST(attr_str_9, 'Array(Tuple(String, String))'), CAST(attr_str_10, 'Array(Tuple(String, String))'), CAST(attr_str_11, 'Array(Tuple(String, String))'), CAST(attr_str_12, 'Array(Tuple(String, String))'), CAST(attr_str_13, 'Array(Tuple(String, String))'), CAST(attr_str_14, 'Array(Tuple(String, String))'), CAST(attr_str_15, 'Array(Tuple(String, String))'), CAST(attr_str_16, 'Array(Tuple(String, String))'), CAST(attr_str_17, 'Array(Tuple(String, String))'), CAST(attr_str_18, 'Array(Tuple(String, String))'), CAST(attr_str_19, 'Array(Tuple(String, String))'), CAST(attr_str_20, 'Array(Tuple(String, String))'), CAST(attr_str_21, 'Array(Tuple(String, String))'), CAST(attr_str_22, 'Array(Tuple(String, String))'), CAST(attr_str_23, 'Array(Tuple(String, String))'), CAST(attr_str_24, 'Array(Tuple(String, String))'), CAST(attr_str_25, 'Array(Tuple(String, String))'), CAST(attr_str_26, 'Array(Tuple(String, String))'), CAST(attr_str_27, 'Array(Tuple(String, String))'), CAST(attr_str_28, 'Array(Tuple(String, String))'), CAST(attr_str_29, 'Array(Tuple(String, String))'), CAST(attr_str_30, 'Array(Tuple(String, String))'), CAST(attr_str_31, 'Array(Tuple(String, String))'), CAST(attr_str_32, 'Array(Tuple(String, String))'), CAST(attr_str_33, 'Array(Tuple(String, String))'), CAST(attr_str_34, 'Array(Tuple(String, String))'), CAST(attr_str_35, 'Array(Tuple(String, String))'), CAST(attr_str_36, 'Array(Tuple(String, String))'), CAST(attr_str_37, 'Array(Tuple(String, String))'), CAST(attr_str_38, 'Array(Tuple(String, String))'), CAST(attr_str_39, 'Array(Tuple(String, String))'), 
array(tuple('sentry.service', service), tuple('sentry.segment_name', segment_name), tuple('sentry.name', name)))'.
15. Poco::PooledThread::run() @ 0x0000000015c7a667 in ....................................................................................................../usr/bin/clickhouse
16. Poco::ThreadImpl::runnableEntry(void*) @ 0x0000000015c7893c in ....................................................................................................../usr/bin/clickhouse
17. ? @ 0x00007ffaf5b8f609 in ?
18. ? @ 0x00007ffaf5ab4353 in ?


LEFT ARRAY JOIN
arrayConcat(
{", ".join(f"CAST(attributes_string_{n}, 'Array(Tuple(String, String))')" for n in range(ITEM_ATTRIBUTE_BUCKETS))}
) AS attrs
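For context, the f-string join above emits one `CAST` per attribute bucket, and `arrayConcat` flattens the buckets into a single array of `(key, value)` tuples for the `ARRAY JOIN`. A minimal sketch of the generated SQL fragment, using a hypothetical bucket count (the real `ITEM_ATTRIBUTE_BUCKETS` value lives in the Snuba codebase):

```python
# Hypothetical bucket count for illustration only; the real constant is
# defined in Snuba and may differ.
ITEM_ATTRIBUTE_BUCKETS = 3

# Build one CAST expression per bucketed map column, exactly as the
# migration's f-string does for the string attributes.
columns = ", ".join(
    f"CAST(attributes_string_{n}, 'Array(Tuple(String, String))')"
    for n in range(ITEM_ATTRIBUTE_BUCKETS)
)
query_fragment = f"arrayConcat({columns}) AS attrs"
print(query_fragment)
```

With three buckets this yields `arrayConcat(CAST(attributes_string_0, ...), CAST(attributes_string_1, ...), CAST(attributes_string_2, ...)) AS attrs`; the float variant below is built the same way with `Float64` values.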
@kylemumma kylemumma (Member Author) Feb 21, 2025

At this part the old MVs also had

array(
            tuple('sentry.service', `service`),
            tuple('sentry.segment_name', `segment_name`),
            tuple('sentry.name', `name`)
        )

I removed that since those columns don't exist in eap_items.

LEFT ARRAY JOIN
arrayConcat(
{",".join(f"CAST(attributes_float_{n}, 'Array(Tuple(String, Float64))')" for n in range(ITEM_ATTRIBUTE_BUCKETS))}
) AS attrs
Member Author

At this part the old MVs also had

array(
            tuple('sentry.duration_ms', duration_micro / 1000)
        )

I removed that since the column doesn't exist anymore.

@kylemumma kylemumma marked this pull request as ready for review February 21, 2025 20:06
@kylemumma kylemumma requested review from a team as code owners February 21, 2025 20:06
engine=table_engines.AggregatingMergeTree(
storage_set=self.storage_set_key,
primary_key="(organization_id, project_id, timestamp, item_type, attr_key)",
order_by="(organization_id, project_id, timestamp, item_type, attr_key, retention_days)",
Member Author

I didn't include attr_value in this ORDER BY, but I did include it for str. This is because the num_attrs table has min_value and max_value columns, which made me think it was intended to be collapsed down to just the key, not (key, value).

- organization_id
- referrer

query_processors:
Member Author

There used to be a processor

- processor: UUIDColumnProcessor
    args:
      columns: [trace_id]

but since there's no trace_id column, I thought it was safe to remove.

columns=self.columns,
destination_table_name=self.local_table,
target=OperationTarget.LOCAL,
query=f"""
Member Author

I verified on my local machine that this SELECT statement works as I expect.

operations.CreateTable(
storage_set=self.storage_set_key,
table_name=self.local_table,
engine=table_engines.ReplacingMergeTree(
Member Author

I chose ReplacingMergeTree to save space by deleting duplicate rows
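ReplacingMergeTree deduplicates rows that share the same ORDER BY key, but only during background part merges, not at insert time. A rough Python model of that merge-time behavior, using the ORDER BY key from this migration (not ClickHouse's actual implementation, just a sketch of the semantics):

```python
# Model of ReplacingMergeTree dedup: when parts merge, rows with an
# identical ORDER BY key collapse to a single row. The key tuple below
# mirrors the migration's ORDER BY clause.
def merge_parts(rows):
    seen = {}
    for row in rows:
        key = (
            row["organization_id"], row["project_id"], row["timestamp"],
            row["item_type"], row["attr_key"], row["attr_type"],
            row["attr_value"], row["retention_days"],
        )
        seen[key] = row  # a later duplicate replaces the earlier row
    return list(seen.values())

# Three identical attribute rows collapse into one after a merge.
rows = [
    {"organization_id": 1, "project_id": 1, "timestamp": 0, "item_type": 1,
     "attr_key": "k", "attr_type": "string", "attr_value": "v",
     "retention_days": 90},
] * 3
print(len(merge_parts(rows)))
```

One consequence worth keeping in mind: because dedup is deferred to merges, queries can still observe duplicates until the merge happens (or unless `SELECT ... FINAL` is used).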

@volokluev volokluev (Member) left a comment

just 2 questions, looks good otherwise

Column("organization_id", UInt(64)),
Column("project_id", UInt(64)),
Column("item_type", UInt(8)),
Column("attr_key", String(modifiers=Modifiers(codecs=["ZSTD(1)"]))),
Member

why the compression here?


Comment on lines 91 to 99
GROUP BY
organization_id,
project_id,
item_type,
attr_key,
attr_value,
attr_type,
timestamp,
retention_days
Member

what is the point of the group by here

Member Author

It was originally there in the old MV: https://github.com/getsentry/snuba/blob/master/snuba/snuba_migrations/events_analytics_platform/0017_span_attribute_table_v3.py#L122-L127
I think it's there to remove duplicates, so that's why I kept it.

Member

The GROUP BY is for aggregations (in the AggregatingMergeTree).

You don't need a GROUP BY here. The ReplacingMergeTree will remove duplicates by itself.

Member Author

Yeah, good point lol, I will get rid of this GROUP BY. But even when the MV was an AggregatingMergeTree, this SELECT query still didn't have an aggregate in it, so it seems it was just as wrong before as it is now.
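For anyone following the thread: a GROUP BY over every projected column with no aggregate functions is effectively just `SELECT DISTINCT`, which is why the clause was redundant once the target became a ReplacingMergeTree. A small sketch of that equivalence:

```python
# With no aggregates, grouping by the full column tuple leaves exactly one
# row per distinct tuple -- the same result as SELECT DISTINCT. Column
# order mirrors the migration's GROUP BY (organization_id, project_id,
# item_type, attr_key, attr_value, attr_type, timestamp, retention_days).
rows = [
    (1, 1, 1, "attr", "v", "string", 0, 90),
    (1, 1, 1, "attr", "v", "string", 0, 90),  # exact duplicate
    (1, 1, 1, "other", "v", "string", 0, 90),
]
grouped = {row: None for row in rows}  # "group" on the full tuple
print(list(grouped))
```

The duplicate tuple collapses, leaving two rows, which is the same dedup the ReplacingMergeTree performs on its own at merge time.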
