POC - Enable support for clickhouse on SL #1592

thiagosalvatore · 2025-01-13T17:44:38Z

No description provided.

cla-bot · 2025-01-13T17:44:44Z

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @thiagosalvatore

github-actions · 2025-01-13T17:44:54Z

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

thiagosalvatore · 2025-01-13T17:50:23Z

metricflow/sql/render/clickhouse.py

+        settings = ["allow_experimental_join_condition = 1", "allow_experimental_analyzer = 1", "join_use_nulls = 0"]
+        return SqlPlanRenderResult(sql=f"SETTINGS {', '.join(settings)}", bind_parameter_set=SqlBindParameterSet())
+
+    def _render_joins_section(self, join_descriptions: Sequence[SqlJoinDescription]) -> Optional[SqlPlanRenderResult]:


I don't like this at all.

The problem is that ClickHouse doesn't support inequality conditions in INNER JOINs:

https://clickhouse.com/docs/en/sql-reference/statements/select/join#join-with-inequality-conditions-for-columns-from-different-tables.

The suggested approach is to use CROSS JOIN instead, but that forces a change to the rendering logic of any join that uses an inequality condition.

The tests are passing, but I'm open to suggestions here.

thiagosalvatore · 2025-01-13T17:51:21Z

metricflow/sql/render/sql_plan_renderer.py

@@ -322,6 +322,9 @@ def _render_limit_section(self, limit_value: Optional[int]) -> Optional[SqlPlanR
            return None
        return SqlPlanRenderResult(sql=f"LIMIT {limit_value}", bind_parameter_set=SqlBindParameterSet())

+    def _render_adapter_specific_flags(self) -> Optional[SqlPlanRenderResult]:


ClickHouse has some flags that can be enabled at the query level. Instead of adding a bunch of workarounds to statement execution, I decided to add this hook here so that other adapters with similar features can use it too.
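For illustration, a minimal sketch of how such a hook could behave. It uses simplified stand-in classes rather than MetricFlow's real renderer: `BaseRenderer`, `ClickHouseRenderer`, and `render_query` are hypothetical names, and plain strings replace `SqlPlanRenderResult`. Only the `_render_adapter_specific_flags` name and the settings strings come from this PR.

```python
from typing import List, Optional


class BaseRenderer:
    """Stand-in for a default SQL renderer (hypothetical, not MetricFlow's class)."""

    def _render_adapter_specific_flags(self) -> Optional[str]:
        # Most engines have no query-level flags, so the default is None.
        return None

    def render_query(self, select_sql: str) -> str:
        sections: List[str] = [select_sql]
        flags = self._render_adapter_specific_flags()
        if flags is not None:
            # Engine-specific flags are appended after the rest of the query.
            sections.append(flags)
        return "\n".join(sections)


class ClickHouseRenderer(BaseRenderer):
    """Stand-in for the ClickHouse renderer; overrides only the flags hook."""

    def _render_adapter_specific_flags(self) -> Optional[str]:
        settings = [
            "allow_experimental_join_condition = 1",
            "allow_experimental_analyzer = 1",
            "join_use_nulls = 0",
        ]
        return f"SETTINGS {', '.join(settings)}"
```

With this shape, engines that don't override the hook render exactly the same SQL as before, and only the ClickHouse subclass appends a `SETTINGS` clause.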

thiagosalvatore · 2025-01-13T17:51:44Z

tests_metricflow/fixtures/sql_clients/adapter_backed_ddl_client.py

+
+            if self.sql_engine_type == SqlEngine.CLICKHOUSE:
+                create_table_statement = (
+                    f"{create_table_statement} ENGINE = MergeTree ORDER BY ({columns_to_insert[0].split(" ")[0]})"


ClickHouse requires every table being created to specify its engine.
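A small sketch of the same idea as a standalone helper, assuming each column definition has the form `<name> <type>`. The function name `add_clickhouse_engine` is hypothetical and not part of the PR. Pulling the column name out before building the f-string also avoids nesting double quotes inside a double-quoted f-string, which is a syntax error before Python 3.12.

```python
from typing import Sequence


def add_clickhouse_engine(create_table_statement: str, column_definitions: Sequence[str]) -> str:
    """Append a MergeTree engine clause ordered by the first column."""
    # Each definition is assumed to look like "<name> <type>", e.g. "ds DateTime".
    first_column_name = column_definitions[0].split(" ")[0]
    # The name is computed outside the f-string so no quotes are nested in it.
    return f"{create_table_statement} ENGINE = MergeTree ORDER BY ({first_column_name})"
```

Ordering by the first column is just the simplest choice for test fixtures; a real table would pick its sort key deliberately.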

thiagosalvatore · 2025-01-13T17:51:58Z

tests_metricflow/fixtures/sql_clients/adapter_backed_ddl_client.py


    def drop_schema(self, schema_name: str, cascade: bool = True) -> None:
        """Drop the given schema from the data warehouse. Only used in tests."""
-        self.execute(f"DROP SCHEMA IF EXISTS {schema_name}{' CASCADE' if cascade else ''}")
+        if self.sql_engine_type is SqlEngine.CLICKHOUSE:


There's no concept of a schema in ClickHouse; the equivalent of a schema is a database.

courtneyholcomb

It's awesome how quickly you were able to figure all this out with so little context! 🙌

courtneyholcomb · 2025-01-13T20:03:23Z

metricflow/sql/render/clickhouse.py

+        }
+
+    @override
+    def render_date_part(self, date_part: DatePart) -> str:


I'm guessing this will need to be moved to render_extract()?

courtneyholcomb · 2025-01-13T20:23:37Z

metricflow/sql/render/clickhouse.py

+        settings = ["allow_experimental_join_condition = 1", "allow_experimental_analyzer = 1", "join_use_nulls = 0"]
+        return SqlPlanRenderResult(sql=f"SETTINGS {', '.join(settings)}", bind_parameter_set=SqlBindParameterSet())
+
+    def _render_joins_section(self, join_descriptions: Sequence[SqlJoinDescription]) -> Optional[SqlPlanRenderResult]:


We'll definitely want to discuss this at the MF sync tomorrow. The problem is that a SQL plan theoretically should be engine-agnostic. In this case, it isn't, since the INNER JOIN in the SQL plan is invalid for Clickhouse.

A couple of other potential options:

We change the SQL plan to use CROSS JOIN, which I'm assuming would work for all engines. Then, we could implement an engine-specific optimizer that swaps in an INNER JOIN if the engine supports it. In practice, if that optimizer errors for some reason, the user might be surprised to see a CROSS JOIN here when it's not necessary. We might be ok with that tradeoff.

Another option would be for us to just create an internal concept of a generic "inequality join" or something similar to use in the SQL plan. And that would get translated into either INNER JOIN or CROSS JOIN by the SQL renderer.
Both of those options might have issues integrating with the WHERE clause, though, since we would need to include that when CROSS JOIN is used 🤔

Let's hold off on making any changes here for now - I want to see if Paul has any better ideas for how to work around this.
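To make the second option concrete, here is a rough sketch of a generic "inequality join" being rendered as either INNER JOIN or CROSS JOIN, with the join condition folded into the WHERE clause in the CROSS JOIN case. Everything in it is hypothetical (`InequalityJoin`, `render_from_and_where`, the `supports_inequality_inner_join` flag); it does not use MetricFlow's SQL plan classes and only shows the WHERE-clause integration concern mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InequalityJoin:
    """Hypothetical engine-agnostic join node: right source plus its condition."""

    right_source: str
    condition: str


def render_from_and_where(
    left_source: str,
    joins: List[InequalityJoin],
    where: Optional[str],
    supports_inequality_inner_join: bool,
) -> str:
    lines = [f"FROM {left_source}"]
    where_conditions: List[str] = []
    for join in joins:
        if supports_inequality_inner_join:
            lines.append(f"INNER JOIN {join.right_source} ON {join.condition}")
        else:
            # Fallback: CROSS JOIN, with the join condition moved into WHERE.
            lines.append(f"CROSS JOIN {join.right_source}")
            where_conditions.append(f"({join.condition})")
    if where is not None:
        # The query's own filter is ANDed after any relocated join conditions.
        where_conditions.append(f"({where})")
    if where_conditions:
        lines.append("WHERE " + " AND ".join(where_conditions))
    return "\n".join(lines)
```

Moving the condition into WHERE is only equivalent for inner joins; an outer join with an inequality condition would need a different fallback.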

courtneyholcomb · 2025-01-13T20:31:39Z

tests_metricflow/fixtures/sql_clients/adapter_backed_ddl_client.py

@@ -132,8 +140,14 @@ def _quote_escape_value(self, value: str) -> str:

    def create_schema(self, schema_name: str) -> None:
        """Create the given schema in a data warehouse. Only used in tutorials and tests."""
-        self.execute(f"CREATE SCHEMA IF NOT EXISTS {schema_name}")
+        if self.sql_engine_type is SqlEngine.CLICKHOUSE:


We should add this to the renderers so that we don't need any logic here to check the engine type (especially since it's used twice). You can do something similar to what we do on line 125 above to get the string needed for timestamp_data_type: add a property to the renderer, called something like schema_str, that just returns "SCHEMA" or "DATABASE".

I'm wondering if it makes sense to move this to the renderer, given that we only create/drop schemas in tests.

Maybe we can turn this into a renderer property, has_schema_support, that is True by default; when it is False, we use CREATE DATABASE instead. WDYT?

Sure that works too!
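A minimal sketch of what the `has_schema_support` idea could look like, using stand-in classes and free functions rather than the real renderer and DDL client (`BaseRenderer`, `ClickHouseRenderer`, `create_schema_sql`, and `drop_schema_sql` are hypothetical names). Dropping `CASCADE` from the `DATABASE` form is an assumption made for this sketch, not something settled in the thread.

```python
class BaseRenderer:
    """Stand-in renderer (hypothetical); engines with real schemas keep the default."""

    has_schema_support: bool = True


class ClickHouseRenderer(BaseRenderer):
    # ClickHouse has databases but no separate schema level.
    has_schema_support = False


def create_schema_sql(renderer: BaseRenderer, schema_name: str) -> str:
    keyword = "SCHEMA" if renderer.has_schema_support else "DATABASE"
    return f"CREATE {keyword} IF NOT EXISTS {schema_name}"


def drop_schema_sql(renderer: BaseRenderer, schema_name: str, cascade: bool = True) -> str:
    if not renderer.has_schema_support:
        # Assumption: CASCADE is omitted for the DATABASE form.
        return f"DROP DATABASE IF EXISTS {schema_name}"
    return f"DROP SCHEMA IF EXISTS {schema_name}{' CASCADE' if cascade else ''}"
```

This keeps the engine check out of the DDL client: the client only asks the renderer whether schemas exist and builds the statement accordingly.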

courtneyholcomb · 2025-01-13T20:36:09Z

tests_metricflow/generate_snapshots.py

@@ -32,6 +32,10 @@
        "engine_url": trino://...",
        "engine_password": "..."
    },
+    "clickhouse": {


There's a 1pass entry that stores these creds for easy access, so it would be great if we could update that too!

thiagosalvatore added 2 commits January 13, 2025 14:41

Enable clickhouse support on MetricFlow

ed7e07f

Fix lint issues

3a3d7cc

fix f-string format on python 3.9

d7b3f88

cla-bot bot added the cla:yes label Jan 13, 2025

thiagosalvatore changed the title ~~Poc clickhouse~~ POC - Enable support for clickhouse on SL Jan 13, 2025

courtneyholcomb added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jan 13, 2025

courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS January 13, 2025 19:04 — with GitHub Actions Inactive

github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Jan 13, 2025

courtneyholcomb reviewed Jan 13, 2025

thiagosalvatore added 4 commits January 14, 2025 09:56

Merge branch 'main' of github.com:dbt-labs/metricflow into poc-clickh…

2fe58ad

…ouse

Update to use new version of clickhouse

cd677bc

Update snapshots & tests

ca015f0

change to use toStart instead of dateTrunc

9b0aa68
