Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL compilation generates too many SELECTs #476

Open
rlamy opened this issue Sep 24, 2024 · 0 comments
Open

SQL compilation generates too many SELECTs #476

rlamy opened this issue Sep 24, 2024 · 0 comments

Comments

@rlamy
Copy link
Member

rlamy commented Sep 24, 2024

The SQL queries we generate are often unnecessarily complicated, particularly because they use way too many nested SELECTs. For instance, this:

from datachain import DataChain, C

chain = DataChain.from_values(a=range(10)).mutate(b=C("a")).order_by("b").select("b")
print(chain._query.apply_steps().select())

returns

SELECT anon_1.sys__id, anon_1.sys__rand, anon_1.b 
FROM (SELECT anon_2.sys__id AS sys__id, anon_2.sys__rand AS sys__rand, anon_2.a AS a, anon_2.b AS b 
FROM (SELECT anon_3.sys__id AS sys__id, anon_3.sys__rand AS sys__rand, anon_3.a AS a, a AS b 
FROM (SELECT anon_4.sys__id AS sys__id, anon_4.sys__rand AS sys__rand, anon_4.a AS a 
FROM (SELECT "udf_B4XCPB".sys__id AS sys__id, "udf_B4XCPB".sys__rand AS sys__rand, "udf_B4XCPB".a AS a 
FROM "udf_B4XCPB") AS anon_4) AS anon_3) AS anon_2 ORDER BY anon_2.b) AS anon_1

with 5 SELECTs even though only one is needed.

That may not be much of a problem in practice since reasonable DB engines should optimise that query so that it's equivalent to to the single-select version. However:

  • We don't actually know that all DBs will properly optimise the query.
  • That could cause even moderately complex chains to hit statement size limits.
  • That makes it hard to debug SQL generation.

I expect that QueryGenerator is partly to blame for this (because it forces us to call select() after every query step), so #456 should mitigate this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant