Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(polars): use Select op within polars backend #10005

Open
jcrist opened this issue Sep 3, 2024 · 5 comments
Open

refactor(polars): use Select op within polars backend #10005

jcrist opened this issue Sep 3, 2024 · 5 comments
Labels
internals Issues or PRs related to ibis's internal APIs polars The polars backend

Comments

@jcrist
Copy link
Member

jcrist commented Sep 3, 2024

Our other SQL backends convert Project/Filter/Sort/Distinct into a single Select operation. This fusion both results in simpler SQL, and results in these operations being (with some exceptions) commutative. In #9923 I added a test for this commutativity which is currently failing for the polars backend since we don't rewrite these queries to Select nodes.

I think the easiest fix would be to use the same rewrites as the SQL backend to generate SQL nodes. With the deprecation (and future removal) of the dask/pandas backends, another option would be to drop Project/Filter/Sort/Distinct entirely internally and only make use of the more general Select op.

@cpcloud
Copy link
Member

cpcloud commented Sep 3, 2024

I think dropping those other operations makes it harder to understand sequences of operations conceptually, so I'd prefer not to drop them.

That's how Ibis used to represent all selections and it was very difficult to understand the process by which expressions moved into SQL (or whatever else).

Splitting things up into separate operations way isn't without trade-offs, but it helps a lot with isolating the various parts of the compilation pipeline and reasoning about what is and isn't allowed (structurally, typing-wise, and optimization-wise).

@cpcloud
Copy link
Member

cpcloud commented Sep 3, 2024

+1 to using Select in the polars backend if that leads to some simplification.

@jcrist
Copy link
Member Author

jcrist commented Sep 3, 2024

Splitting things up into separate operations way isn't without trade-offs, but it helps a lot with isolating the various parts of the compilation pipeline and reasoning about what is and isn't allowed (structurally, typing-wise, and optimization-wise).

Doesn't the rewrite system help a bit with alleviating this? Besides the fusion code (project_to_select, merge_select_select, ...) I didn't really see much analysis/rewrite code that dispatched on Project/Filter/Sort/Distinct. If we're always fusing for SQL generation, using Select immediately would avoid a set of rewrites later. If you have a reference to some code that uses the split types for analysis that would be a helpful reference.

@cpcloud
Copy link
Member

cpcloud commented Sep 3, 2024

I believe all the DerefMap code depends on the various operations being split up.

@jcrist
Copy link
Member Author

jcrist commented Sep 3, 2024

Ah, it does. Missed that file, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internals Issues or PRs related to ibis's internal APIs polars The polars backend
Projects
Status: backlog
Development

No branches or pull requests

2 participants