refactor(dask): remove the dask backend #9772
I'm opening this PR to give the discussion something to anchor on. The existence of a PR doesn't mean it will be merged.
Lint failures are related to the |
ec634f7
to
da701b5
Compare
+1 to all of that. Of the two non-SQL backends, the dask backend is the one that adds some value, since it provides access to a distributed compute engine (the pandas backend is worse than the duckdb backend on every metric and provides no added functionality). That said, given the maintenance cost and the current architecture, maintaining it without known users asking for it isn't worth it IMO. I think we should drop both the dask and pandas backends as a breaking change in 10.0.
da701b5
to
d871dd3
Compare
I agree with everything above. I agree the dask backend could be of use if we rewrote it, but nothing about its current state would inform that rewrite. I think the pandas backend is a net negative for the project -- that's not a strike on pandas itself, but trying to shoehorn an eager evaluation engine into Ibis leads to a very unpleasant experience, and new users often try the pandas backend because they are looking for something familiar or think they need to because they already have a dataframe in memory. Rip it all out.
d871dd3
to
c830a29
Compare
My preference would be:
|
This is purely additional work, without any benefit. Concretely, who is using the dask backend and would actually benefit from this? Why should we do this?
We shouldn't keep a backend around because we might use its infrastructure someday. Also, we can just move whatever polars is using into the polars backend. The pandas backend at this point just causes confusion and incorrect perceptions about Ibis's performance. Should we really put person-hours into addressing those performance issues just because some random person decided to try it out and didn't have a good experience?
c830a29
to
885a4eb
Compare
@gforsyth @jcrist @lostmygithubaccount @kszucs @ncclementi I would really like to avoid this PR stalling. I think we should include this as a breaking change for 10.0.
I agree (as I said above). I also think we should do the same for the pandas backend.
Can we go through a deprecation cycle with a warning instead of removing the backend entirely in a single release? I'm worried that there are people using it in production that we don't know about. I vaguely remember at least one person mentioning on GH that they were (though I could not find this), and there could be many who don't interact with the community.
Instead of removing the backends, could we start with a warning in 10.0 if you're using them and look to remove them in 11.0? Ideally, we should ensure users are aware that this change is coming and have an opportunity to weigh in (without needing to closely follow the project on GitHub).
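For reference, a deprecation cycle like the one proposed above could look roughly like the following. This is only a minimal sketch using the standard-library `warnings` module; the `DEPRECATED_BACKENDS` mapping and the `warn_if_deprecated` helper are hypothetical names for illustration and are not part of Ibis.

```python
import warnings

# Hypothetical mapping of backend name -> planned removal version.
# These names and versions are illustrative assumptions, not Ibis internals.
DEPRECATED_BACKENDS = {"dask": "11.0", "pandas": "11.0"}


def warn_if_deprecated(backend_name: str) -> bool:
    """Emit a FutureWarning if the backend is scheduled for removal.

    Returns True if a warning was emitted, False otherwise.
    """
    removal_version = DEPRECATED_BACKENDS.get(backend_name)
    if removal_version is None:
        return False
    warnings.warn(
        f"The {backend_name!r} backend is deprecated and is planned for "
        f"removal in version {removal_version}; please migrate to another backend.",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )
    return True
```

`FutureWarning` is used here rather than `DeprecationWarning` because Python shows it to end users by default, which matters if the goal is to reach people who don't follow the project on GitHub.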
Is there any reason to wait? Here are the scenarios: plan to remove in 10.0
plan to remove in 11.0
I don't really see much benefit in waiting given that there's only a single scenario that requires a revert.
Rip it out
As an outsider, what I would see is a backend being removed without warning because the maintainers did not want to maintain it. This makes me worry that the backend I use or would want to use could be removed at any moment without warning. Of course, in this case there are good reasons for the removal, but I may not be aware of those or care enough to dig through GitHub to understand the context. If, instead, I see that the backend was deprecated for 1-6 months with a warning message and an announcement, and then finally gets removed, my takeaway is very different. I suspect:
is fairly likely, and would prefer a longer deprecation cycle with proper warning |
After talking with @ianmcook I think we should give a small buffer for folks to move away from Dask/Pandas. Next steps are:
|
-1 on pandas removal here |
fe71e80
to
1db5d0e
Compare
74a3bd9
to
306b3d4
Compare
306b3d4
to
4e5b949
Compare
Started going through and cleaning up deprecations. I'm going to submit a separate PR for each removal. In the first PR I'll disable the deprecation check in the release verification script and then once we're done removing stuff, I'll re-enable it.
OK, I'm going to back out all the changes and just keep the backend removal. I will disable the deprecation check in the release dry-run script for now, and we can enable it again once we get the breakages in.
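To illustrate the kind of check being toggled here, below is a rough sketch of a release-time deprecation check with a skip switch. It is not the project's actual release script; `Deprecation`, `overdue_deprecations`, and the `skip_check` flag are hypothetical names, and versions are modeled as simple `(major, minor)` tuples.

```python
from typing import NamedTuple


class Deprecation(NamedTuple):
    """A hypothetical record of a deprecated API and when it should be gone."""

    name: str
    removed_in: tuple[int, int]  # (major, minor) version for removal


def overdue_deprecations(deprecations, release, skip_check=False):
    """Return deprecations that should already be removed in `release`.

    With skip_check=True the check is disabled and nothing is reported,
    mirroring a temporary opt-out during a release dry run.
    """
    if skip_check:
        return []
    # Tuples compare element-wise, so (10, 0) <= (10, 0) and (11, 0) > (10, 0).
    return [d for d in deprecations if d.removed_in <= release]
```

A release script could fail the dry run when this list is non-empty, and pass `skip_check=True` while the individual removals are still landing.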
BREAKING CHANGE: The `dask` backend is removed. Please use one of the other backends that Ibis supports.
e10b105
to
8f9e213
Compare
Looks pretty comprehensive to me.
We may want to update `docs/backends_sankey.py` after we've also removed `pandas` and then update the sankey diagram to reflect the new backend landscape.
I figure this should have multiple approvals before merging.
- # move back to 3.12 when dask-expr is supported or the dask backend is
- # removed
- default = ibis310;
+ default = ibis312;
🎉
Description of changes
This PR removes the `dask` backend from Ibis.
The `dask` backend has been in pure maintenance mode since the introduction of `dask-expr`, which overall seems like a net improvement to dask.
It's not clear what value Ibis continues to add on top of dask. Dask is a useful library on its own, and I think Dask + Ibis is adding unjustified complexity for very little gain.
The work to get a version of Dask with `dask-expr` working doesn't seem worth it. I tried to do it a few weeks ago and there were numerous tests of ours that started failing. Going through the whole cycle of upstream bug reporting, fixing and waiting just doesn't seem worth the trouble.
BREAKING CHANGE: The `dask` backend is removed. Please use one of the other backends that Ibis supports.