feat: will PyArrow+pandas be made optional for backends? #10120
Comments
asked over there for the rationale -- one of the engineers can weigh in but my understanding is it's still a good amount of work with fairly minimal benefit for users. the main reason cited in the past has been running in AWS Lambda and other FaaS, but you can very easily use PyArrow or other larger dependencies in those tools (i.e. I don't think this was ever a particularly valid reason, so would be great to understand this person's perspective)
thanks for your response! just for my understanding - supposing it were possible, would you be open to such a PR?
I personally don't see why we wouldn't. I think given infinite time and resources, this is definitely something we would do -- Phillip already made it possible as you note without a backend. of course, we'd want to ensure no functionality is lost. it'd be good to have the engineers weigh in (we'll discuss this at some point this week and can respond back here if they don't already from the GH notifications)
FWIW I'm also interested in using ibis without requiring pyarrow as a dependency. I don't have anything against pyarrow personally, but it's a very big dependency to force on all users of a library (see the pandas v3 discussion) and with the Arrow PyCapsule Interface it's now a lot easier to use alternative, smaller Python Arrow implementations, like nanoarrow or my own. If substrait is now maturing, then any backend that can consume substrait (e.g. at least DuckDB) could in theory remove the pyarrow dependency pretty easily?
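To illustrate the PyCapsule point in the comment above: the Arrow PyCapsule Interface is a duck-typed protocol, so a consumer can check for the `__arrow_c_schema__`, `__arrow_c_array__`, and `__arrow_c_stream__` methods without importing pyarrow at all. The following is a minimal sketch, not Ibis code; `arrow_capabilities` and `FakeTable` are hypothetical names made up for this example.

```python
from typing import Any


def arrow_capabilities(obj: Any) -> set[str]:
    """Return which Arrow PyCapsule Interface exports `obj` advertises.

    Only attribute lookups are used, so no Arrow library is imported.
    """
    dunders = {
        "schema": "__arrow_c_schema__",
        "array": "__arrow_c_array__",
        "stream": "__arrow_c_stream__",
    }
    return {
        name
        for name, attr in dunders.items()
        if callable(getattr(obj, attr, None))
    }


class FakeTable:
    """Hypothetical stand-in for a table from any Arrow implementation."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer would return a PyCapsule wrapping an ArrowArrayStream.
        raise NotImplementedError("illustration only")
```

A library written this way can accept tables from pyarrow, nanoarrow, or arro3 interchangeably, provided the object it receives implements the relevant method.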
I don't think these things are related -- the long-term vision is substrait as intermediary representation (and Ibis can already produce Substrait plans), but I wouldn't expect Ibis to "switch" anytime soon for a bunch of technical/data system adoption reasons (e.g. DuckDB's Substrait consumption tends to be far more buggy than SQL). not that it's hard to find, but here's the link to the pandas discussion for context: pandas-dev/pandas#57073. nanoarrow (or arro3) does seem like an interesting option but we're beyond my technical depth 😄
I'm closing this out in favor of #10166 -- TL;DR: we're interested in making sure that our usage of pandas and pyarrow is cleanly separable from other backend functionality, but we aren't (in the short-term) going to remove
Is your feature request related to a problem?
As far as I understand, pandas+pyarrow are now optional for pip install ibis-framework, but still required for all backends
What is the motivation behind your request?
There was a request recently in Narwhals that I thought Ibis might be better suited for, but the poster responded with
Describe the solution you'd like
Would you consider making PyArrow / pandas optional for backends?
What version of ibis are you running?
9.5.0
What backend(s) are you using, if any?
No response