feat(ir): allow the creation of lists from columns #10102

Closed
1 task done
deepyaman opened this issue Sep 12, 2024 · 5 comments · Fixed by #10498

@deepyaman
Contributor

Is your feature request related to a problem?

I (possibly on behalf of @lostmygithubaccount) am frustrated by writing my_column.to_pyarrow().to_pylist() to get a Python list from an Ibis column.

What is the motivation behind your request?

Discussion on #10084 (comment) during triage. The shorthand my_column.to_pylist() was also proposed, but list(my_column) would be so much more Pythonic.

Describe the solution you'd like

list(my_column)

I don't need list(my_table) or list(some_scalar) to be implemented.

What version of ibis are you running?

dev

What backend(s) are you using, if any?

N/A

Code of Conduct

  • I agree to follow this project's Code of Conduct
@deepyaman deepyaman added the feature Features or general enhancements label Sep 12, 2024
@deepyaman deepyaman self-assigned this Sep 12, 2024
@deepyaman
Contributor Author

I can take a stab at this (later this week?), as discussed in triage.

@lostmygithubaccount
Member

can I just express frustration in meetings and have y'all take it from there? 😂 love it

@jcrist
Member

jcrist commented Sep 12, 2024

> but list(my_column) would be so much more Pythonic.

I'm concerned about having __iter__ on columns result in (potentially expensive) query execution. We don't support __bool__/__int__/__float__ on scalar values for the same reason. Accidental coercion like this can be a performance footgun for users, resulting in query execution in unintended locations or occurring multiple times. Other lazy systems (dask, polars) also don't do automatic execution in python magic methods for similar reasons.

I'd much rather have a new to_* method that does this. Why not Column.to_list?
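
The concern above can be illustrated with a small sketch. It does not use Ibis: `LazyColumn`, `_execute`, and the `executions` counter are hypothetical stand-ins for a lazy column whose evaluation is expensive. It only shows how an `__iter__` that executes the query can run it several times without any visible call, while an explicit `to_list` runs it once at a clearly marked place.

```python
class LazyColumn:
    """Hypothetical stand-in for a lazy column expression (not the Ibis API)."""

    def __init__(self, values):
        self._values = list(values)
        self.executions = 0  # counts how many times the "query" ran

    def _execute(self):
        # Stand-in for an expensive backend query.
        self.executions += 1
        return list(self._values)

    def __iter__(self):
        # Implicit execution: any iteration silently runs the query.
        return iter(self._execute())

    def to_list(self):
        # Explicit execution: the call site shows that work happens here.
        return self._execute()


col = LazyColumn([3, 1, 2])

# Three innocent-looking operations, each of which iterates the column.
as_list = list(col)
biggest = max(col)
has_two = 2 in col  # falls back to __iter__ because __contains__ is undefined
implicit_runs = col.executions

# One explicit call, with the result reused afterwards.
col2 = LazyColumn([3, 1, 2])
values = col2.to_list()
biggest2 = max(values)
has_two2 = 2 in values
explicit_runs = col2.executions
```

In this sketch the implicit path executes once per operation, whereas the explicit path executes only when `to_list` is called, which is the behavior the comment argues for.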

@deepyaman
Contributor Author

> > but list(my_column) would be so much more Pythonic.
>
> I'm concerned about having __iter__ on columns result in (potentially expensive) query execution. We don't support __bool__/__int__/__float__ on scalar values for the same reason. Accidental coercion like this can be a performance footgun for users, resulting in query execution in unintended locations or occurring multiple times. Other lazy systems (dask, polars) also don't do automatic execution in python magic methods for similar reasons.
>
> I'd much rather have a new to_* method that does this. Why not Column.to_list?

Sorry, forgot to respond. Column.to_list seems fine.

@IndexSeek
Member

I have also found myself using the to_pyarrow().to_pylist() approach, particularly when feeding a Streamlit selectbox or multiselect widget.

Going directly to a Python list from a column would be very convenient with something like to_list.
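
Until such a method exists, the workaround from this thread can be wrapped in a small helper. This is a minimal sketch, not the eventual implementation: `column_to_list` is a hypothetical name, and `FakeColumn` and `FakeArrowArray` are stand-ins so the example runs without a backend. It assumes only that the column exposes `to_pyarrow()` and that the result exposes `to_pylist()`, as in the `to_pyarrow().to_pylist()` idiom quoted above.

```python
def column_to_list(column):
    """Return the column's values as a plain Python list.

    Assumes `column` exposes `to_pyarrow()` whose result has `to_pylist()`,
    as in the `my_column.to_pyarrow().to_pylist()` idiom from this thread.
    """
    return column.to_pyarrow().to_pylist()


# Minimal stand-ins so the sketch runs without a backend (hypothetical names).
class FakeArrowArray:
    def __init__(self, values):
        self._values = list(values)

    def to_pylist(self):
        return list(self._values)


class FakeColumn:
    def __init__(self, values):
        self._values = values

    def to_pyarrow(self):
        return FakeArrowArray(self._values)


# The resulting list could then be passed to a widget that expects options.
options = column_to_list(FakeColumn(["a", "b", "c"]))
```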
