Filter a list in a column #6596
Comments
I can't seem to get this to work: @ritchie46, how does one create a list literal?
pl.lit([1, 1], dtype=pl.List(pl.Int64))
We can do a literal:
import polars as pl
df = pl.DataFrame({'a': [[1, 1], [2, 2]]})
s = pl.lit(
pl.Series([[1, 1]], dtype=pl.List(pl.Int64))
)
df.filter(pl.col('a') == s)
...but the filter obviously does not work. I tried df.filter(pl.col('a').is_in(s))
The following works, but it's not pretty:
col = pl.col('Statut')
mask = (col.arr.lengths() == 2) & (col.arr.get(0) == pl.lit('Vu')) & (col.arr.get(1) == pl.lit('Vu'))
df.filter(~mask)
We should definitely improve the usability here.
@stinodego that's a great solution, but I would say it's a workaround for the core issue, which is that one cannot (AFAIK) supply a list as a literal. I imagine this has come up before, but my searching reveals nothing, so maybe this is indeed the first time. I don't know how polars does it, but in Python lists are not even hashable (tuples are), and I doubt polars would ever hash every list element in Python, so a list literal would probably need a lot of associated checks that would make it a pain to implement, although not prohibitively so.
I'd say there are actually two issues here:
For posterity, I have found a way to do this filtering:
Actually, I spoke too soon (found a bug?), as the multiple-expression filter does not interpret & as one would expect, i.e. the & function should be: what the expression above does with & is: (I have tried adding parentheses and it doesn't work; as soon as one expression matches, the row is filtered out)
I think the original question was answered. Please open a feature request if there is specific functionality you are still missing.
Research
I have searched the above polars tags on Stack Overflow for similar questions.
I have asked my usage related question on Stack Overflow.
Link to question on Stack Overflow
No response
Question about Polars
Hi, I have a DataFrame with a column that looks like this:
┌─────────────────────────────────────┐
│ Statut                              │
│ ---                                 │
│ list[str]                           │
╞═════════════════════════════════════╡
│ ["Absent excusé", "Vu"]             │
│ ["Absent excusé", "Vu"]             │
│ ["Absent excusé", "Absent excusé... │
│ ["Vu", "Absent excusé", "Absent ... │
│ ...                                 │
│ ["Vu", "Absent non excusé"]         │
│ ["Absent excusé", "Vu"]             │
│ ["Vu", "Vu"]                        │
│ ["Absent excusé", "Vu"]             │
└─────────────────────────────────────┘
This is the output of polar_statut_vu.select(pl.col("Statut")).
What I would like to do is exclude rows where the list exactly matches ["Vu", "Vu"], but I cannot figure out how to do that...
I have tried: polar_statut_vu.filter((pl.col("Statut") != ["Vu", "Vu"])) but this raises an error...
Any help would be appreciated, I am sure it is a simple solution :)