Add a way to test for list equality #10138

m-legrand · 2023-07-28T09:46:05Z

Problem description

I'm currently working with a dataset that contains list columns, and I was surprised not to find an easy way to test for list equality:

>>> import polars as pl
>>> data = pl.DataFrame({"x": [[1], [1, 2], [2, 3]]}, schema={"x": pl.List(int)})
>>> data
shape: (3, 1)
┌───────────┐
│ x         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1]       │
│ [1, 2]    │
│ [2, 3]    │
└───────────┘
>>> data.filter(pl.col("x") == pl.lit([1]))
ArrowErrorException: NotYetImplemented("Casting from Int64 to 
    LargeList(Field { name: \"item\", data_type: Int64, is_nullable: true, metadata: {} }) 
    not supported")

Maybe I missed the right incantation from the Expr.list namespace?
In the meantime I went with the following utility function:

def filter_list_equal(df: pl.DataFrame, colname: str, values: list) -> pl.DataFrame:
    col = pl.col(colname)
    lf = df.lazy()
    lf = lf.filter(col.list.lengths() == pl.lit(len(values)))
    for i, v in enumerate(values):
        lf = lf.filter(col.list[i] == pl.lit(v))
    return lf.collect()


ion-elgreco · 2023-07-28T11:16:02Z

You need to wrap [1] in another list, otherwise it's interpreted as an int (see #7879). You also need to add it as a column before you filter on it. I'm not sure why it doesn't work directly within the filter; I think I saw an issue about this before.

data.with_columns(pl.lit([[1]]).alias('y')).filter(pl.col('x') == pl.col('y'))

ritchie46 · 2023-07-28T11:19:26Z

Something seems to go wrong when we inline the predicate. Will take a look later

m-legrand · 2023-07-29T00:53:34Z

Having to assign a new column (and delete it afterwards) also makes for a more cumbersome user experience.
Not to mention having to come up with a column name that I'm sure my input dataframe doesn't already have!

cmdlineluser · 2023-08-29T22:48:14Z

This was asked again today on stackoverflow: https://stackoverflow.com/questions/77002768/how-to-filter-a-polars-dataframe-with-list-type-columns

The current workaround that avoids adding a new column seems to be comparing with .hash() (casting to numerics first if necessary, e.g. str -> cat):

df.filter(pl.col("x").hash() != pl.lit([[1]]).hash())

# shape: (2, 1)
# ┌───────────┐
# │ x         │
# │ ---       │
# │ list[i64] │
# ╞═══════════╡
# │ [1, 2]    │
# │ [2, 3]    │
# └───────────┘

Was also asked a couple of weeks ago: https://stackoverflow.com/questions/76875762/filter-on-listint64-dtype-in-polars

m-legrand added the "enhancement" label (New feature or an improvement of an existing feature) Jul 28, 2023

cmdlineluser mentioned this issue Sep 1, 2023

Series.eq does not work correctly for List types #10698

Closed

2 tasks

c-peters mentioned this issue Sep 1, 2023

fix(rust): Add broadcasting for list comparisons #10857

Merged

ritchie46 closed this as completed in #10857 Sep 3, 2023
