pl.struct with literal List field #8062

Closed
2 tasks done
josh opened this issue Apr 7, 2023 · 6 comments · Fixed by #8327
Labels: bug (Something isn't working), python (Related to Python Polars)

Comments

@josh
Contributor

josh commented Apr 7, 2023

Polars version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Using pl.struct with a mix of expressions and literals mostly works as expected. There's a specific issue I ran into when one of the struct fields is a list.

The issue may be related to #7299: I'm seeing these list columns flattened into a series of their inner type before the unit series is expanded to match the frame's height.

Reproducible example

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3]})
schema = {"a": pl.Int64, "b": pl.List(pl.Int64)}

# returns empty frame
df.select(pl.struct([pl.col("a"), pl.lit([]).alias("b")], schema=schema))

# returns struct filled with all nulls but correct length
# shape: (3, 1)
# ┌─────────────┐
# │ a           │
# │ ---         │
# │ struct[2]   │
# ╞═════════════╡
# │ {null,null} │
# │ {null,null} │
# │ {null,null} │
# └─────────────┘
df.select(pl.struct([pl.col("a"), pl.lit([None]).alias("b")], schema=schema))

# panics
df.select(pl.struct([pl.col("a"), pl.lit([42]).alias("b")], schema=schema))

Expected behavior

# pl.struct([pl.col("a"), pl.lit([]).alias("b")])
shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {1,[]}    │
│ {2,[]}    │
│ {3,[]}    │
└───────────┘

# pl.struct([pl.col("a"), pl.lit([None]).alias("b")])
shape: (3, 1)
┌────────────┐
│ a          │
│ ---        │
│ struct[2]  │
╞════════════╡
│ {1,[null]} │
│ {2,[null]} │
│ {3,[null]} │
└────────────┘

# pl.struct([pl.col("a"), pl.lit([42]).alias("b")])
shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {1,[42]}  │
│ {2,[42]}  │
│ {3,[42]}  │
└───────────┘

Installed versions

---Version info---
Polars: 0.17.0
Index type: UInt32
Platform: Linux-5.4.0-1105-azure-x86_64-with-glibc2.31
Python: 3.11.1 (main, Jan 24 2023, 23:47:54) [GCC 9.4.0]
---Optional dependencies---
numpy: <not installed>
pandas: <not installed>
pyarrow: <not installed>
connectorx: <not installed>
deltalake: <not installed>
fsspec: 2023.1.0
matplotlib: <not installed>
xlsx2csv: <not installed>
xlsxwriter: <not installed>
josh added the bug and python labels Apr 7, 2023
@josh
Contributor Author

josh commented Apr 10, 2023

It looks like pl.lit(list) was added recently in #7879 by @alexander-beedie. If you double-wrap the values in another list, it kinda works. I guess it's just an unintuitive edge case of pl.lit that lists aren't treated as literal/unit values but rather as a series of their own. I guess this is just a wontfix/expected behavior?

alexander-beedie self-assigned this Apr 10, 2023
@alexander-beedie
Collaborator

alexander-beedie commented Apr 10, 2023

I may have a solution here. The problem with providing a list literal (given the existing Series behaviour inside lit, which the list literal currently follows) is that it is potentially ambiguous. Adding an orient param to lit would help disambiguate explicitly; the one question then is whether the default should follow the current Series behaviour or the existing literal/scalar behaviour. Will experiment to see what looks most sensible 🤔
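
For readers following along, the ambiguity can be illustrated without Polars. This is a pure-Python sketch, not the Polars implementation: `lit_model`, `broadcast`, and the `orient` values "col" and "scalar" are hypothetical names chosen for illustration. It models a literal as a plain list of row values and assumes that only length-1 literals are broadcast to the frame height.

```python
from typing import Any, List


def lit_model(value: Any, orient: str = "col") -> List[Any]:
    """Hypothetical model of a literal as a list of row values.

    orient="col":    a Python list is read as a column, one row per item
                     (the Series-like reading discussed in this thread).
    orient="scalar": a Python list is read as one list-typed value.
    """
    if orient not in ("col", "scalar"):
        raise ValueError(f"unknown orient: {orient!r}")
    if isinstance(value, list) and orient == "col":
        return list(value)
    return [value]


def broadcast(rows: List[Any], height: int) -> List[Any]:
    """Repeat a length-1 literal to `height` rows; leave other lengths alone."""
    if len(rows) == 1:
        return rows * height
    return rows


height = 3  # the frame in the issue has three rows

# Column reading: [] has zero rows, so there is nothing to broadcast.
empty_col = broadcast(lit_model([]), height)
# Column reading: [42] is one row holding the integer 42, not a list.
int_col = broadcast(lit_model([42]), height)
# Column reading of a doubly wrapped list: one row holding the list [42].
wrapped_col = broadcast(lit_model([[42]]), height)
# Scalar reading: [42] is itself the single list-typed value.
scalar_col = broadcast(lit_model([42], orient="scalar"), height)

print(empty_col)    # []
print(int_col)      # [42, 42, 42]
print(wrapped_col)  # [[42], [42], [42]]
print(scalar_col)   # [[42], [42], [42]]
```

In this model, the column reading of `[[42]]` and the scalar reading of `[42]` give the same rows, which is consistent with the double-wrapping workaround mentioned earlier in the thread. Which reading should be the default is exactly the open question.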

@josh
Contributor Author

josh commented Apr 15, 2023

0.17.3 was just released with another related regression. I can no longer use pl.lit([[]]).

import polars as pl

pl.DataFrame({"a": [1, 2, 3]}).select(
    pl.struct(
        [
            pl.col("a"),
            pl.lit([[]]).alias("b"),
        ],
        schema={"a": pl.Int64, "b": pl.List(pl.Int64)},
    )
)

Previously on 0.17.2 or up to f7ea104, it would output

shape: (3, 1)
┌───────────┐
│ a         │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {1,[]}    │
│ {2,[]}    │
│ {3,[]}    │
└───────────┘

Now on 0.17.3 or after 9a73d3c,

shape: (3, 1)
┌─────────────┐
│ a           │
│ ---         │
│ struct[2]   │
╞═════════════╡
│ {null,null} │
│ {null,null} │
│ {null,null} │
└─────────────┘

No combination of pl.lit([[]]), pl.lit([]), pl.lit(None), or pl.lit([None]) seems to work here. 🤷🏻‍♂️

@alexander-beedie
Collaborator

Hmm. I'll take a look at all of this tomorrow and see if I can consolidate/fix 😅

@ritchie46
Member

> It looks like pl.lit(list) was added recently in #7879 by @alexander-beedie. It seems like if you just double wrap the values in another list, it kinda works. I guess it's just an unintuitive edge case of pl.lit that lists aren't actually treated as literal/unit types but rather a series of their own. I guess this just a wontfix/expected behavior?

I think we must document this better in pl.lit. A single list passed to pl.lit becomes a column whose type is the Element type of the list. So a nested list becomes a column of type List<Element>.

The other issue is the cast due to the schema. I will fix this one on the Rust side.
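
The typing rule described above (a list yields its element type, a nested list yields List<Element>) can be sketched in plain Python. This is an illustration only, not how Polars infers dtypes: `element_dtype`, `lit_column_dtype`, and the dtype strings they return are hypothetical names used as labels.

```python
from typing import Any


def element_dtype(value: Any) -> str:
    """Name a dtype for one Python value (hypothetical, simplified mapping)."""
    if value is None:
        return "Null"
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    if isinstance(value, str):
        return "Utf8"
    if isinstance(value, list):
        # Inner dtype comes from the first non-null item, or "Null" if none.
        inner = next((element_dtype(v) for v in value if v is not None), "Null")
        return f"List<{inner}>"
    raise TypeError(f"unsupported value: {value!r}")


def lit_column_dtype(value: Any) -> str:
    """Dtype of the column produced when an outer list is read as rows."""
    if isinstance(value, list):
        # The outer list is consumed as the rows, so the column takes the
        # dtype of its elements rather than a List dtype.
        return next((element_dtype(v) for v in value if v is not None), "Null")
    return element_dtype(value)


print(lit_column_dtype([42]))    # Int64
print(lit_column_dtype([[42]]))  # List<Int64>
print(lit_column_dtype([[]]))    # List<Null>
print(lit_column_dtype([]))      # Null
```

Under this rule an empty inner list carries no element type of its own, which suggests why a cast to the schema's List(Int64) is involved for `pl.lit([[]])` in the example above.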

@josh
Contributor Author

josh commented Apr 18, 2023

Thanks @ritchie46! I'll check out the fix.
