Cannot add constant field to empty structs #21095

Open
2 tasks done
jankislinger opened this issue Feb 5, 2025 · 2 comments
Labels
bug, needs triage, python

Comments

@jankislinger
Contributor

jankislinger commented Feb 5, 2025

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame({"s": [{}, {}, {}]})
df.with_columns(
    pl.col("s").struct.with_fields(a=pl.lit(42))
)

Log output

Traceback (most recent call last):
  File "/.../report.py", line 4, in <module>
    df.with_columns(
    ~~~~~~~~~~~~~~~^
        pl.col("s").struct.with_fields(a=pl.lit(42))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/.../.venv/lib/python3.13/site-packages/polars/dataframe/frame.py", line 9586, in with_columns
    return self.lazy().with_columns(*exprs, **named_exprs).collect(_eager=True)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/.../.venv/lib/python3.13/site-packages/polars/lazyframe/frame.py", line 2056, in collect
    return wrap_df(ldf.collect(callback))
                   ~~~~~~~~~~~^^^^^^^^^^
polars.exceptions.InvalidOperationError: Series s, length 1 doesn't match the DataFrame height of 3

If you want expression: col("s").with_fields([dyn int: 42.alias("a")]) to be broadcasted, ensure it is a scalar (for instance by adding '.first()').

Issue description

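When a struct column has no fields, adding a constant field to it with struct.with_fields fails. Using pl.lit(42).first(), as the error message suggests, didn't help. It seems that with_fields produces a series of length 1 instead of broadcasting it to the height of the DataFrame, which is visible when using select: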

>>> df.select(
...     pl.col("s").struct.with_fields(a=pl.lit(42))
... )
shape: (1, 1)
┌───────────┐
│ s         │
│ ---       │
│ struct[1] │
╞═══════════╡
│ {42}      │
└───────────┘

Forcing a non-empty data type on the struct works around the issue:

>>> df = pl.DataFrame({"s": pl.Series([{}, {}, {}], dtype=pl.Struct([pl.Field("_", pl.Boolean)]))})
>>> df.with_columns(
...     pl.col("s").struct.with_fields(a=pl.lit(42).first())
... )
shape: (3, 1)
┌───────────┐
│ s         │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {null,42} │
│ {null,42} │
│ {null,42} │
└───────────┘

Expected behavior

I would expect the result to be a column with the same height as the DataFrame and the same field value in each element.

shape: (3, 1)
┌───────────┐
│ s         │
│ ---       │
│ struct[1] │
╞═══════════╡
│ {42}      │
│ {42}      │
│ {42}      │
└───────────┘

Installed versions

--------Version info---------
Polars:              1.21.0
Index type:          UInt32
Platform:            Linux-6.8.0-52-generic-x86_64-with-glibc2.39
Python:              3.13.1 (main, Jan  5 2025, 05:33:47) [Clang 19.1.6 ]
LTS CPU:             False
----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               <not installed>
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         <not installed>
matplotlib           <not installed>
numpy                <not installed>
openpyxl             <not installed>
pandas               <not installed>
pyarrow              19.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           <not installed>
jankislinger added the bug, needs triage, and python labels Feb 5, 2025
@mcrumiller
Contributor

mcrumiller commented Feb 5, 2025

As a temporary workaround, you can use pl.repeat:

import polars as pl

df = pl.DataFrame({"s": [{}, {}, {}]})
df.with_columns(
    pl.col("s").struct.with_fields(
        a=pl.repeat(42, pl.len())
    )
)
# shape: (3, 1)
# ┌───────────┐
# │ s         │
# │ ---       │
# │ struct[1] │
# ╞═══════════╡
# │ {42}      │
# │ {42}      │
# │ {42}      │
# └───────────┘

@jankislinger
Contributor Author

That is a better workaround than what I have, thanks!
