Count(*) does not aggregate rows #21084

Closed
2 tasks done
Hunterlige opened this issue Feb 4, 2025 · 3 comments · Fixed by #21108
Labels: A-sql (Area: Polars SQL functionality), accepted (Ready for implementation), bug (Something isn't working), P-high (Priority: high), python (Related to Python Polars), regression (Issue introduced by a new release)

Comments

@Hunterlige

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.DataFrame(
    {
        "id": ["1", "2", "3"],
        "bar": [6.0, 7.0, 8.0],
    }
)
print(df.sql("SELECT COUNT(*) FROM self"))

Log output

polars==1.21

| len |
|-----|
| 3   |
| 3   |
| 3   |

Issue description

Since Polars 1.18, COUNT(*) returns one row per input row instead of aggregating the selected rows into a single count.

Expected behavior

polars==1.17

| len |
|-----|
| 3   |

Installed versions

--------Version info---------
Polars:              1.21.0
Index type:          UInt32
Platform:            Linux-6.1.85+-x86_64-with-glibc2.35
Python:              3.11.11 (main, Dec  4 2024, 08:55:07) [GCC 11.4.0]
LTS CPU:             False

----Optional dependencies----
Azure CLI            <not installed>
adbc_driver_manager  <not installed>
altair               5.5.0
azure.identity       <not installed>
boto3                <not installed>
cloudpickle          3.1.1
connectorx           <not installed>
deltalake            <not installed>
fastexcel            <not installed>
fsspec               2024.10.0
gevent               <not installed>
google.auth          2.27.0
great_tables         <not installed>
matplotlib           3.10.0
numpy                1.26.4
openpyxl             3.1.5
pandas               2.2.2
pyarrow              17.0.0
pydantic             2.10.6
pyiceberg            <not installed>
sqlalchemy           2.0.37
torch                2.5.1+cu124
xlsx2csv             <not installed>
xlsxwriter           <not installed>
@Hunterlige Hunterlige added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Feb 4, 2025
@orlp orlp added accepted Ready for implementation P-high Priority: high A-sql Area: Polars SQL functionality and removed needs triage Awaiting prioritization by a maintainer labels Feb 4, 2025
@Hunterlige
Author

If it helps, the only SQL-related commit in 1.18 is #20241.

@nameexhaustion nameexhaustion added the regression Issue introduced by a new release label Feb 5, 2025
@nameexhaustion nameexhaustion self-assigned this Feb 5, 2025
@kosiew

kosiew commented Feb 5, 2025

@nameexhaustion,

I might have found the fix.

@nameexhaustion
Collaborator

Thanks @kosiew. For this particular issue, I had already made a lot of progress in a local branch.

If you are looking to contribute, I would recommend looking for accepted issues that do not yet have an assignee. The assignee field may not always be accurate, so I would also leave a comment on the issue to check whether anyone else is working on it.

Projects
Status: Done
4 participants