feat: Add strict parameter to pl.concat(how='horizontal') #20019
…d added a corresponding unit test
Heya, thank you for the PR. |
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@           Coverage Diff           @@
##             main   #20019   +/-   ##
=======================================
  Coverage   79.52%   79.53%
=======================================
  Files        1563     1563
  Lines      217104   217121    +17
  Branches     2464     2464
=======================================
+ Hits       172659   172690    +31
+ Misses      43885    43871    -14
  Partials      560      560
View full report in Codecov by Sentry.
py-polars/polars/functions/eager.py
Outdated
@@ -231,6 +240,14 @@ def concat(
        )
    )
elif how == "horizontal":
    if strict:
        nrows = first.select(F.len()).collect()[0, 0]
The reason this should be implemented on the Rust side is that this collect here could trigger a massive computation if the query plan is complex, and the result then gets tossed. The check should be performed when the concatenation operation is actually applied.
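To make the cost concrete, here is a toy model in plain Python rather than polars. `ToyLazyFrame`, `concat_check_when_building`, and `concat_check_when_executing` are hypothetical names invented for this sketch, and `ValueError` stands in for whatever error the real check would raise. It only illustrates why collecting up front to count rows executes each plan an extra time, while a check folded into the operation does not.

```python
from typing import Callable


class ToyLazyFrame:
    """Hypothetical stand-in for a lazy frame: stores a plan and counts executions."""

    def __init__(self, plan: Callable[[], dict]):
        self._plan = plan
        self.runs = 0

    def collect(self) -> dict:
        self.runs += 1  # each call re-executes the whole plan
        return self._plan()


def _height(frame: dict) -> int:
    # Number of rows, taken from the first column (0 for an empty frame).
    return len(next(iter(frame.values()))) if frame else 0


def _check(heights: list) -> None:
    if len(set(heights)) > 1:
        raise ValueError(f"heights differ: {heights}")


def concat_check_when_building(frames, strict=False):
    # Roughly the approach in the diff: collect up front just to count rows,
    # then throw the collected result away.
    if strict:
        _check([_height(f.collect()) for f in frames])
    return lambda: [f.collect() for f in frames]


def concat_check_when_executing(frames, strict=False):
    # What the review asks for: validate as part of the operation itself.
    def run():
        parts = [f.collect() for f in frames]
        if strict:
            _check([_height(p) for p in parts])
        return parts

    return run


early = [ToyLazyFrame(lambda: {"a": [1, 2]}), ToyLazyFrame(lambda: {"b": [3, 4]})]
concat_check_when_building(early, strict=True)()
early_runs = [f.runs for f in early]

late = [ToyLazyFrame(lambda: {"a": [1, 2]}), ToyLazyFrame(lambda: {"b": [3, 4]})]
concat_check_when_executing(late, strict=True)()
late_runs = [f.runs for f in late]

print(early_runs, late_runs)
```

In this toy model each plan runs twice with the early check and once with the deferred check. With a complex query plan, that extra run is the "massive computation" the comment warns about.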
Understood. When I first thought about it, I failed to take into account how I would compare the number of rows on LazyFrames.
@mcrumiller @coastalwhite Unfortunately, my machine is not capable of running |
@nimit I'm not a repo member, I just lurk here a lot, but I can try to help you get things working--what's the issue with running |
Thanks for your help! |
… single element and ddof=1 and there are nulls elsewhere in the Series (pola-rs#20077)
…dding a 'strict' keyword argument to concat_df(how='horizontal')
Finally, I think you should rename the issue to:
Cheers! |
Thank you very much @mcrumiller for your help throughout my first open-source PR. |
I'm not sure why the test is failing, it passes on my end. |
Yeah, I am confused as well. I rely on the actions to test it |
@nimit it's failing on the new streaming engine:
~/projects/polars/py-polars$ export POLARS_AUTO_NEW_STREAMING=1
~/projects/polars/py-polars$ pytest /home/mcrumiller/projects/polars/py-polars/tests/unit/functions/test_concat.py
=========================== test session starts ===========================
platform linux -- Python 3.12.6, pytest-8.3.2, pluggy-1.5.0
codspeed: 3.0.0 (disabled, mode: walltime, timer_resolution: 1.0ns)
rootdir: /home/mcrumiller/projects/polars/py-polars
configfile: pyproject.toml
plugins: cov-6.0.0, codspeed-3.0.0, hypothesis-6.119.4, xdist-3.6.1
collected 4 items / 2 deselected / 2 selected
tests/unit/functions/test_concat.py F.                               [100%]
================================ FAILURES =================================
_____________________ test_concat_horizontally_strict _____________________
tests/unit/functions/test_concat.py:32: in test_concat_horizontally_strict
    with pytest.raises(pl.exceptions.ShapeError):
E   Failed: DID NOT RAISE <class 'polars.exceptions.ShapeError'>
========================= short test summary info =========================
FAILED tests/unit/functions/test_concat.py::test_concat_horizontally_strict - Failed: DID NOT RAISE <class 'polars.exceptions.ShapeError'>
=============== 1 failed, 1 passed, 2 deselected in 0.13s ================
I'll look into it.
@nimit can you set this PR to draft until we can get this working? |
I'm not familiar at all with the new streaming engine. After taking a look, it looks like there is a parameter in there called We need to figure out how to propagate this parameter. Two places to start are |
PR that closes #19133
Changed the Python package so that, when how='horizontal' and strict is set, the number of rows in the first element is checked against the number of rows in each of the remaining elements, for both lazy and eager frames.
strict is set to False by default
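The intended semantics can be sketched without polars, using plain dicts of lists as frames. This is a toy model, not the PR's implementation: `hconcat` is a hypothetical helper, `ValueError` stands in for `pl.exceptions.ShapeError`, and the sketch assumes that the non-strict default pads shorter frames with nulls, as polars' eager horizontal concat does.

```python
Frame = dict  # toy frame: column name -> list of values


def height(frame: Frame) -> int:
    # Number of rows, taken from the first column (0 for an empty frame).
    return len(next(iter(frame.values()))) if frame else 0


def hconcat(frames: list, strict: bool = False) -> Frame:
    """Toy horizontal concat: pad with None by default, raise when strict."""
    heights = [height(f) for f in frames]
    if strict and len(set(heights)) > 1:
        # Stand-in for the shape error the strict check is meant to raise.
        raise ValueError(f"strict horizontal concat: heights differ: {heights}")
    target = max(heights, default=0)
    out: Frame = {}
    for frame in frames:
        for name, values in frame.items():
            if name in out:
                raise ValueError(f"duplicate column name: {name!r}")
            # Build a new list so the input frames are left untouched.
            out[name] = values + [None] * (target - len(values))
    return out


a = {"x": [1, 2, 3]}
b = {"y": ["p", "q"]}

padded = hconcat([a, b])  # default: shorter column is padded with None

try:
    hconcat([a, b], strict=True)
    strict_raised = False
except ValueError:
    strict_raised = True
```

Keeping `strict=False` as the default means existing callers keep the padding behaviour, and only callers that opt in get the error.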
Also added unit tests for the changes for cases: