Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: address lit broadcasting and output name of right arithmetic ops #1424

Merged
merged 21 commits into from
Nov 30, 2024

Conversation

AlessandroMiola
Copy link
Contributor

@AlessandroMiola AlessandroMiola commented Nov 22, 2024

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Opening as draft to get comments/guidance on the approach. Also, (1) Dask behaviour, (2) column name resulting from right arithmetics on Series in Polars, (3) leftover binary dunder methods need more thourough exploration on my side.

Another point, this PR should solve both of the issues in principle. I combined the two fixes because I originally thought that the second (#509) could naturally be resolved as a consequence of the first, but at the end that's not really the case (at least via the approach I followed). Perhaps I should separate the two?

Key points:

#853:
- Tweak validate_column_comparand in _pandas_like so as to reindex other via the lhs series only when possible (which is what the error message was complaining about when outputting ValueError: Length mismatch: Expected axis has 3 elements, new values have 1 elements)
- Broadcast the lhs series via maybe_broadcast_scalar_into_series when relevant (i.e. when of different length wrt the previously validated/broadcasted rhs other)

#509:
- Add optional alias parameter to reuse_series_implementation and assign it to expr._output_names (basically, the approach Marco was showing in last week's livestream) and pass it all along to be able to rename the output of rarithmetic ops as "literal" without incurring into safety assertion errors.

Points that need - for sure - further exploration:

  • Dask behaviour related to both [Bug]: lit does not "broadcast" as expectedΒ #853 and [Bug]: right-hand-arithmetic output name differsΒ #509 (I had to xfail tests/frame/lit_test.py::test_lit_operation and tests/expr_and_series/arithmetic_test.py::test_right_arithmetic_expr); what happens here is that Dask's lit reasonably aligns the index with the native frame, but here a reduction takes place, which shrinks the index of the resulting frame causing a mismatch.
  • Right arithmetic operations on Series in Polars return an unnamed column. So, while
@nw.narwhalify
def func(df):
    return df.select(1 + nw.col('a'))

outputs "literal" as col name as reported in #509, 1 + pl.Series([1, 2, 3]) returns an unnamed series. This does prevent tests/expr_and_series/arithmetic_test.py::test_right_arithmetic_series from testing for the output column name being "literal". Point is that, as we're generally reusing series implementation in expressions, this might need to be taken into consideration as on other backends we're kind of imposing output name to be "literal".

  • Analyse behaviour of the binary dunder methods that were momentaneously left apart.

@github-actions github-actions bot added the fix label Nov 22, 2024
@AlessandroMiola AlessandroMiola changed the title fix: address lit broadcasting and fix: address lit broadcasting and output name of right arithmetic ops Nov 22, 2024
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks a tonne @AlessandroMiola for starting this! I just took it a bit further, hopefully we may finally be able to resolve the linked issues 🀞

this also helped find a small bug in Polars pola-rs/polars#20071

@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 30, 2024 11:22
@MarcoGorelli MarcoGorelli merged commit 943349d into narwhals-dev:main Nov 30, 2024
24 checks passed
@AlessandroMiola AlessandroMiola deleted the fix-lit-broadcasting branch November 30, 2024 12:35
self._native_series.__rpow__(extract_native(other))
)
result = self._native_series.__rpow__(extract_native(other))
if self._backend_version < (16, 1):
Copy link
Member

@FBruzzesi FBruzzesi Nov 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should be version (1, 16, 1) ? I am fixing on another PR is that's ok

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: lit does not "broadcast" as expected [Bug]: right-hand-arithmetic output name differs
3 participants