Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent by meaning in rolling_* and group_by_rolling - rename? #10989

Closed
MarcoGorelli opened this issue Sep 8, 2023 · 7 comments
Closed
Labels
A-api Area: changes to the public API A-temporal Area: date/time functionality

Comments

@MarcoGorelli
Copy link
Collaborator

MarcoGorelli commented Sep 8, 2023

In group_by_rolling, by tells you which column to group by

But in rolling_* (e..g rolling_mean), by tells you what the index column is

Perhaps it should be renamed to index_column for consistency with group_by_rolling?


I'm not saying that by needs aligning everywhere - for example, by in sort means something completely different, and that's OK. But I think people would expect consistency between rolling_* and group_by_rolling

@MarcoGorelli MarcoGorelli added the A-temporal Area: date/time functionality label Sep 8, 2023
@MarcoGorelli
Copy link
Collaborator Author

@orlp
Copy link
Collaborator

orlp commented Sep 8, 2023

Honestly, I think Expr.rolling_* should be separated into two different function variants, rolling_* and rolling_by_* with only the latter supporting duration windows, as they really do different things.

In general, I've mentioned this before, I think rolling should work like over does. So instead of
x.rolling_sum(...) you would write x.sum().rolling(...). This gets rid of a whole bunch of functions.
Then we would also have x.sum().rolling_by(...) that supports windowing by another column, or a time period.

@MarcoGorelli
Copy link
Collaborator Author

hmm yeah I do like the sound of that

@stinodego stinodego added the A-api Area: changes to the public API label Feb 8, 2024
@MarcoGorelli
Copy link
Collaborator Author

I'm just revisiting this, and there's generally two usages of by:

When it's the column you apply the operation according to:

  • DataFrame.sort: sort by by
  • DataFrame.top_k: find top k according to column by
  • Partition_by: partition by by
  • join_asof: join by by before doing asof join

When it means "group by these columns before performing the operation":

  • DataFrame.rolling
  • DataFrame.upsample
  • DataFrame.group_by_dynamic

I find the contrast between DataFrame.top_k and DataFrame.upsample particularly surprising

👍 Agree with Expr.rolling_mean(..., by=...) becoming Expr.rolling_mean_by(..., by=...)

But, in addition to / independently of that, could by in DataFrame.rolling, DataFrame.upsample, DataFrame.group_by_rolling be renamed to group_by?

@MarcoGorelli
Copy link
Collaborator Author

per discussion: OK to rename by to group_by in these cases (as per #14840)

@stinodego
Copy link
Contributor

@MarcoGorelli Can this be closed now that #14840 is merged?

@MarcoGorelli
Copy link
Collaborator Author

I'd say so, yes, thanks for your reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-api Area: changes to the public API A-temporal Area: date/time functionality
Projects
None yet
Development

No branches or pull requests

3 participants