[Enh]: Support polars.Expr.rank #1323

Open
adamblake opened this issue Nov 5, 2024 · 1 comment
Labels
accepted enhancement New feature or request

Comments

@adamblake

adamblake commented Nov 5, 2024

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

I am abstracting a library for computing teaching metrics so that researchers can use their data processing library of choice. Narwhals seems like a good bet (also shout-out to @mikeckennedy for having you on the podcast!). I can't share the specific repository because it contains internal scripts, but this would be supporting CourseKata, a low-cost textbook platform dedicated to continuous improvement based on learning science principles.

Please describe the purpose of the new feature or describe the problem to solve.

I would like support for the polars.Expr.rank method. One example of how it could be used is to count how often an instructor teaches, given some grouping variable (window). In Polars it might look like this:

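import polars as pl

# df: a pl.DataFrame with "instructor_id" and "academic_year" columns
df.sort("academic_year").with_columns(
    # dense rank of the year within each instructor's rows
    years_taught=pl.col("academic_year")
    .rank(method="dense")
    .over("instructor_id")
)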

This windows over instructor_id and ranks each row by academic_year within its window. Essentially, each row gets a count of how many distinct academic years the instructor has taught in, up to and including that row's year. Because we are using the "dense" ranking, tied years share a rank and no ranks are skipped, so teaching multiple classes in a year counts as a single year taught.
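To make the "dense" semantics concrete, here is a minimal pure-Python sketch of what the expression computes. The helper name `dense_rank_over` and the sample data are made up for illustration; this is not how Polars or Narwhals implement it.

```python
from collections import defaultdict


def dense_rank_over(values, groups):
    """Dense rank of each value within its group: ties share a rank, no gaps."""
    # Collect the distinct values seen in each group.
    distinct = defaultdict(set)
    for value, group in zip(values, groups):
        distinct[group].add(value)

    # Map each distinct value to its 1-based position in sorted order.
    position = {
        group: {value: i for i, value in enumerate(sorted(seen), start=1)}
        for group, seen in distinct.items()
    }
    return [position[group][value] for value, group in zip(values, groups)]


# Hypothetical sample data: instructor "a" teaches two classes in 2021.
instructor_id = ["a", "a", "a", "b", "b"]
academic_year = [2021, 2021, 2022, 2020, 2022]

years_taught = dense_rank_over(academic_year, instructor_id)
print(years_taught)  # [1, 1, 2, 1, 2]
```

Both 2021 rows for instructor "a" get rank 1, and 2022 gets rank 2 rather than 3, which is the behaviour the request depends on.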

Suggest a solution if possible.

No response

If you have tried alternatives, please describe them below.

I could probably achieve this by making an intermediate data frame where I filter down academic_year using unique(), and then make some kind of counter variable based on instructor_id, and then join() that back to the initial table.

Instead I would rather just go back to using Polars until this feature is supported (if it is on your roadmap!).

Additional information that may help us understand your needs.

No response

@FBruzzesi FBruzzesi added the enhancement New feature or request label Nov 5, 2024
@FBruzzesi
Member

FBruzzesi commented Nov 5, 2024

Hey @adamblake, thanks for the feature request. This is definitely in scope 👌 We are currently finalizing an integration, but we will soon get back to expanding the API 😁

Projects
None yet
Development

No branches or pull requests

3 participants