feat: rolling method #10479

Open
1 task done
deepyaman opened this issue Nov 12, 2024 · 5 comments
Labels
feature Features or general enhancements

Comments

@deepyaman
Contributor

Is your feature request related to a problem?

Add syntactic sugar for rolling aggregation.

What is the motivation behind your request?

Polars (and pandas) support DataFrame.rolling(). For example:

import polars as pl

(
    pl.scan_parquet("assets.parquet")
    .filter(pl.col("symbol").is_in(["ABBV", "XOM"]))
    .with_columns(price_rolling=pl.col("price").rolling_mean(2).over("symbol"))
    .collect()
)

The desired NULL behavior makes this more involved in Ibis (or SQL):

import ibis
from ibis import _

w = ibis.window(group_by="symbol", order_by="date", preceding=1, following=0)
ibis.read_parquet("assets.parquet").filter(_.symbol.isin(["ABBV", "XOM"])).mutate(
    price_rolling=ibis.ifelse(
        _.price.count().over(w) >= 2, _.price.mean().over(w), None
    )
)
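
For comparison, here is a minimal sketch of the same count-guarded pattern in plain SQL, run through Python's standard-library sqlite3 module so it needs no extra dependencies. The table contents are invented for illustration (they are not taken from assets.parquet), and SQLite stands in for DuckDB only because it ships with Python; it assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample rows, made up for this sketch.
rows = [
    ("ABBV", "2024-11-01", 10.0),
    ("ABBV", "2024-11-04", 20.0),
    ("ABBV", "2024-11-05", 40.0),
    ("XOM", "2024-11-01", 100.0),
    ("XOM", "2024-11-04", 110.0),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE assets (symbol TEXT, date TEXT, price REAL)")
con.executemany("INSERT INTO assets VALUES (?, ?, ?)", rows)

# Emit the mean only once the two-row window holds two non-NULL prices;
# a CASE without ELSE yields NULL otherwise.
query = """
SELECT
    symbol,
    date,
    CASE
        WHEN COUNT(price) OVER w >= 2 THEN AVG(price) OVER w
    END AS price_rolling
FROM assets
WINDOW w AS (
    PARTITION BY symbol
    ORDER BY date
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
)
ORDER BY symbol, date
"""
result = con.execute(query).fetchall()
for row in result:
    print(row)
```

The first row of each symbol comes back as NULL (None in Python) because its window holds only one observation, which is the behavior the ifelse guard above reproduces.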

Motivated by discussion of https://www.linkedin.com/posts/marcogorelli_rolling-mean-polars-vs-duckdb-syntax-activity-7261736840561397761-LYLh/ with @MarcoGorelli.

Describe the solution you'd like

import ibis
from ibis import _

w = ibis.window(group_by="symbol", order_by="date", preceding=1, following=0)
ibis.read_parquet("assets.parquet").filter(_.symbol.isin(["ABBV", "XOM"])).mutate(
    price_rolling=_.price.rolling_mean().over(w)
)

or

import ibis
from ibis import _

w = ibis.rolling(group_by="symbol", order_by="date", preceding=1, following=0)
ibis.read_parquet("assets.parquet").filter(_.symbol.isin(["ABBV", "XOM"])).mutate(
    price_rolling=_.price.mean().over(w)
)

or even

import ibis
from ibis import _

(
    ibis.read_parquet("assets.parquet")
    .filter(_.symbol.isin(["ABBV", "XOM"]))
    .group_by("symbol")
    .order_by("date")
    .mutate(price_rolling=_.price.rolling_mean(2))
)

What version of ibis are you running?

9.5.0

What backend(s) are you using, if any?

DuckDB

Code of Conduct

  • I agree to follow this project's Code of Conduct
@deepyaman added the feature (Features or general enhancements) label Nov 12, 2024
@deepyaman
Contributor Author

Related to #10091, but this is actually suggesting adding the syntactic sugar. Another alternative would be to document this.

@cpcloud
Member

cpcloud commented Nov 12, 2024

It seems like the major difference here is the setting of a lower bound for the minimum number of observations required before emitting an aggregated value.

I don't think rolling necessarily implies that; it's a particular choice by polars to behave that way.

What if I want the Ibis behavior where I start emitting values as soon as I have at least one non-null value? How would I write that with polars?

@MarcoGorelli
Contributor

hey - with min_periods (which defaults to the window size)
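
To make the two behaviors concrete, here is a rough pure-Python sketch of a trailing rolling mean with a min_periods-style threshold. It is only an illustration of the semantics discussed here, not Polars' implementation: the rolling_mean helper and its signature are hypothetical, and it assumes the threshold counts non-null values in the window.

```python
from typing import List, Optional, Sequence


def rolling_mean(
    values: Sequence[Optional[float]],
    window_size: int,
    min_periods: Optional[int] = None,
) -> List[Optional[float]]:
    """Trailing rolling mean that emits None until the window holds
    at least `min_periods` non-null values (hypothetical helper)."""
    if min_periods is None:
        # Assumed default, mirroring the comment above: the window size.
        min_periods = window_size
    out: List[Optional[float]] = []
    for i in range(len(values)):
        # Current row plus up to (window_size - 1) preceding rows.
        window = values[max(0, i - window_size + 1) : i + 1]
        observed = [v for v in window if v is not None]
        if observed and len(observed) >= min_periods:
            out.append(sum(observed) / len(observed))
        else:
            out.append(None)
    return out


prices = [10.0, 20.0, 40.0]
strict = rolling_mean(prices, 2)                 # threshold = window size
eager = rolling_mean(prices, 2, min_periods=1)   # emit from the first value
print(strict)
print(eager)
```

With the default threshold the first position is None; with min_periods=1 a value is emitted as soon as one non-null observation is available, which matches the Ibis window behavior described above.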

@cpcloud
Member

cpcloud commented Nov 12, 2024

Hm, I wonder if an analogous argument could be passed to ibis.window and wherever else we support window arguments.

I don't think min_periods is appropriate as a name, because it assigns a very specific meaning to rows. Something like min_obs (meaning minimum observations) seems more appropriate.

@deepyaman
Contributor Author

hey - with min_periods (which defaults to the window size)

When I looked into this last night, min_periods is also the name used in the Spark pandas API: https://spark.apache.org/docs/latest/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.rolling.html

But min_observations is also fine. 🤷

Projects
Status: backlog
Development

No branches or pull requests

3 participants