-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/aggregates as windows #871
Feature/aggregates as windows #871
Conversation
bdbf77f
to
64d3415
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable to me
So does this PR also close #688? |
It does! Updated the description |
I had just verified with a test - if this test is not duplicative then maybe we should add it. def test_window_frame_defaults_match_postgres(ctx):
# ref: https://github.com/apache/datafusion-python/issues/688
# create a RecordBatch and a new DataFrame from it
batch = pa.RecordBatch.from_arrays(
[pa.array([1.0, 10.0, 20.0])],
names=["a"],
)
df = ctx.create_dataframe([[batch]])
window_frame = WindowFrame("rows", None, None)
col_a = column("a")
# Using `f.window` with or without an unbounded window_frame produces the same results
no_frame = f.window("avg", [col_a]).alias('no_frame')
with_frame = f.window("avg", [col_a], window_frame=window_frame).alias('with_frame')
df_1 = df.select(col_a, no_frame, with_frame)
expected = """DataFrame()
+------+--------------------+--------------------+
| a | no_frame | with_frame |
+------+--------------------+--------------------+
| 1.0 | 10.333333333333334 | 10.333333333333334 |
| 10.0 | 10.333333333333334 | 10.333333333333334 |
| 20.0 | 10.333333333333334 | 10.333333333333334 |
+------+--------------------+--------------------+
""".strip()
assert str(df_1) == expected
# When `order_by` is set, the default window should be `unbounded preceding` to `current row`
no_order = f.avg(col_a).over(Window()).alias('over_no_order')
with_order = f.avg(col_a).over(Window(order_by=[col_a])).alias('over_with_order')
df_2 = df.select(col_a, no_order, with_order)
expected = """DataFrame()
+------+--------------------+--------------------+
| a | over_no_order | over_with_order |
+------+--------------------+--------------------+
| 1.0 | 10.333333333333334 | 1.0 |
| 10.0 | 10.333333333333334 | 5.5 |
| 20.0 | 10.333333333333334 | 10.333333333333334 |
+------+--------------------+--------------------+
""".strip()
assert str(df_2) == expected |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One suggestion on an error report, but otherwise LGTM.
src/expr.rs
Outdated
), | ||
_ => Err( | ||
DataFusionError::ExecutionError(datafusion::error::DataFusionError::Plan( | ||
"Using `over` requires an aggregate function.".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Using `over` requires an aggregate function.".to_string(), | |
format!("Using {} with `over` is not allowed. Must use an aggregate or window function.", self.expr.variant_name()) |
Which issue does this PR close?
Closes #865
Closes #688
Rationale for this change
This function allows the user to turn any aggregate function into a window function. Previously using aggregates as windows was only expressed through
functions.window()
which took the aggregate's name. This only supports built in window functions or functions that have been registered with the session context. While this worked, it's a bit of an anti-pattern for people who don't need to register functions into the session context.What changes are included in this PR?
Adds
.over()
to an expression which indicates it is to be used as a window function.Are there any user-facing changes?
During testing I noticed that the default window frames for
f.window()
are set to unbounded preceeding to unbounded following. With.over()
it uses the default frames, so if ordering is set it goes from unbounded preceeding to current row. This does match user expectation.Example
Old version:
New version: