-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consistent API to set parameters of aggregate and window functions (AggregateExt
--> ExprFunctionExt
)
#11550
Consistent API to set parameters of aggregate and window functions (AggregateExt
--> ExprFunctionExt
)
#11550
Conversation
I don't think I have permission on this repo to add labels to this PR so I wasn't able to add one for having user facing changes |
Thanks @timsaucer -- I added the |
Actually maybe hold off on reviewing. I have an idea for the morning that might make it simpler |
marking as draft as it sounds like @timsaucer has some additional ideas |
Copying @jayzhan211 's comment from
My guess is that it has to do with all of the other functions returning The reasons I can see sticking with a builder function instead of updating the
The reasons I can see preferring to work directly with the
Laying it out like this, I can see a good argument for sticking with the builder function if that's the way you two (and others) would like to go. |
I guess the chain of But I would like to optimize for |
I think another reason was that it might be easier to understand what APIs could be called wnen for example, if we created
I agree with this (though I do think if we change the API we should try to avoid any more churn than necessary). I would tend to favor @timsaucer 's opinion on what is easier / more intuitive. I think I am too close to DataFusion / Rust these days to really know what is easier from a "someone who is new to the datafusion codebase" perspective |
I looked at #11582 and after reading the rest of the commentary on this PR I think this PR's proposal is best approach Thus I suggest:
Backwards compatibilty: basically give a deprecation message that tells users how to upgrade For example, something like // Compatibility: ensure existing users of AggregateExt get a message about ExprFunctionExt
// Note the syntax isn't quite right for the deprecated macro
#[deprecated(version = 40, message=:"ExprFunctionExt instead")]
trait AggregateExt: ExprFunctionExt {} |
datafusion/expr/src/expr_fn.rs
Outdated
@@ -676,6 +679,266 @@ pub fn interval_month_day_nano_lit(value: &str) -> Expr { | |||
Expr::Literal(ScalarValue::IntervalMonthDayNano(interval)) | |||
} | |||
|
|||
/// Extensions for configuring [`Expr::AggregateFunction`] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could perhaps update this comment to refer to Window functions as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
/// .build()?; | ||
/// # Ok(()) | ||
/// # } | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would also be awesome to add an example here for a window function too (with partition by)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
Thanks @timsaucer and @jayzhan211 -- this is looking pretty sweet |
/// See [`ExprFunctionExt`] for usage and examples | ||
#[derive(Debug, Clone)] | ||
pub struct ExprFuncBuilder { | ||
fun: Option<ExprFuncKind>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess fun is the must have
field, option is not neccessary. fun: ExprFuncKind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are sticking with the builder approach, don't we need this in case someone attempts to create a builder from an Expr that is not aggregate or window?
col("a").order_by(vec![col("b")])
This is a wrong use but won't be caught until we call .build()
so up to that point don't we need the fun
field to be None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we are sticking with the builder approach, don't we need this in case someone attempts to create a builder from an Expr that is not aggregate or window?
col("a").order_by(vec![col("b")])
This is a wrong use but won't be caught until we call
.build()
so up to that point don't we need thefun
field to be None?
Makes sense, it is for incorrect usage
Question: it looks like we have some functions that have both aggregate functions and window functions defined. Specifically looking at Right now I have My thinking is that we should stick with this PR for now but open an issue to converge all of these. I would expect the following psuedocode to be equivalent
Please correct me if I'm missing something here. |
From ^ comment I think things like the helper function |
I hadn't look into the detail of window_first implementation, but I guess we could reuse the
I think these two are not equivalent. Only the function TLDR, function is the same but expr is different I guess things get easier after udwf is introduced #8709. |
I removed |
…ttings for window functions
…guments. Other parameters will be set via the ExprFuncBuilder
…e cleanliness and not strictly required
…g ExprFunctionExt trait so we can guarantee a consistent user experience no matter which they call on the Expr and which on the builder
180e9d3
to
532f262
Compare
/// Window function | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
datafusion/expr/src/lib.rs
Outdated
@@ -16,6 +16,8 @@ | |||
// under the License. | |||
// Make cheap clones clear: https://github.com/apache/datafusion/issues/11143 | |||
#![deny(clippy::clone_on_ref_ptr)] | |||
// TODO When the deprecated trait AggregateExt is removed, remove this unstable feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this feature requires nightly rust, which isn't possible for DataFusion I don't think. I'll push a commit to this PR that removes the attempt at backwards compatibility and just updated the description
AggregateExt
add ExprFunctionExt
)
AggregateExt
add ExprFunctionExt
)AggregateExt
add ExprFunctionExt
)
AggregateExt
add ExprFunctionExt
)AggregateExt
add ExprFunctionExt
)
vec![col("time").sort(true, true)], // ORDER BY time ASC | ||
WindowFrame::new(None), | ||
); | ||
let window_expr = smooth_it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so much nicer
I am so excited for this PR -- thank you @timsaucer -- I pushed a few more commits to improve the docs a bit more and get the CI passing. I think this is a really nice improvement to the usability of DataFusion |
AggregateExt
add ExprFunctionExt
)AggregateExt
add ExprFunctionExt
)
AggregateExt
add ExprFunctionExt
)AggregateExt
add ExprFunctionExt
)
AggregateExt
add ExprFunctionExt
)AggregateExt
--> ExprFunctionExt
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is looking great -- thank you @timsaucer and @jayzhan211
Looks great! Thanks @timsaucer and @alamb |
Also removed `sqlparser` dependency since it's re-exported upstream. Ref: apache/datafusion#11550
Also removed `sqlparser` dependency since it's re-exported upstream. Ref: apache/datafusion#11550
* update datafusion deps to point to githuc.com/apache/datafusion Datafusion 41 is not yet released on crates.io. * update TableProvider::scan Ref: apache/datafusion#11516 * use SessionStateBuilder The old constructor is deprecated. Ref: apache/datafusion#11403 * update AggregateFunction Upstream Changes: - The field name was switched from `func_name` to func. - AggregateFunctionDefinition was removed Ref: apache/datafusion#11803 * update imports in catalog Catlog API was extracted to a separate crate. Ref: apache/datafusion#11516 * use appropriate path for approx_distinct Ref: apache/datafusion#11644 * migrate AggregateExt to ExprFunctionExt Also removed `sqlparser` dependency since it's re-exported upstream. Ref: apache/datafusion#11550 * update regr_count tests for new return type Ref: apache/datafusion#11731 * migrate from function-array to functions-nested The package was renamed upstream. Ref: apache/datafusion#11602 * cargo fmt * lock datafusion deps to 41 * remove todo from cargo.toml All the datafusion dependencies are re-exported, but I still need to figure out *why*.
Which issue does this PR close?
Closes #6747.
Rationale for this change
There was excellent work done on #10560 that makes a much more easy to use interface for developers to set parameters on aggregate functions. This PR is designed to bring the same interface to window functions.
Currently it can be very clunky to write a window function for a relatively simple query. For example to create a lag function you would do:
With this PR it becomes a simpler
Even more powerful is that you can now convert window function expressions into their builder to set additional properties
What changes are included in this PR?
This PR refactors
AggregateExt
to beExprFunctionExt
and adds in additional functions forpartition_by
andwindow_frame
. It also adds in a variety of useful helper functions for things likelead
,lag
,row_number
etc.Are these changes tested?
All existing CI tests pass. Some were updated to use the new interface, but all of the functionality remains the same.
Are there any user-facing changes?
WindowFunction::new
now only requires two parameters, the function definition and arguments. Other parameters will need to be set using the builder methods provided. For example, if the user was previously callingThey would change this to