-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ScalarUDF
in terms of ScalarUDFImpl
trait
#8713
Conversation
ScalarUDF
in terms of ScalarUDFImpl
trait
@@ -1948,6 +1948,7 @@ mod test { | |||
); | |||
|
|||
// UDF | |||
#[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of the API change -- any new impl of ScalarUDFImpl
must also derive Debug
-- note that ScalarUDFImpl
was introduced in #8578 and not yet released so this is not a breaking change for released versions
/// 1. For simple (less performant) use cases, use [`create_udf`] and [`simple_udf.rs`]. | ||
/// | ||
/// 2. For advanced use cases, use [`ScalarUDFImpl`] and [`advanced_udf.rs`]. | ||
/// | ||
/// # API Note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I went through this PR, ScalarUDF
is now basically a pass through wrapper to ScalarUDFImpl
-- if we didn't want to maintain backwards compatibility we could probably simply remove the ScalarUDF
struct and make it a trait, but I think that would be super disruptive to all exisiting users of DataFusion so I think we should avoid doing so unless absolutely necessary.
3abbd23
to
1334667
Compare
@@ -292,3 +271,105 @@ pub trait ScalarUDFImpl { | |||
&[] | |||
} | |||
} | |||
|
|||
/// ScalarUDF that adds an alias to the underlying function. It is better to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat boilerplate, but it is a pretty straightforward example of using Trait objects to extend functionality
/// [`create_udf`]: crate::expr_fn::create_udf | ||
/// [`simple_udf.rs`]: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/simple_udf.rs | ||
/// [`advanced_udf.rs`]: https://github.com/apache/arrow-datafusion/blob/main/datafusion-examples/examples/advanced_udf.rs | ||
#[derive(Clone)] | ||
#[derive(Debug, Clone)] | ||
pub struct ScalarUDF { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API for ScalarUDF is not changed at all -- only its internal implementation
{ | ||
// TODO change the internal implementation to use the trait object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✅
/// | ||
/// If you implement [`ScalarUDFImpl`] directly you should return aliases directly. | ||
pub fn with_aliases(self, aliases: impl IntoIterator<Item = &'static str>) -> Self { | ||
Self::new_from_impl(AliasedScalarUDFImpl::new(self, aliases)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somewhat of a hack (to have a wrapper around the scalar UDF). It may make more sense to simply remove the call to with_aliases
-- however, since it was released in datafusion 34.0.0 -- https://docs.rs/datafusion/34.0.0/datafusion/physical_plan/udf/struct.ScalarUDF.html -- that would be a breaking API change.
We could deprecate the API 🤔
datafusion/expr/src/udf.rs
Outdated
} | ||
} | ||
|
||
/// Implementation of [`ScalarUDFImpl`] that wraps the function style pointers of the older API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically the old implementaiton of ScalarUDF
moved into its own struct
@@ -36,7 +36,7 @@ pub fn create_physical_expr( | |||
|
|||
Ok(Arc::new(ScalarFunctionExpr::new( | |||
fun.name(), | |||
fun.fun().clone(), | |||
fun.fun(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a drive by cleanup I noticed while working on the code
1334667
to
41ec08a
Compare
Thank you for the review @crepererum |
Which issue does this PR close?
Closes #8712
Rationale for this change
ScalarUDF
is a struct of function pointers for historical reasons. After #8578,ScalarUDFImpl
is available, we can clean up its internal implementation to be in terms of that trait directly, simplifying the implementation greatly.What changes are included in this PR?
ScalarUDF
to have a singleArc<dyn ScalarUDFImpl
field+Debug
toScalarUDFImpl
which forces implementations to also derive Debug which I think is much better than the opaqueFUNC
that was there previouslyAre these changes tested?
yes
Are there any user-facing changes?
New requirement to derive
Debug
forScalarUDFImpl
but I think that is good hygene anyways