[built-in function] add greatest(T,...) and least(T,...) SQL functions #6527
        )))
    } else {
        Err(DataFusionError::NotImplemented(format!(
            "{:?} expressions are not implemented for arrays yet as we need to update arrow kernels",
@alamb is there already a max/min kernel in arrow?
maybe track in this issue:
There is min, like https://docs.rs/arrow/latest/arrow/compute/fn.min.html, but I think it works only for primitive arrays.
Perhaps you can use the existing MinAccumulator / MaxAccumulator?
https://docs.rs/datafusion/latest/datafusion/physical_plan/expressions/struct.MinAccumulator.html
Nice work! Learned a lot from that 👍🏼
But there seem to be some problems with the current signature definitions:
https://github.com/apache/arrow-datafusion/blob/815413c4a4b473f996ccaa7deb650653430a5aba/datafusion/expr/src/signature.rs#L41-L50
This seems to be the first function to use the variadic_equal signature. The current implementation documents it as
arbitrary number of arguments of an arbitrary but equal type
but for this case (greatest()/least()) the arguments should also come from a common list of valid types.
So it's better to change TypeSignature to:
pub enum TypeSignature {
    /// arbitrary number of arguments of a common type out of a list of valid types
    // A function such as `concat` is `Variadic(vec![DataType::Utf8, DataType::LargeUtf8])`
    Variadic(Vec<DataType>),
    /// arbitrary number of arguments of a common type out of a list of valid types, all arguments have the same type
    VariadicEqual(Vec<DataType>),
    /// arbitrary number of arguments of an arbitrary but equal type
    // A function such as `array` is `VariadicEqualAny`
    // The first argument decides the type used for coercion
    VariadicEqualAny,
    /// arbitrary number of arguments with arbitrary types
    VariadicAny,
    ...
Then the planner can catch invalid function calls like this one before they reach execution:
DataFusion CLI v25.0.0
❯ select least(interval '1 day', interval '2 day', interval '3 day');
thread 'main' panicked at 'Unsupported data type for comparison: Interval(MonthDayNano)', /Users/yongting/Desktop/code/arrow-datafusion/datafusion/physical-expr/src/comparison_expressions.rs:74:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
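To illustrate the proposal, here is a minimal sketch of how a signature carrying a list of valid types could reject the interval call at planning time instead of panicking later. The `DataType` and `TypeSignature` enums below are simplified, hypothetical stand-ins, and `check_args` is an illustrative helper, not DataFusion's actual API.

```rust
// Hypothetical, simplified stand-ins for DataFusion's types; not the real API.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
    Interval,
}

#[allow(dead_code)]
#[derive(Debug)]
enum TypeSignature {
    /// Any number of arguments, all of the same type, taken from the list.
    VariadicEqual(Vec<DataType>),
    /// Any number of arguments of an arbitrary but equal type.
    VariadicEqualAny,
}

/// Returns Ok(()) when `args` satisfy `sig`, or a planner-style error message.
fn check_args(sig: &TypeSignature, args: &[DataType]) -> Result<(), String> {
    let first = args
        .first()
        .ok_or_else(|| "expected at least one argument".to_string())?;
    // Both variants require every argument to have the same type.
    if args.iter().any(|t| t != first) {
        return Err(format!("arguments must share one type, got {:?}", args));
    }
    match sig {
        // Only the list-carrying variant restricts which type is allowed.
        TypeSignature::VariadicEqual(valid) if !valid.contains(first) => {
            Err(format!("unsupported argument type {:?}", first))
        }
        _ => Ok(()),
    }
}
```

With a signature such as `VariadicEqual(vec![DataType::Int64, DataType::Float64])`, a call with two `Interval` arguments yields an `Err` from `check_args`, while `VariadicEqualAny` would still accept it.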
datafusion/expr/src/function.rs
@@ -376,6 +381,9 @@ pub fn signature(fun: &BuiltinScalarFunction) -> Signature {
        BuiltinScalarFunction::Chr | BuiltinScalarFunction::ToHex => {
            Signature::uniform(1, vec![DataType::Int64], fun.volatility())
        }
        BuiltinScalarFunction::Greatest | BuiltinScalarFunction::Least => {
            Signature::variadic_equal(fun.volatility())
Should be Signature::variadic_equal(SUPPORTED_COMPARISON_TYPES, fun.volatility())
    // A function such as `array` is `VariadicEqual`
    // The first argument decides the type used for coercion
-   VariadicEqual,
+   VariadicEqual(Vec<DataType>),
If the types are all equal, what is the purpose of storing a Vec of them? Or maybe the Vec is the list of valid types?
It also looks like the code special-cases an empty list of types to mean "any type is allowed" -- can we please explicitly mention that in the docs as well? Or maybe add a new variant (VariadicEqualSpecific?)
let is_error = get_valid_types(
    &signature,
    &[DataType::Int32, DataType::Boolean, DataType::Int32],
)
.is_err();
assert!(is_error);
Another way to test for an error is:
let is_error = get_valid_types(
    &signature,
    &[DataType::Int32, DataType::Boolean, DataType::Int32],
)
.unwrap_err();
This is more concise, and I think it also prints out what the Ok value is when the result is not an error.
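The difference between the two styles can be seen in a small standalone sketch. `validate` below is a hypothetical stand-in for a fallible function such as `get_valid_types`; only `Result::is_err` and `Result::unwrap_err` are real standard-library calls.

```rust
// Hypothetical stand-in for a fallible function such as `get_valid_types`.
fn validate(types: &[&str]) -> Result<Vec<String>, String> {
    let first = types.first().ok_or_else(|| "no arguments".to_string())?;
    if types.iter().all(|t| t == first) {
        Ok(types.iter().map(|t| t.to_string()).collect())
    } else {
        Err(format!("mismatched types: {:?}", types))
    }
}

/// Style 1: on failure this only reports that the assertion was false.
fn check_with_is_err() {
    let is_error = validate(&["Int32", "Boolean", "Int32"]).is_err();
    assert!(is_error);
}

/// Style 2: `unwrap_err` panics with the Debug form of the Ok value if the
/// call unexpectedly succeeds, and returns the error so it can be inspected.
fn check_with_unwrap_err() -> String {
    validate(&["Int32", "Boolean", "Int32"]).unwrap_err()
}
```

Returning the error from the second helper also makes it possible to assert on the message itself, which the boolean form cannot do.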
}

/// Evaluate a greatest or least function
fn compare(op: ComparisonOperator, args: &[ColumnarValue]) -> Result<ColumnarValue> {
I think this functionality is basically the same as the Min/Max accumulators (e.g. https://docs.rs/datafusion/latest/datafusion/physical_plan/expressions/struct.MinAccumulator.html)
}

#[tokio::test]
async fn test_comparison_func_array_scalar_expression() -> Result<()> {
Can we please write this test using sqllogictest instead of a new rs test? I think you'll find it quite nice for sql tests and it is easier to maintain and extend.
Just checked the implementation of Variadic, and it is actually the VariadicEqual we are trying to implement:
If a function is given the signature least(Variadic[Int32/Int64]), it will check whether the arguments can be coerced to either least(Int32, Int32, ...) or least(Int64, Int64, ...)
https://github.com/apache/arrow-datafusion/blob/a7970ebf6ef5181fe78de5db63dcf05760db5ace/datafusion/expr/src/type_coercion/functions.rs#L66-L69
But its name and doc are a bit misleading, which made me overlook it previously and try to add a new variant. Maybe we can also make that more obvious.
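The behaviour described above could be sketched roughly as follows. This is an illustration only: `DataType`, `can_coerce`, and `coerce_variadic` are hypothetical simplifications, and the widening rules shown are assumed for the example rather than taken from DataFusion's coercion code.

```rust
// Hypothetical, simplified model; not DataFusion's real types or rules.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Assumed widening rules for this sketch only.
fn can_coerce(from: DataType, to: DataType) -> bool {
    use DataType::*;
    from == to
        || matches!(
            (from, to),
            (Int32, Int64) | (Int32, Float64) | (Int64, Float64)
        )
}

/// Try each candidate type in order and return the first one that every
/// argument can be coerced to, repeated once per argument.
fn coerce_variadic(valid: &[DataType], args: &[DataType]) -> Option<Vec<DataType>> {
    if args.is_empty() {
        return None;
    }
    valid
        .iter()
        .copied()
        .find(|&target| args.iter().all(|&arg| can_coerce(arg, target)))
        .map(|target| vec![target; args.len()])
}
```

Under these assumed rules, arguments `(Int32, Int64)` checked against the candidates `[Int32, Int64]` resolve to `(Int64, Int64)`, and a call containing `Utf8` is rejected because no candidate accepts it.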
@@ -376,6 +381,12 @@ pub fn signature(fun: &BuiltinScalarFunction) -> Signature {
        BuiltinScalarFunction::Chr | BuiltinScalarFunction::ToHex => {
            Signature::uniform(1, vec![DataType::Int64], fun.volatility())
        }
        BuiltinScalarFunction::Greatest | BuiltinScalarFunction::Least => {
            Signature::variadic_equal(
Tried Signature::variadic(...) and it just works as expected:
❯ select least(1,2,3);
+-----------------------------------+
| least(Int64(1),Int64(2),Int64(3)) |
+-----------------------------------+
| 1 |
+-----------------------------------+
1 row in set. Query took 0.077 seconds.
❯ select least(1,2.0,3);
+-------------------------------------+
| least(Int64(1),Float64(2),Int64(3)) |
+-------------------------------------+
| 1.0 |
+-------------------------------------+
1 row in set. Query took 0.009 seconds.
❯ select least(interval '1 day', interval '2 day', interval '3 day');
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| least(IntervalMonthDayNano("18446744073709551616"),IntervalMonthDayNano("36893488147419103232"),IntervalMonthDayNano("55340232221128654848")) |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
| 0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs |
+-----------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.008 seconds.
Marking as draft as this PR has feedback and is awaiting a response (I am trying to keep the list of PRs needing review clear) -- please mark as ready for review when ready
Thank you @alamb. My bandwidth is very limited at the moment, so I am going to pause this PR for a bit; if anyone wants to take over, feel free to. I can also merge as is (after a rebase fix) and create issues as follow-ups?
Whatever you prefer -- I defer to your judgement
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
Which issue does this PR close?
greatest(T,...) and least(T,...) SQL functions #6531

Rationale for this change
add greatest(T,..) and least(T,..) variadic functions as per SQL 2023.

What changes are included in this PR?
greatest(T,..) and least(T,..) variadic functions as per SQL 2023.
zip and ord kernels are used

Are these changes tested?

Are there any user-facing changes?
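
As a rough illustration of the "zip and ord kernels" approach mentioned in the description, the sketch below computes least column by column: an ordering comparison produces a boolean mask, and a zip-style select picks from one of two columns according to that mask. It uses plain `Vec<Option<i64>>` columns rather than Arrow arrays, every function name is hypothetical, and the null handling (a null in any argument makes the row null) is an assumption of this example that may differ from the PR and from Arrow's kernels.

```rust
type Column = Vec<Option<i64>>;

/// Element-wise "less than" mask; None marks a comparison involving a null.
fn lt_mask(a: &[Option<i64>], b: &[Option<i64>]) -> Vec<Option<bool>> {
    a.iter()
        .zip(b)
        .map(|(x, y)| match (x, y) {
            (Some(x), Some(y)) => Some(x < y),
            _ => None,
        })
        .collect()
}

/// Pick from `if_true` where the mask is true and from `if_false` where it is
/// false. Assumption of this sketch: a null mask entry yields a null result.
fn zip_select(mask: &[Option<bool>], if_true: &[Option<i64>], if_false: &[Option<i64>]) -> Column {
    mask.iter()
        .zip(if_true.iter().zip(if_false))
        .map(|(m, (t, f))| match m {
            Some(true) => *t,
            Some(false) => *f,
            None => None,
        })
        .collect()
}

/// Row-wise least over equally sized columns; None if no columns are given.
fn least(columns: &[Column]) -> Option<Column> {
    let (first, rest) = columns.split_first()?;
    let mut acc = first.clone();
    for col in rest {
        // Keep the new column's value wherever it is smaller than the running result.
        let mask = lt_mask(col, &acc);
        acc = zip_select(&mask, col, &acc);
    }
    Some(acc)
}
```

A `greatest` variant would follow the same shape with the comparison reversed.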