-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support Ansi mode in abs function #500
Conversation
@@ -1474,15 +1481,14 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with CometExprShim | |||
None | |||
} | |||
|
|||
case Abs(child, _) => | |||
case Abs(child, failOnErr) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are already using failOnErr, can we simply use this boolean instead of evalmode struct? what are you thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your review! The intention why it is done this way is that in the Rust code the ansi mode is always treated the same way whether the expression supports the three ansi modes or only two. If not, we have to do a different treatment for both cases.
@@ -850,6 +850,34 @@ class CometExpressionSuite extends CometTestBase with AdaptiveSparkPlanHelper { | |||
} | |||
} | |||
|
|||
test("abs Overflow ansi mode") { | |||
val data: Seq[(Int, Int)] = Seq((Int.MaxValue, Int.MinValue)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have tests for all numerical values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done!
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #500 +/- ##
============================================
+ Coverage 34.05% 34.08% +0.02%
+ Complexity 859 812 -47
============================================
Files 116 105 -11
Lines 38679 38516 -163
Branches 8567 8555 -12
============================================
- Hits 13173 13127 -46
+ Misses 22745 22644 -101
+ Partials 2761 2745 -16 ☔ View full report in Codecov by Sentry. |
match self.inner_abs_func.invoke(args) { | ||
Ok(result) => Ok(result), | ||
Err(DataFusionError::ArrowError(ArrowError::ComputeError(msg), trace)) | ||
if msg.contains("overflow") => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice if Arrow/DataFusion threw a specific overflow error so that we didn't have to look for a string within the error message, but I guess that isn't available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to file a feature request in DataFusion. I will post the link here later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I understand that we have to wait for the next version of arror-rs to be released and integrate it here to be able to make the changes, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can continue with this and do a follow up once the arrow-rs change is available. Or you can wait; the arrow-rs community is very responsive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably better to do it in a follow-up PR, I think. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help to log an issue to keep track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently my PR in Arrow will not be available for 3 months because it is an API change 😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we upgrade to the version of the arrow with the improved overflow eror reporing then the tests in this PR will fail (because we are looking for ComputeError
but instead will get ArithmeticOverflow
) so I don't think we need to file an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthchandra fyi ☝️
|
||
fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> { | ||
match self.inner_abs_func.invoke(args) { | ||
Ok(result) => Ok(result), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically there is no need to match on an Ok
result because there is already the catch all other
handling.
Ok(result) => Ok(result), |
fn arithmetic_overflow_error(from_type: &str) -> CometError { | ||
CometError::ArithmeticOverflow { | ||
from_type: from_type.to_string(), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to duplicate the same function from negative.rs
. Perhaps that one could be moved so that it can be reused here.
let eval_mode = match spark_expression::EvalMode::try_from(expr.eval_mode)? { | ||
spark_expression::EvalMode::Legacy => EvalMode::Legacy, | ||
spark_expression::EvalMode::Ansi => EvalMode::Ansi, | ||
spark_expression::EvalMode::Try => { | ||
return Err(ExecutionError::GeneralError( | ||
"Invalid EvalMode: \"TRY\"".to_string(), | ||
)) | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that we have this code block duplicated for abs
and cast
now. Perhaps this could be extracted into a function. Another option would be to implement TryFrom
or TryInto
for this conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @planga82. This is looking good. I left some feedback on reducing code duplication that would be good to fix before merging.
I have done a refactor with all the comments, thanks for the revision! |
The problems in the tests seem to be because the branch was not aligned with main. By aligning it, the tests have passed in my repo. |
spark_expression::EvalMode::Try => EvalMode::Try, | ||
spark_expression::EvalMode::Ansi => EvalMode::Ansi, | ||
}; | ||
let eval_mode = EvalMode::try_from(expr.eval_mode)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more idiomatic to use try_into
here.
let eval_mode = EvalMode::try_from(expr.eval_mode)?; | |
let eval_mode = expr.eval_mode.try_into()?; |
@@ -499,7 +497,12 @@ impl PhysicalPlanner { | |||
let child = self.create_expr(expr.child.as_ref().unwrap(), input_schema.clone())?; | |||
let return_type = child.data_type(&input_schema)?; | |||
let args = vec![child]; | |||
let scalar_def = ScalarFunctionDefinition::UDF(math::abs()); | |||
let eval_mode = EvalMode::try_from(expr.eval_mode)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more idiomatic to use try_into
here.
let eval_mode = EvalMode::try_from(expr.eval_mode)?; | |
let eval_mode = expr.eval_mode.try_into()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again @planga82. I left one more nit, but will be happy to merge after that unless @parthchandra has more feedback
Thanks!! I am a rust beginner, I appreciate any comments! |
* change proto msg * QueryPlanSerde with eval mode * Move eval mode * Add abs in planner * CometAbsFunc wrapper * Add error management * Add tests * Add license * spotless apply * format * Fix clippy * error msg for all spark versions * Fix benches * Use enum to ansi mode * Fix format * Add more tests * Format * Refactor * refactor * fix merge * fix merge
Which issue does this PR close?
Closes #464 .
Rationale for this change
This PR adds support for ansi mode in the abs function. This is done by adding a wrapper to the abs datafusion function to add the different behavior between Spark and Datafusion. The main differences are the overflow behavior in legacy mode and the message exception in ansi mode.
What changes are included in this PR?
In addition to introducing the wrapper, some minor refactoring is done to move some code to a more general location.
In Spark, the abs function does not support try execution mode, only ansi or legacy.
How are these changes tested?
The new tests test the correct execution in the event of an overflow.