-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: add missing precision overflow checking for cast_string_to_decimal
#4830
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2801,6 +2801,11 @@ where | |
if cast_options.safe { | ||
let iter = from.iter().map(|v| { | ||
v.and_then(|v| parse_string_to_decimal_native::<T>(v, scale as usize).ok()) | ||
.and_then(|v| { | ||
T::validate_decimal_precision(v, precision) | ||
.is_ok() | ||
.then_some(v) | ||
}) | ||
}); | ||
// Benefit: | ||
// 20% performance improvement | ||
|
@@ -2815,13 +2820,17 @@ where | |
.iter() | ||
.map(|v| { | ||
v.map(|v| { | ||
parse_string_to_decimal_native::<T>(v, scale as usize).map_err(|_| { | ||
ArrowError::CastError(format!( | ||
"Cannot cast string '{}' to value of {:?} type", | ||
v, | ||
T::DATA_TYPE, | ||
)) | ||
}) | ||
parse_string_to_decimal_native::<T>(v, scale as usize) | ||
.map_err(|_| { | ||
ArrowError::CastError(format!( | ||
"Cannot cast string '{}' to value of {:?} type", | ||
v, | ||
T::DATA_TYPE, | ||
)) | ||
}) | ||
.and_then(|v| { | ||
T::validate_decimal_precision(v, precision).map(|_| v) | ||
}) | ||
}) | ||
.transpose() | ||
}) | ||
|
@@ -8097,6 +8106,32 @@ mod tests { | |
test_cast_string_to_decimal(array); | ||
} | ||
|
||
#[test] | ||
fn test_cast_string_to_decimal128_precision_overflow() { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cast_string_to_decimal is not only for decimal128 but also for decimal256, do you want to add a test for decimal256 too? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @viirya I think it is necessary, and a new test has been added for decimal256. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you. |
||
let array = StringArray::from(vec!["1000".to_string()]); | ||
let array = Arc::new(array) as ArrayRef; | ||
let casted_array = cast_with_options( | ||
&array, | ||
&DataType::Decimal128(10, 8), | ||
&CastOptions { | ||
safe: true, | ||
format_options: FormatOptions::default(), | ||
}, | ||
); | ||
assert!(casted_array.is_ok()); | ||
assert!(casted_array.unwrap().is_null(0)); | ||
|
||
let err = cast_with_options( | ||
&array, | ||
&DataType::Decimal128(10, 8), | ||
&CastOptions { | ||
safe: false, | ||
format_options: FormatOptions::default(), | ||
}, | ||
); | ||
assert_eq!("Invalid argument error: 100000000000 is too large to store in a Decimal128 of precision 10. Max is 9999999999", err.unwrap_err().to_string()); | ||
} | ||
|
||
#[test] | ||
fn test_cast_invalid_utf8_to_decimal() { | ||
let str_array = StringArray::from(vec!["4.4.5", ". 0.123"]); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the original idea is to avoid this validation and leave the decision to the caller. But after several rounds of revamp on decimal, I'm not sure if that design idea is still kept and we want to add validation now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@viirya I noticed that casting from integers has this validation, which is inconsistent with the behavior of casting from strings.
There is an intuitive example in Datafusion.
If they could have a unified behavior, it would be preferable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly here, casting in general does tend to check for overflow, and given it is parsing a string is unlikely to massively regress performance. I defer to you @viirya