fix: add missing precision overflow checking for cast_string_to_decimal #4830

Merged
merged 2 commits into from
Sep 25, 2023
75 changes: 68 additions & 7 deletions arrow-cast/src/cast.rs
@@ -2801,6 +2801,11 @@ where
if cast_options.safe {
let iter = from.iter().map(|v| {
v.and_then(|v| parse_string_to_decimal_native::<T>(v, scale as usize).ok())
.and_then(|v| {
T::validate_decimal_precision(v, precision)
.is_ok()
.then_some(v)
})
Comment on lines +2804 to +2808

@viirya (Member), Sep 18, 2023

I believe the original idea was to avoid this validation and leave the decision to the caller. But after several rounds of reworking the decimal code, I'm not sure whether that design still holds or whether we want to add validation now.

Member Author

@viirya I noticed that casting from integers already performs this validation, so casting from strings behaves inconsistently with it.

Here is an illustrative example in DataFusion:

DataFusion CLI v31.0.0
❯ select cast(1000 as decimal(10,8));
Optimizer rule 'simplify_expressions' failed
caused by
Arrow error: Invalid argument error: 100000000000 is too large to store in a Decimal128 of precision 10. Max is 9999999999

❯ select cast('1000' as decimal(10,8));
+--------------+
| Utf8("1000") |
+--------------+
| 10.00000000  |
+--------------+

It would be preferable for the two casts to behave consistently.
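
To make the arithmetic behind that error message concrete, here is a minimal, stdlib-only sketch. The helpers `max_for_precision` and `to_unscaled_checked` are hypothetical names made up for illustration and are not arrow-rs APIs; the sketch only assumes that a decimal of precision `p` can hold unscaled values up to `10^p - 1`, which matches the "Max is 9999999999" in the message above.

```rust
/// Hypothetical helper (not part of arrow-rs): largest unscaled value that
/// fits in a decimal of the given precision, i.e. 10^precision - 1.
fn max_for_precision(precision: u32) -> i128 {
    10_i128.pow(precision) - 1
}

/// Hypothetical helper: scale a whole number to its unscaled representation
/// and reject it if it needs more digits than `precision` allows.
fn to_unscaled_checked(whole: i128, precision: u32, scale: u32) -> Result<i128, String> {
    let unscaled = whole
        .checked_mul(10_i128.pow(scale))
        .ok_or_else(|| format!("{whole} overflows i128 at scale {scale}"))?;
    let max = max_for_precision(precision);
    if unscaled > max || unscaled < -max {
        return Err(format!(
            "{unscaled} is too large to store in a decimal of precision {precision}. Max is {max}"
        ));
    }
    Ok(unscaled)
}

fn main() {
    // 1000 at scale 8 becomes 100000000000 (12 digits); precision 10 allows only 10 digits.
    println!("{:?}", to_unscaled_checked(1000, 10, 8));
    // 10 at scale 8 becomes 1000000000 (10 digits) and fits.
    println!("{:?}", to_unscaled_checked(10, 10, 8));
}
```

Under these assumptions, the integer cast and the string cast of `1000` to `decimal(10,8)` produce the same unscaled value, so both should be rejected by the same precision check.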

Contributor

I don't feel strongly here. Casting in general does tend to check for overflow, and given that this path is parsing a string, the extra check is unlikely to massively regress performance. I defer to you, @viirya.
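
The two code paths touched by this diff differ in how a failed precision check surfaces: with `safe: true` the offending value becomes null, and with `safe: false` the whole cast returns an error. The following is a rough, stdlib-only sketch of that pattern. Every function here (`parse_scaled`, `validate_precision`, `cast_safe`, `cast_strict`) is a hypothetical stand-in rather than the arrow-rs implementation, and the parser is simplified to whole-number strings.

```rust
// Hypothetical stand-ins for illustration only; these are not arrow-rs APIs.

/// Parse a whole-number string and scale it to an unscaled decimal value.
/// Simplification: fractional digits are not handled.
fn parse_scaled(s: &str, scale: u32) -> Result<i128, String> {
    let whole: i128 = s
        .trim()
        .parse()
        .map_err(|_| format!("Cannot cast string '{s}' to a decimal value"))?;
    whole
        .checked_mul(10_i128.pow(scale))
        .ok_or_else(|| format!("'{s}' overflows at scale {scale}"))
}

/// Reject values that need more than `precision` digits.
fn validate_precision(v: i128, precision: u32) -> Result<(), String> {
    let max = 10_i128.pow(precision) - 1;
    if v > max || v < -max {
        Err(format!(
            "{v} is too large to store in a decimal of precision {precision}. Max is {max}"
        ))
    } else {
        Ok(())
    }
}

/// "Safe" mode: any parse or precision failure becomes a null (`None`).
fn cast_safe(input: &[Option<&str>], precision: u32, scale: u32) -> Vec<Option<i128>> {
    input
        .iter()
        .map(|&v| {
            v.and_then(|s| parse_scaled(s, scale).ok())
                .and_then(|d| validate_precision(d, precision).is_ok().then_some(d))
        })
        .collect()
}

/// Strict mode: the first failure aborts the whole cast with an error.
fn cast_strict(
    input: &[Option<&str>],
    precision: u32,
    scale: u32,
) -> Result<Vec<Option<i128>>, String> {
    input
        .iter()
        .map(|&v| {
            v.map(|s| {
                parse_scaled(s, scale)
                    .and_then(|d| validate_precision(d, precision).map(|_| d))
            })
            .transpose()
        })
        .collect()
}

fn main() {
    let input = [Some("10"), Some("1000"), None];
    println!("{:?}", cast_safe(&input, 10, 8));
    println!("{:?}", cast_strict(&input, 10, 8));
}
```

In this sketch the precision check is one comparison per value on top of string parsing, which is in line with the expectation that the added validation is cheap relative to the parse itself.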

});
// Benefit:
// 20% performance improvement
@@ -2815,13 +2820,17 @@ where
.iter()
.map(|v| {
v.map(|v| {
- parse_string_to_decimal_native::<T>(v, scale as usize).map_err(|_| {
- ArrowError::CastError(format!(
- "Cannot cast string '{}' to value of {:?} type",
- v,
- T::DATA_TYPE,
- ))
- })
+ parse_string_to_decimal_native::<T>(v, scale as usize)
+ .map_err(|_| {
+ ArrowError::CastError(format!(
+ "Cannot cast string '{}' to value of {:?} type",
+ v,
+ T::DATA_TYPE,
+ ))
+ })
+ .and_then(|v| {
+ T::validate_decimal_precision(v, precision).map(|_| v)
+ })
})
.transpose()
})
@@ -8152,6 +8161,32 @@ mod tests {
);
}

#[test]
fn test_cast_string_to_decimal128_precision_overflow() {
let array = StringArray::from(vec!["1000".to_string()]);
let array = Arc::new(array) as ArrayRef;
let casted_array = cast_with_options(
&array,
&DataType::Decimal128(10, 8),
&CastOptions {
safe: true,
format_options: FormatOptions::default(),
},
);
assert!(casted_array.is_ok());
assert!(casted_array.unwrap().is_null(0));

let err = cast_with_options(
&array,
&DataType::Decimal128(10, 8),
&CastOptions {
safe: false,
format_options: FormatOptions::default(),
},
);
assert_eq!("Invalid argument error: 100000000000 is too large to store in a Decimal128 of precision 10. Max is 9999999999", err.unwrap_err().to_string());
}

#[test]
fn test_cast_utf8_to_decimal128_overflow() {
let overflow_str_array = StringArray::from(vec![
@@ -8209,6 +8244,32 @@ mod tests {
assert!(decimal_arr.is_null(6));
}

#[test]
fn test_cast_string_to_decimal256_precision_overflow() {
let array = StringArray::from(vec!["1000".to_string()]);
let array = Arc::new(array) as ArrayRef;
let casted_array = cast_with_options(
&array,
&DataType::Decimal256(10, 8),
&CastOptions {
safe: true,
format_options: FormatOptions::default(),
},
);
assert!(casted_array.is_ok());
assert!(casted_array.unwrap().is_null(0));

let err = cast_with_options(
&array,
&DataType::Decimal256(10, 8),
&CastOptions {
safe: false,
format_options: FormatOptions::default(),
},
);
assert_eq!("Invalid argument error: 100000000000 is too large to store in a Decimal256 of precision 10. Max is 9999999999", err.unwrap_err().to_string());
}

#[test]
fn test_cast_utf8_to_decimal256_overflow() {
let overflow_str_array = StringArray::from(vec![