Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coerce_primitive not honored when decoding from serde object #5095

Closed
jonmmease opened this issue Nov 18, 2023 · 2 comments · Fixed by #5101
Closed

coerce_primitive not honored when decoding from serde object #5095

jonmmease opened this issue Nov 18, 2023 · 2 comments · Fixed by #5101
Labels
arrow Changes to the arrow crate bug help wanted

Comments

@jonmmease
Copy link
Contributor

Describe the bug
It looks like the coerce_primitive configuration option on ReaderBuilder is not honored when decoding from serde-compatible objects like serde_json::Value.

To Reproduce
Code to add a test to arrow-json/src/reader/mod.rs that triggers the error

    #[test]
    fn test_coercing_primitive_into_string_decoder() {
        let buf = r#"[
            {"a": 1, "b": "A", "c": "T"},
            {"a": 2, "b": "BB", "c": "F"},
            {"a": 3, "b": 123, "c": false}
        ]"#;
        let schema = Schema::new(vec![
            Field::new("a", DataType::Float64, true),
            Field::new("b", DataType::Utf8, true),
            Field::new("c", DataType::Utf8, true),
        ]);
        let json_array: Vec<serde_json::Value> = serde_json::from_str(buf).unwrap();

        let schema_ref = Arc::new(schema);

        // read record batches
        let reader =
            ReaderBuilder::new(schema_ref.clone()).with_coerce_primitive(true);
        let mut decoder = reader.build_decoder().unwrap();
        decoder.serialize(json_array.as_slice()).unwrap();
        let batch = decoder.flush().unwrap().unwrap();
        println!("{:?}", batch);
    }
called `Result::unwrap()` on an `Err` value: JsonError("whilst decoding field 'b': expected string got 123")
thread 'reader::tests::test_coercing_primitive_into_string_decoder' panicked at 'called `Result::unwrap()` on an `Err` value: JsonError("whilst decoding field 'b': expected string got 123")', arrow-json/src/reader/mod.rs:2284:37
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1651:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/result.rs:1076:23
   4: arrow_json::reader::tests::test_coercing_primitive_into_string_decoder
             at ./src/reader/mod.rs:2284:21
   5: arrow_json::reader::tests::test_coercing_primitive_into_string_decoder::{{closure}}
             at ./src/reader/mod.rs:2264:54
   6: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Expected behavior
I expect the value 123 in column b to be converted to a string.

Additional context
This worked in version 47. I hit the error when updating to DataFusion 33.0.0 with Arrow 48.0.1, but have also confirmed it on main.

My hunch is that this was introduced with #4861, and that some additional logic is needed around here

fn decode(&mut self, tape: &Tape<'_>, pos: &[u32]) -> Result<ArrayData, ArrowError> {
let coerce_primitive = self.coerce_primitive;
let mut data_capacity = 0;
for p in pos {
match tape.get(*p) {
TapeElement::String(idx) => {
data_capacity += tape.get_string(idx).len();
}
TapeElement::Null => {}
TapeElement::True if coerce_primitive => {
data_capacity += TRUE.len();
}
TapeElement::False if coerce_primitive => {
data_capacity += FALSE.len();
}
TapeElement::Number(idx) if coerce_primitive => {
data_capacity += tape.get_string(idx).len();
}
_ => return Err(tape.error(*p, "string")),
}
}

to handle coercing the new TapeElement enumerations (I64, I32, F64, and F32).

@jonmmease
Copy link
Contributor Author

Thanks @fansehep and @tustvold for the quick fix!

@tustvold tustvold added the arrow Changes to the arrow crate label Jan 5, 2024
@tustvold
Copy link
Contributor

tustvold commented Jan 5, 2024

label_issue.py automatically added labels {'arrow'} from #5101

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants