Convert some panics that happen on invalid parquet files to error results #6738
Conversation
I'm not sure about this; it seems to add a number of untested additional checks, some to very hot code paths.
I suggest, rather than just looking for things that might panic, going from a failing test to a fix. This would also better capture the more problematic cases where the reader gets stuck on malformed input; a panic is a good outcome IMO...
parquet/src/file/metadata/mod.rs
Outdated
```diff
@@ -959,17 +959,18 @@ impl ColumnChunkMetaData {
     }

     /// Returns the offset and length in bytes of the column chunk within the file
-    pub fn byte_range(&self) -> (u64, u64) {
+    pub fn byte_range(&self) -> Result<(u64, u64)> {
```
This is a breaking API change
Yeah... is it fine given that it just wraps the return value in a Result? The behavior change is just "panic → error".
No this is a breaking change
Removed this change. BTW, I saw that the planned 54.0.0 release will have "major API changes"; how is an API change considered an acceptable breaking change?
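For context on the API-compatibility concern, one common non-breaking pattern is to leave the existing method's signature untouched and add a fallible sibling alongside it. A toy sketch; the struct, fields, and `try_byte_range` name here are simplified stand-ins, not the crate's actual `ColumnChunkMetaData`:

```rust
// Toy stand-in for the real metadata type; offsets come from untrusted
// Thrift metadata, so they can be negative on malformed files.
struct ColumnChunkMetaData {
    offset: i64,
    length: i64,
}

impl ColumnChunkMetaData {
    /// Existing infallible API, signature unchanged: still panics on bad metadata.
    fn byte_range(&self) -> (u64, u64) {
        self.try_byte_range().expect("invalid column chunk metadata")
    }

    /// Hypothetical additive variant: callers opting in get an error instead.
    fn try_byte_range(&self) -> Result<(u64, u64), String> {
        let offset = u64::try_from(self.offset).map_err(|_| "negative offset".to_string())?;
        let length = u64::try_from(self.length).map_err(|_| "negative length".to_string())?;
        Ok((offset, length))
    }
}

fn main() {
    let good = ColumnChunkMetaData { offset: 4, length: 100 };
    assert_eq!(good.try_byte_range(), Ok((4, 100)));

    let bad = ColumnChunkMetaData { offset: -1, length: 100 };
    assert!(bad.try_byte_range().is_err());
}
```

The old method can then be deprecated over several releases instead of breaking downstream callers in one version.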
parquet/src/schema/types.rs
Outdated
```diff
@@ -1227,6 +1231,10 @@ fn from_thrift_helper(elements: &[SchemaElement], index: usize) -> Result<(usize
         if !is_root_node {
             builder = builder.with_repetition(rep);
         }
+    } else if !is_root_node {
```
Do we need this check?
This is based on the comment at line 1230, which says "All other types must have one", and the assert at line 1066: `assert!(tp.get_basic_info().has_repetition())`.
My concern with this check is that unless it is necessary for correctness, there is a chance that adding it breaks something for someone. Parquet is a very broad ecosystem; lots of writers have interesting interpretations of the specification.
> it breaks something for someone

If that happened, then someone must have already experienced panics. But I get your point; removed this change. Thx
parquet/src/thrift.rs
Outdated
```diff
@@ -67,7 +67,17 @@ impl<'a> TCompactSliceInputProtocol<'a> {
         let mut shift = 0;
         loop {
             let byte = self.read_byte()?;
-            in_progress |= ((byte & 0x7F) as u64) << shift;
+            let val = (byte & 0x7F) as u64;
+            let val = val.checked_shl(shift).map_or_else(
```
This is a very performance critical code path, this probably should use wrapping_shl
I'm afraid wrapping might cause correctness issues... Besides, would `checked_shl` really make a noticeable performance difference here?
It shouldn't cause correctness issues, and yes it will matter. There are benchmarks that will likely show this
okay, changed to wrapping_shl, thx!
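For reference, the two shift variants under discussion differ only in how they treat shift amounts of 64 or more on a `u64`, and neither panics. A small sketch of their documented behavior:

```rust
fn main() {
    let val: u64 = 0x7F;

    // wrapping_shl masks the shift amount modulo 64, so a shift of 70
    // behaves like a shift of 6:
    assert_eq!(val.wrapping_shl(70), val << 6);

    // checked_shl instead returns None once the shift amount reaches 64:
    assert_eq!(val.checked_shl(70), None);
    assert_eq!(val.checked_shl(6), Some(val << 6));
}
```

In the varint loop, `shift` only exceeds 63 on malformed input (more than ten continuation bytes), so for well-formed files both variants produce identical results.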
parquet/src/thrift.rs
Outdated
```diff
 impl TInputProtocol for TCompactSliceInputProtocol<'_> {
     fn read_message_begin(&mut self) -> thrift::Result<TMessageIdentifier> {
-        unimplemented!()
+        thrift_unimplemented!()
```
This should be unreachable, a panic is the correct thing to do here
IIUC it "should be unreachable" unless the input file is malformed? I guess this goes back to the discussion on how to handle invalid inputs.
No it is actually genuinely unreachable, we don't use thrift messages
Changed back to `unimplemented!()`.
Thanks @tustvold for the review!
I think the changes are more about converting panics to errors, rather than actual code logic.
These panics were triggered in my own fuzzing test with invalid parquet files. Nevertheless, I think it's a similar topic of "how to handle invalid inputs" as discussed in #5323. Reading this doc, IMHO errors are better than panics unless something is really unrecoverable.
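The property the fuzzing test checks can be sketched in miniature. The parser below is a hypothetical stand-in, not the crate's metadata reader; the point is the invariant that arbitrary bytes may produce `Err` but must never panic:

```rust
use std::panic;

// Hypothetical stand-in for a metadata parser: it validates lengths up
// front and returns Err on truncated input instead of indexing out of bounds.
fn parse_header(data: &[u8]) -> Result<u32, String> {
    let bytes: [u8; 4] = data
        .get(..4)
        .and_then(|s| s.try_into().ok())
        .ok_or_else(|| "insufficient bytes for header".to_string())?;
    Ok(u32::from_le_bytes(bytes))
}

fn main() {
    // The fuzzing property: no input, however short or garbled, panics.
    for len in 0..8 {
        let data = vec![0xFFu8; len];
        let outcome = panic::catch_unwind(|| parse_header(&data));
        assert!(outcome.is_ok(), "parser panicked on input of length {len}");
    }
    assert_eq!(parse_header(&[1, 0, 0, 0]), Ok(1));
}
```

Real fuzzers (e.g. `cargo-fuzz`) generate the inputs instead of the loop above, but the pass/fail criterion is the same.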
Hi @tustvold, I removed some changes based on your comment. PTAL, thanks!
Thanks! Just a few nits. However, I wonder if perhaps this should be multiple PRs with rationales given for each change.
```rust
let column_orders =
    Self::parse_column_orders(t_file_metadata.column_orders, &schema_descr)?;
```
I realize this would currently panic, but would one ever prefer to just set `column_orders` to `None` and continue? The only impact AFAIK would be statistics being unusable, which would only matter if predicates were in use.
Good point! I agree with the set-to-None idea, but I guess this is worth a separate issue to discuss and fix.
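The "set to None and continue" idea amounts to demoting a parse failure to an absent value. A toy sketch; the parse function below is hypothetical and takes plain integers, unlike the crate's Thrift-based `parse_column_orders`:

```rust
// Hypothetical parser: rejects any negative "column order" code.
fn parse_column_orders(raw: &[i8]) -> Result<Vec<i8>, String> {
    if raw.iter().any(|&o| o < 0) {
        return Err("invalid column order".to_string());
    }
    Ok(raw.to_vec())
}

fn main() {
    // .ok() turns a parse failure into None; reading continues, and only
    // the statistics that depend on column orders become unusable.
    let column_orders: Option<Vec<i8>> = parse_column_orders(&[0, -1]).ok();
    assert!(column_orders.is_none());

    let column_orders = parse_column_orders(&[0, 1]).ok();
    assert_eq!(column_orders, Some(vec![0, 1]));
}
```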
parquet/src/file/statistics.rs
Outdated
```rust
if let Some(min) = min {
    if min.len() < len {
        return Err(ParquetError::General(
            "Insufficient bytes to parse max statistic".to_string(),
```
"Insufficient bytes to parse max statistic".to_string(), | |
"Insufficient bytes to parse min statistic".to_string(), |
thanks for catching this.
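The check plus the message fix can be sketched as one standalone helper. Names and the `String` error type are illustrative, not the crate's exact code, which uses `ParquetError`:

```rust
// Illustrative helper: validate that an optional statistics buffer has at
// least `len` bytes before decoding a fixed-width value from it.
fn check_stat_len(stat: Option<&[u8]>, len: usize, which: &str) -> Result<(), String> {
    if let Some(bytes) = stat {
        if bytes.len() < len {
            // The message names the bound being checked, min or max.
            return Err(format!("Insufficient bytes to parse {which} statistic"));
        }
    }
    Ok(())
}

fn main() {
    // A 4-byte INT32 statistic with only 2 bytes present is rejected,
    // and the error now names the right bound.
    let err = check_stat_len(Some(&[0u8, 1]), 4, "min").unwrap_err();
    assert_eq!(err, "Insufficient bytes to parse min statistic");

    assert!(check_stat_len(Some(&[0u8; 4]), 4, "max").is_ok());
    // Absent statistics are fine; there is nothing to validate.
    assert!(check_stat_len(None, 4, "min").is_ok());
}
```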
Thanks for the review! IMHO these changes share the same rationale: they just convert panics to errors.
This all looks correct to me. I personally would prefer errors to panics when processing multiple files.
Hi @tustvold, could you please take another look at this one? Thanks!
Thank you very much @jp0317 and @etseidl
This looks reasonable to me -- thank you 🙏
I also made a PR to test this out with the DataFusion tests as well: apache/datafusion#13820
Just as a way to double check this doesn't cause any unintended issues
I ran the parquet benchmarks to verify there are no performance implications with this PR. My initial results show no noticeable difference between this PR and main. I will run it again to be sure.
My second performance run likewise shows no particular performance changes (there is a lot of noise, with some benchmarks faster and some slower).
Thank you @jp0317 and @etseidl
I reviewed this PR carefully, and I believe that it:
- Is better than the current code (detects more errors / avoids panics on bad input)
- Is unlikely to impact performance (changes are either in the metadata parsing code or called once per page)
However, I would like @tustvold to give it a final look (or say he won't have time to do so) before we merge it
```diff
@@ -67,7 +67,7 @@ impl<'a> TCompactSliceInputProtocol<'a> {
         let mut shift = 0;
         loop {
             let byte = self.read_byte()?;
-            in_progress |= ((byte & 0x7F) as u64) << shift;
+            in_progress |= ((byte & 0x7F) as u64).wrapping_shl(shift);
```
What is the purpose of this change?
As I understand it, `<<` panics in debug builds on overflow but wraps in release builds.
It seems to me that this change now avoids panicking in debug builds too, which isn't obviously better to me.
Thanks for bringing this up! I'm not sure if `<<` is guaranteed to always have the same behavior as `wrapping_shl` in release builds. On the other hand, IMHO this is not really a logic bug, and the only benefit of keeping `<<` is that debug mode can be used to identify invalid inputs (via panics), but I'm not sure anyone will rely on that in practice.
Note the C++ code for this loop actually validates the [total bytes decoded](https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TCompactProtocol.tcc#L759), which is probably a good idea (I think this prevents the panic that the original checked shift was meant to catch?).
Thanks for the pointer. IIUC the C++ one may still overflow on the 10th byte, where it shifts by 63 bits?
I might be misunderstanding, but I thought shifting by 63 bits is well defined on u64? So overflow yes, but I wouldn't expect a panic? Also, I think silently passing here on an overflow would produce corrupt data?
Maybe we can consider this change independently if we are concerned about perf impacts
Overflow, yes. I agree we can add a check on the number of bytes decoded separately; for this PR let's just solve the panic?
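The C++-style guard discussed above can be sketched as a standalone decoder. This is an illustration of the technique, not the crate's `TCompactSliceInputProtocol` code: a `u64` varint needs at most 10 bytes, so bounding the byte count means `shift` never reaches 64 and a plain `<<` cannot overflow:

```rust
// ULEB128-style varint decode with the byte-count guard: returns the value
// and the number of bytes consumed, or an error on malformed input.
fn read_varint_u64(data: &[u8]) -> Result<(u64, usize), String> {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in data.iter().enumerate() {
        if i >= 10 {
            // An 11th byte would shift by 70; reject before that happens.
            return Err("varint longer than 10 bytes".to_string());
        }
        value |= ((byte & 0x7F) as u64) << shift; // shift is at most 63 here
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
        shift += 7;
    }
    Err("unexpected end of input".to_string())
}

fn main() {
    assert_eq!(read_varint_u64(&[0x7F]), Ok((127, 1)));
    assert_eq!(read_varint_u64(&[0x80, 0x01]), Ok((128, 2)));
    // Eleven continuation bytes: rejected before any shift can overflow.
    assert!(read_varint_u64(&[0x80; 11]).is_err());
}
```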
parquet/src/schema/types.rs
Outdated
```diff
@@ -556,7 +556,8 @@ impl<'a> PrimitiveTypeBuilder<'a> {
             }
         }
         PhysicalType::FIXED_LEN_BYTE_ARRAY => {
-            let max_precision = (2f64.powi(8 * self.length - 1) - 1f64).log10().floor() as i32;
+            let length = self.length.checked_mul(8).unwrap_or(i32::MAX);
```
should the overflow error instead of falling through to max precision?
I agree it would be better to return an error: I double-checked that using i32::MAX results in `Max precision: 2147483647`.
sg, changed to returning error, thanks!
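The error-returning version can be sketched in isolation. The function name and `String` error type are simplifications of the crate's code:

```rust
// Sketch of the checked computation: the max decimal precision that fits in
// a FIXED_LEN_BYTE_ARRAY of `length` bytes, with overflow in `8 * length`
// reported as an error instead of saturating to i32::MAX.
fn max_precision(length: i32) -> Result<i32, String> {
    let bits = length
        .checked_mul(8)
        .and_then(|b| b.checked_sub(1))
        .ok_or_else(|| format!("Invalid FIXED_LEN_BYTE_ARRAY length: {length}"))?;
    // Largest value representable in `bits` bits is 2^bits - 1.
    Ok((2f64.powi(bits) - 1f64).log10().floor() as i32)
}

fn main() {
    // 16-byte arrays hold decimal128 values: max precision 38.
    assert_eq!(max_precision(16), Ok(38));
    // 4 bytes matches INT32's max decimal precision of 9.
    assert_eq!(max_precision(4), Ok(9));
    // A length near i32::MAX would overflow `8 * length`; now an error.
    assert!(max_precision(i32::MAX).is_err());
}
```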
Thank you for all the work here @jp0317. I think we have bikeshed this PR enough and plan to merge it tomorrow unless there are any objections. FYI @tustvold, I also dismissed your "request changes" review, as from what I can see the changes you requested have been made. Please let me know if you disagree.
Which issue does this PR close?
This solves some of #6737.
Rationale for this change
Some code changes to replace some panics with proper errors
What changes are included in this PR?
Some code paths that led to panics are converted to return error results.
Are there any user-facing changes?
Behavior change from panics to errors.