Convert some panics that happen on invalid parquet files to error results #6738

Open
wants to merge 8 commits into
base: main
from

Conversation

jp0317
Contributor

@jp0317 jp0317 commented Nov 16, 2024

Which issue does this PR close?

This solves some of #6737.

Rationale for this change

This PR replaces some panics with proper errors.

What changes are included in this PR?

Some code paths that panicked on invalid input now return error results instead.

Are there any user-facing changes?

Behavior change from panics to errors.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Nov 16, 2024
Contributor

@tustvold tustvold left a comment

I'm not sure about this; it seems to add a number of untested additional checks, some of them on very hot codepaths.

Rather than just looking for things that might panic, I suggest going from a failing test to a fix. This would also better capture the more problematic cases where the reader gets stuck on malformed input; a panic is a good outcome IMO...

@@ -959,17 +959,18 @@ impl ColumnChunkMetaData {
}

/// Returns the offset and length in bytes of the column chunk within the file
- pub fn byte_range(&self) -> (u64, u64) {
+ pub fn byte_range(&self) -> Result<(u64, u64)> {
Contributor

This is a breaking API change

Contributor Author

Yeah... is it fine, given that it just wraps the return value in a Result? The behavior change is just "panic --> error".

Contributor

No, this is a breaking change.

Contributor Author

Removed this change. Btw, I saw that the planned 54.0.0 will have a "major api change"; how is an API change judged to be an acceptable breaking change?
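To make the "breaking change" point concrete, here is a minimal sketch of why wrapping a return value in `Result` breaks downstream callers at compile time, and of one additive alternative. `ChunkMeta`, `try_byte_range`, and `end_offset` are hypothetical stand-ins for illustration, not the parquet crate's actual types or the approach this PR settled on.

```rust
/// Hypothetical stand-in for column chunk metadata (not the parquet crate's type).
pub struct ChunkMeta {
    start: i64,
    len: i64,
}

impl ChunkMeta {
    /// Existing-style infallible signature: panics if the range is invalid.
    pub fn byte_range(&self) -> (u64, u64) {
        self.try_byte_range().expect("invalid column chunk range")
    }

    /// Additive fallible variant (hypothetical name): callers of `byte_range`
    /// keep compiling, and new callers can opt into error handling.
    pub fn try_byte_range(&self) -> Result<(u64, u64), String> {
        if self.start < 0 || self.len < 0 {
            return Err(format!(
                "invalid range: start={}, len={}",
                self.start, self.len
            ));
        }
        Ok((self.start as u64, self.len as u64))
    }
}

/// A downstream caller written against the tuple-returning signature.
/// If `byte_range` itself returned `Result`, this tuple destructuring
/// would fail to compile, which is what makes the change breaking.
pub fn end_offset(meta: &ChunkMeta) -> u64 {
    let (start, len) = meta.byte_range();
    start + len
}
```

With this shape, `end_offset` is untouched, while code that reads untrusted files can call the fallible variant and propagate the error.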

parquet/src/format.rs (outdated, resolved)
@@ -1227,6 +1231,10 @@ fn from_thrift_helper(elements: &[SchemaElement], index: usize) -> Result<(usize
if !is_root_node {
builder = builder.with_repetition(rep);
}
} else if !is_root_node {
Contributor

Do we need this check?

Contributor Author

This is based on the comment at line 1230, which says "All other types must have one", and the assert at line 1066: assert!(tp.get_basic_info().has_repetition())

Contributor

My concern with this check is that, unless it is necessary for correctness, adding it could break something for someone. Parquet is a very broad ecosystem, and lots of writers have interesting interpretations of the specification.

Contributor Author

> it breaks something for someone

If that happened, then someone must have already experienced panics. But I get your point; removed this change. Thx.

@@ -67,7 +67,17 @@ impl<'a> TCompactSliceInputProtocol<'a> {
let mut shift = 0;
loop {
let byte = self.read_byte()?;
- in_progress |= ((byte & 0x7F) as u64) << shift;
+ let val = (byte & 0x7F) as u64;
+ let val = val.checked_shl(shift).map_or_else(
Contributor

This is a very performance-critical code path; it should probably use wrapping_shl.

Contributor Author

I'm afraid wrapping might cause correctness issues... Besides, would checked_shl really make a noticeable performance difference here?

Contributor

It shouldn't cause correctness issues, and yes it will matter. There are benchmarks that will likely show this

Contributor Author

okay, changed to wrapping_shl, thx!
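For reference, the behavioral difference between the two shift variants discussed above can be sketched with a standalone varint decoder. This is an illustrative sketch, not the crate's `TCompactSliceInputProtocol` code; the function names are hypothetical. In Rust, `checked_shl` returns `None` when the shift amount is at least the bit width (64 for `u64`), while `wrapping_shl` masks the shift amount to the bit width and never panics.

```rust
/// Decode an unsigned LEB128-style varint; returns (value, bytes consumed).
/// `wrapping_shl` masks the shift amount to the bit width, so it never panics,
/// but an over-long encoding yields a wrapped value instead of an error.
pub fn read_varint_wrapping(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        value |= u64::from(byte & 0x7F).wrapping_shl(shift);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift = shift.wrapping_add(7);
    }
    None // input ended before the terminating byte
}

/// Same decoder, but `checked_shl` rejects shift amounts of 64 or more,
/// so an over-long encoding is reported as `None`.
pub fn read_varint_checked(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        value |= u64::from(byte & 0x7F).checked_shl(shift)?;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
        shift = shift.wrapping_add(7);
    }
    None // input ended before the terminating byte
}
```

Both variants agree on well-formed input such as `[0xAC, 0x02]` (which decodes to 300). They differ only on encodings longer than ten bytes, where the checked variant returns `None` and the wrapping variant returns a value with the shift wrapped around. Neither variant panics.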

impl TInputProtocol for TCompactSliceInputProtocol<'_> {
fn read_message_begin(&mut self) -> thrift::Result<TMessageIdentifier> {
- unimplemented!()
+ thrift_unimplemented!()
Contributor

This should be unreachable; a panic is the correct thing to do here.

Contributor Author

iiuc it "should be unreachable" unless the input file is malformed? I guess this goes back to the discussion on how to handle invalid inputs.

Contributor

No, it is genuinely unreachable; we don't use thrift messages.

Contributor Author

changed back to unimplemented

@jp0317
Contributor Author

jp0317 commented Nov 19, 2024

Thanks @tustvold for the review!

> this seems to add a number of untested additional checks...

I think the changes are more about converting panics to errors than about changing actual code logic.

> looking for things that might panic, instead going from a failing test to a fix

These panics were triggered in my own fuzzing test with invalid parquet files. Nevertheless, I think it's a similar topic to "how to handle invalid inputs" as discussed in #5323. Reading this doc, imho errors are better than panics unless it's really something unrecoverable.
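The fuzzing harness mentioned here is not shown in the thread. As a rough sketch of how such a harness might tell "rejected with an error" apart from "panicked", the standard library's `std::panic::catch_unwind` can be used. `Outcome`, `classify`, and `toy_parse` below are hypothetical names, and `toy_parse` is a deliberately fragile toy, not a parquet reader. This only works when panics unwind (the default), not with `panic = "abort"`.

```rust
use std::panic::{self, AssertUnwindSafe};

/// What happened when a parser was fed one input.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Parsed,
    Rejected,
    Panicked,
}

/// Run `parse` on `input` and record whether it succeeded, returned an
/// error, or panicked. The panic message is still printed to stderr.
pub fn classify<F>(parse: F, input: &[u8]) -> Outcome
where
    F: Fn(&[u8]) -> Result<(), String>,
{
    match panic::catch_unwind(AssertUnwindSafe(|| parse(input))) {
        Ok(Ok(())) => Outcome::Parsed,
        Ok(Err(_)) => Outcome::Rejected,
        Err(_) => Outcome::Panicked,
    }
}

/// Toy parser: expects a 4-byte magic. Slicing without a length check
/// panics on inputs shorter than 4 bytes, which is the kind of bug a
/// fuzzer would surface.
pub fn toy_parse(input: &[u8]) -> Result<(), String> {
    if &input[..4] == b"PAR1" {
        Ok(())
    } else {
        Err("bad magic".to_string())
    }
}
```

Each input that lands in `Outcome::Panicked` can then be saved as a failing test case, which matches the "failing test to a fix" workflow suggested in the review.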

@jp0317
Contributor Author

jp0317 commented Nov 20, 2024

Hi @tustvold, I removed some changes based on your comment. PTAL, thanks!

Contributor

@etseidl etseidl left a comment

Thanks! Just a few nits. However, I wonder if perhaps this should be multiple PRs with rationales given for each change.

Comment on lines +620 to +621
let column_orders =
Self::parse_column_orders(t_file_metadata.column_orders, &schema_descr)?;
Contributor

I realize this would currently panic, but would one ever prefer to just set column_orders to None and continue? The only impact AFAIK would be statistics being unusable, which would only matter if predicates were in use.

Contributor Author

Good point! I agree with the idea of setting it to None, but I guess this is worth a separate issue to discuss and fix.
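The two options under discussion (propagate the error, or fall back to `None` and continue) can be sketched as follows. This is an illustration under assumptions: the `ColumnOrder` enum, the length-mismatch failure condition, and both function signatures are simplified hypothetical stand-ins, not the crate's real `parse_column_orders`.

```rust
/// Simplified stand-in for a column order (not the parquet crate's enum).
#[derive(Debug, PartialEq)]
pub enum ColumnOrder {
    TypeDefined,
    Undefined,
}

/// Strict variant: a mismatch between the number of orders and the number
/// of columns is reported as an error instead of a panic.
pub fn parse_column_orders(
    raw: Option<Vec<ColumnOrder>>,
    num_columns: usize,
) -> Result<Option<Vec<ColumnOrder>>, String> {
    match raw {
        None => Ok(None),
        Some(orders) if orders.len() == num_columns => Ok(Some(orders)),
        Some(orders) => Err(format!(
            "column order length mismatch: got {}, expected {}",
            orders.len(),
            num_columns
        )),
    }
}

/// Lenient variant: unparsable column orders are treated as absent, so the
/// rest of the metadata stays readable (at the cost of unusable statistics).
pub fn column_orders_lenient(
    raw: Option<Vec<ColumnOrder>>,
    num_columns: usize,
) -> Option<Vec<ColumnOrder>> {
    parse_column_orders(raw, num_columns).unwrap_or(None)
}
```

The lenient variant silently discards the error, so a real implementation would likely want to log it or expose it some other way; that design question is what the proposed separate issue would cover.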

if let Some(min) = min {
if min.len() < len {
return Err(ParquetError::General(
"Insufficient bytes to parse max statistic".to_string(),
Contributor

Suggested change
- "Insufficient bytes to parse max statistic".to_string(),
+ "Insufficient bytes to parse min statistic".to_string(),

Contributor Author

thanks for catching this.
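The pattern in this hunk (check the byte length before parsing a statistic, and name the right statistic in the message) can be sketched in isolation. `parse_i32_stat` is a hypothetical helper using a plain `String` error, not the crate's `ParquetError` or its statistics code.

```rust
/// Parse a little-endian i32 statistic from `bytes`, returning an error
/// instead of panicking when fewer than 4 bytes are available.
/// `which` names the statistic ("min" or "max") for the error message.
pub fn parse_i32_stat(bytes: &[u8], which: &str) -> Result<i32, String> {
    const LEN: usize = std::mem::size_of::<i32>();
    if bytes.len() < LEN {
        return Err(format!("Insufficient bytes to parse {which} statistic"));
    }
    let mut buf = [0u8; LEN];
    buf.copy_from_slice(&bytes[..LEN]);
    Ok(i32::from_le_bytes(buf))
}
```

Passing the statistic name as a parameter avoids the copy-paste slip caught in the suggestion above, since the min and max branches share one message template.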

@jp0317
Contributor Author

jp0317 commented Nov 21, 2024

> Thanks! Just a few nits. However, I wonder if perhaps this should be multiple PRs with rationales given for each change.

Thanks for the review! Imho these changes share the same rationale, in that they just convert panics to errors.

Contributor

@etseidl etseidl left a comment

This all looks correct to me. I personally would prefer errors to panics when processing multiple files.

@jp0317
Contributor Author

jp0317 commented Nov 26, 2024

Hi @tustvold could you please take another look at this one? Thanks!

@jp0317
Contributor Author

jp0317 commented Dec 10, 2024

> Hi @tustvold could you please take another look at this one? Thanks!

just a friendly ping @tustvold :)

alamb
alamb previously approved these changes Dec 17, 2024
Contributor

@alamb alamb left a comment

Thank you very much @jp0317 and @etseidl

This looks reasonable to me -- thank you 🙏

I also made a PR to test this out with the DataFusion tests as well: apache/datafusion#13820

Just as a way to double check this doesn't cause any unintended issues

@alamb alamb dismissed their stale review December 17, 2024 20:25

Wait to run benchmarks before approval

@alamb
Contributor

alamb commented Dec 17, 2024

I ran the parquet benchmarks to verify there are no performance implications with this PR.

My initial results seem to show no noticeable pattern or difference between this PR and main. I will run it again to be sure.


++ critcmp master panic
group                                                                                                      master                                 panic
-----                                                                                                      ------                                 -----
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                           1.01   1083.3±9.48µs        ? ?/sec    1.00   1070.4±3.07µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                          1.05   1218.9±3.27µs        ? ?/sec    1.00   1162.7±1.70µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                            1.01   1085.7±3.01µs        ? ?/sec    1.00   1077.0±3.48µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                     1.19    474.6±3.46µs        ? ?/sec    1.00    399.4±1.83µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                    1.07    705.2±1.57µs        ? ?/sec    1.00    660.0±1.61µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                      1.20    485.9±6.66µs        ? ?/sec    1.00    404.9±2.86µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                          1.00    539.4±3.68µs        ? ?/sec    1.02    550.1±2.50µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                         1.00    751.6±5.43µs        ? ?/sec    1.00    754.5±2.25µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                           1.00    546.4±3.48µs        ? ?/sec    1.02    556.4±3.23µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    226.4±3.34µs        ? ?/sec    1.22    276.8±2.89µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                                1.00    253.7±0.71µs        ? ?/sec    1.08    272.8±0.79µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                  1.00    233.8±2.52µs        ? ?/sec    1.19    278.6±3.04µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                      1.18    373.4±4.79µs        ? ?/sec    1.00    316.4±1.83µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                        1.11    348.1±1.22µs        ? ?/sec    1.00    312.4±1.62µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                     1.12    343.9±1.96µs        ? ?/sec    1.00    307.5±0.87µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                       1.18    382.8±2.03µs        ? ?/sec    1.00    324.7±3.93µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.00    881.0±5.07µs        ? ?/sec    1.07    940.5±1.37µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.00    755.7±1.45µs        ? ?/sec    1.03    782.1±2.54µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.00    888.3±3.13µs        ? ?/sec    1.07    947.5±2.53µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                 1.00    255.0±2.80µs        ? ?/sec    1.05    268.1±3.59µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                1.02    459.8±3.34µs        ? ?/sec    1.00    452.2±2.30µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                  1.00    266.7±4.40µs        ? ?/sec    1.02    271.7±4.04µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00    114.3±0.26µs        ? ?/sec    1.07    121.9±0.40µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs       1.17    245.8±0.80µs        ? ?/sec    1.00    209.3±0.66µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00    119.8±0.31µs        ? ?/sec    1.06    126.9±0.40µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs                    1.00     37.5±0.11µs        ? ?/sec    1.00     37.5±0.06µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs                   1.24    207.2±0.87µs        ? ?/sec    1.00    166.4±0.24µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs                     1.00     42.6±0.14µs        ? ?/sec    1.00     42.4±0.09µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs                    1.00    687.3±1.18µs        ? ?/sec    1.07    736.8±2.17µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs                   1.00    567.1±1.21µs        ? ?/sec    1.03    583.1±4.18µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs                     1.00    693.4±1.45µs        ? ?/sec    1.07    742.2±2.03µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs                                1.00     66.3±6.44µs        ? ?/sec    1.04     68.6±2.82µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs                               1.02    256.5±1.49µs        ? ?/sec    1.00    252.5±2.04µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs                                 1.06     77.7±6.21µs        ? ?/sec    1.00     73.3±3.61µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs                     1.00     85.8±0.27µs        ? ?/sec    1.09     93.4±0.25µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs                    1.20    217.2±0.77µs        ? ?/sec    1.00    180.4±0.79µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs                      1.00     90.7±0.40µs        ? ?/sec    1.09     98.7±0.29µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs                                 1.00      8.8±0.26µs        ? ?/sec    1.00      8.8±0.22µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs                                1.30    179.2±1.57µs        ? ?/sec    1.00    137.9±0.22µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs                                  1.00     13.8±0.23µs        ? ?/sec    1.01     14.0±0.18µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs                     1.00    167.8±0.26µs        ? ?/sec    1.09    183.2±0.59µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs                    1.21    335.0±0.83µs        ? ?/sec    1.00    276.7±0.65µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs                      1.00    175.3±0.45µs        ? ?/sec    1.08    188.7±0.52µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs                                 1.00     12.4±0.16µs        ? ?/sec    1.12     13.9±0.59µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs                                1.34    259.0±0.54µs        ? ?/sec    1.00    193.0±0.33µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs                                  1.00     19.1±0.30µs        ? ?/sec    1.04     19.8±1.04µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs                     1.00    341.3±1.07µs        ? ?/sec    1.07    364.9±0.86µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs                    1.00    388.9±0.93µs        ? ?/sec    1.00    389.3±0.76µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs                      1.00    348.9±0.77µs        ? ?/sec    1.07    371.8±1.39µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs                                 1.01     25.6±0.56µs        ? ?/sec    1.00     25.2±0.55µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs                                1.01    224.2±0.61µs        ? ?/sec    1.00    222.0±1.35µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs                                  1.00     35.0±1.42µs        ? ?/sec    1.00     34.9±1.62µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    140.0±0.27µs        ? ?/sec    1.02    142.7±0.46µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs                            1.07    132.0±0.39µs        ? ?/sec    1.00    123.7±0.27µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs                                1.03    183.4±0.46µs        ? ?/sec    1.00    178.7±0.90µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs                               1.00    235.6±1.67µs        ? ?/sec    1.04    245.8±2.21µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs                                 1.03    189.1±0.61µs        ? ?/sec    1.00    183.6±0.53µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00     77.6±0.34µs        ? ?/sec    1.00     77.5±0.39µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    179.5±0.62µs        ? ?/sec    1.06    190.9±0.39µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.01     82.7±0.38µs        ? ?/sec    1.00     82.2±0.20µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    186.1±0.33µs        ? ?/sec    1.00    185.9±0.27µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    239.8±0.82µs        ? ?/sec    1.05    251.3±9.54µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    191.8±0.40µs        ? ?/sec    1.00    191.5±0.35µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     72.7±0.50µs        ? ?/sec    1.01     73.7±0.40µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs                               1.00    177.1±0.39µs        ? ?/sec    1.07    188.7±0.46µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs                                 1.02     78.8±0.34µs        ? ?/sec    1.00     77.5±0.24µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    109.9±0.63µs        ? ?/sec    1.02    112.3±0.40µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs                          1.04    137.3±0.42µs        ? ?/sec    1.00    132.3±0.37µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    111.0±0.34µs        ? ?/sec    1.03    114.6±0.32µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    163.3±0.40µs        ? ?/sec    1.02    167.1±0.28µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs                               1.05    246.6±1.32µs        ? ?/sec    1.00    234.5±0.51µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs                                 1.00    168.5±0.55µs        ? ?/sec    1.02    172.1±0.29µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00    203.4±0.60µs        ? ?/sec    1.00    202.7±0.33µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.05    262.8±1.36µs        ? ?/sec    1.00    249.7±1.02µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00    208.9±0.88µs        ? ?/sec    1.00    208.6±0.36µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.01    194.6±0.76µs        ? ?/sec    1.00    192.9±0.38µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs                          1.06    257.9±1.42µs        ? ?/sec    1.00    243.9±0.66µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs                            1.01    200.6±2.60µs        ? ?/sec    1.00    198.8±0.34µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     97.3±0.25µs        ? ?/sec    1.09    106.4±1.12µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs                               1.07    212.7±0.92µs        ? ?/sec    1.00    198.6±1.01µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs                                 1.00    104.8±1.14µs        ? ?/sec    1.10    115.5±0.64µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                                      1.06    101.6±0.28µs        ? ?/sec    1.00     95.6±0.28µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs                                     1.00    111.5±0.22µs        ? ?/sec    1.02    113.4±0.46µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                                       1.08    104.4±0.27µs        ? ?/sec    1.00     96.2±0.33µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                           1.03    131.2±0.44µs        ? ?/sec    1.00    126.9±0.38µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                          1.00    179.8±0.51µs        ? ?/sec    1.06    189.7±0.53µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                            1.03    136.1±0.44µs        ? ?/sec    1.00    131.7±0.33µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs                               1.05     25.6±0.26µs        ? ?/sec    1.00     24.4±0.07µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs                              1.00    123.9±0.40µs        ? ?/sec    1.08    134.5±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs                                1.05     30.2±0.28µs        ? ?/sec    1.00     28.8±0.08µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs                                      1.02    134.7±0.43µs        ? ?/sec    1.00    132.3±0.27µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs                                     1.00    183.9±0.64µs        ? ?/sec    1.05    193.9±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs                                       1.02    139.4±0.45µs        ? ?/sec    1.00    136.5±0.34µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                                           1.00     17.9±0.63µs        ? ?/sec    1.00     17.9±0.78µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                                          1.00    118.4±0.33µs        ? ?/sec    1.10    130.2±0.35µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                                            1.00     23.3±0.33µs        ? ?/sec    1.01     23.5±0.64µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                                      1.00     82.4±0.27µs        ? ?/sec    1.03     84.7±0.29µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs                                     1.04    108.5±0.40µs        ? ?/sec    1.00    104.6±0.40µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                                       1.00     83.2±0.22µs        ? ?/sec    1.05     87.6±0.36µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                           1.00    106.8±0.60µs        ? ?/sec    1.04    111.5±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                          1.05    180.6±0.41µs        ? ?/sec    1.00    172.1±0.77µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                            1.00    111.4±0.47µs        ? ?/sec    1.04    116.0±0.54µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs                               1.00    148.8±0.29µs        ? ?/sec    1.00    149.0±0.70µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs                              1.06    204.1±0.59µs        ? ?/sec    1.00    191.9±0.40µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs                                1.00    153.6±0.42µs        ? ?/sec    1.00    153.5±0.39µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs                                      1.02   143.6±17.83µs        ? ?/sec    1.00    140.7±0.70µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs                                     1.07    199.9±0.92µs        ? ?/sec    1.00    186.9±0.80µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs                                       1.00    145.2±0.50µs        ? ?/sec    1.00    144.8±0.70µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                                           1.05     42.8±2.98µs        ? ?/sec    1.00     40.7±2.26µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                                          1.09    147.3±0.48µs        ? ?/sec    1.00    135.6±0.42µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                                            1.00     45.3±2.72µs        ? ?/sec    1.06     47.9±2.78µs        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs                                     1.00      7.4±0.03ms        ? ?/sec    1.00      7.4±0.05ms        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs                                       1.00     13.1±0.11ms        ? ?/sec    1.00     13.2±0.14ms        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                     1.23    491.7±1.50µs        ? ?/sec    1.00    398.6±2.24µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                    1.07    706.5±1.27µs        ? ?/sec    1.00    663.1±2.02µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                      1.21    492.1±4.88µs        ? ?/sec    1.00    406.8±2.01µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                          1.02    652.5±3.21µs        ? ?/sec    1.00    640.8±2.96µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                         1.00    794.8±2.75µs        ? ?/sec    1.02    812.4±3.20µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                           1.02    661.7±3.06µs        ? ?/sec    1.00    649.2±3.18µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                1.05    321.0±2.76µs        ? ?/sec    1.00    305.9±1.18µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                               1.00    385.0±1.55µs        ? ?/sec    1.00    385.3±1.52µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                 1.03    321.4±1.79µs        ? ?/sec    1.00    312.9±0.76µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    238.4±3.04µs        ? ?/sec    1.13    269.9±3.39µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs                                1.00    252.2±0.81µs        ? ?/sec    1.11    280.9±0.77µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs                                  1.00    232.6±2.74µs        ? ?/sec    1.22    282.8±2.79µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                                      1.14    478.4±1.87µs        ? ?/sec    1.00    420.7±1.76µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs                                     1.09    388.8±1.97µs        ? ?/sec    1.00    355.9±1.13µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                                       1.13    486.7±3.31µs        ? ?/sec    1.00    429.4±2.38µs        ? ?/sec

@alamb
Contributor

alamb commented Dec 18, 2024

My second performance run likewise shows no particular performance changes (there is a lot of noise, with some benchmarks being faster and some being slower)

Details

++ critcmp master panic
group                                                                                                      master                                 panic
-----                                                                                                      ------                                 -----
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                           1.00   1076.3±2.20µs        ? ?/sec    1.00   1071.1±3.04µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                          1.05  1221.6±35.71µs        ? ?/sec    1.00   1166.5±2.18µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                            1.01   1085.4±6.20µs        ? ?/sec    1.00   1075.8±2.70µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                     1.21    485.5±3.26µs        ? ?/sec    1.00    401.1±2.35µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                    1.07    700.5±1.01µs        ? ?/sec    1.00    656.2±1.33µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                      1.25    485.3±1.33µs        ? ?/sec    1.00    388.2±3.31µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                          1.00    544.1±3.74µs        ? ?/sec    1.03    559.7±3.06µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                         1.00    735.6±2.51µs        ? ?/sec    1.04    762.6±3.18µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                           1.00    548.7±4.11µs        ? ?/sec    1.03    566.2±3.81µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    230.4±2.52µs        ? ?/sec    1.17    270.2±2.54µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                                1.00    253.3±1.98µs        ? ?/sec    1.11    280.7±0.59µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                  1.00    236.3±3.13µs        ? ?/sec    1.20    284.7±2.70µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                      1.18    369.4±2.30µs        ? ?/sec    1.00    313.0±1.47µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                        1.12    345.1±1.20µs        ? ?/sec    1.00    309.1±1.11µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                     1.11    338.0±1.88µs        ? ?/sec    1.00    305.0±0.97µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                       1.17    377.7±1.10µs        ? ?/sec    1.00    322.6±1.06µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.00    881.0±2.59µs        ? ?/sec    1.07    941.8±2.70µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.00    770.3±2.67µs        ? ?/sec    1.03    792.2±2.45µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.00    899.0±3.05µs        ? ?/sec    1.06    951.0±2.63µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                 1.00    254.4±1.08µs        ? ?/sec    1.07    273.1±3.08µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                1.00    457.8±1.79µs        ? ?/sec    1.00    457.3±2.98µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                  1.00    269.1±2.79µs        ? ?/sec    1.06    286.3±3.69µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00    114.2±0.21µs        ? ?/sec    1.07    122.2±0.25µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs       1.17    246.5±0.59µs        ? ?/sec    1.00    210.0±1.10µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00    119.7±0.32µs        ? ?/sec    1.07    128.5±2.17µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs                    1.01     37.8±0.11µs        ? ?/sec    1.00     37.4±0.18µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs                   1.25    207.7±0.26µs        ? ?/sec    1.00    166.0±0.40µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs                     1.00     42.7±0.15µs        ? ?/sec    1.00     42.6±0.13µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs                    1.00    689.7±1.93µs        ? ?/sec    1.07    735.8±1.57µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs                   1.00    563.7±1.78µs        ? ?/sec    1.04    585.2±1.88µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs                     1.00    697.3±1.83µs        ? ?/sec    1.06    742.3±2.07µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs                                1.25     70.7±1.86µs        ? ?/sec    1.00     56.6±3.79µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs                               1.00    250.3±1.97µs        ? ?/sec    1.00    251.1±1.27µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs                                 1.20     78.0±1.75µs        ? ?/sec    1.00     64.8±4.23µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs                     1.00     85.4±0.26µs        ? ?/sec    1.10     93.7±0.27µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs                    1.20    216.4±0.91µs        ? ?/sec    1.00    180.0±0.31µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs                      1.00     91.2±0.32µs        ? ?/sec    1.08     98.8±0.38µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs                                 1.00      8.9±0.21µs        ? ?/sec    1.00      8.9±0.16µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs                                1.30    177.5±0.56µs        ? ?/sec    1.00    136.6±0.36µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs                                  1.00     13.7±0.18µs        ? ?/sec    1.01     13.9±0.24µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs                     1.00    169.7±0.54µs        ? ?/sec    1.08    183.2±0.45µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs                    1.22    332.9±0.88µs        ? ?/sec    1.00    273.6±0.35µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs                      1.00    174.6±0.55µs        ? ?/sec    1.08    188.5±0.35µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs                                 1.00     12.8±0.19µs        ? ?/sec    1.07     13.7±0.29µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs                                1.36    256.8±0.74µs        ? ?/sec    1.00    189.2±0.66µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs                                  1.00     18.2±0.26µs        ? ?/sec    1.05     19.1±0.36µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs                     1.00    340.0±1.07µs        ? ?/sec    1.07    364.6±0.78µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs                    1.00    387.0±3.17µs        ? ?/sec    1.00    387.3±1.12µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs                      1.00    346.6±0.83µs        ? ?/sec    1.07    371.2±1.05µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs                                 1.08     27.7±0.99µs        ? ?/sec    1.00     25.7±0.37µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs                                1.01    222.3±0.66µs        ? ?/sec    1.00    219.1±1.84µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs                                  1.14     36.0±1.94µs        ? ?/sec    1.00     31.5±0.33µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.02    128.7±0.41µs        ? ?/sec    1.00    125.7±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    138.6±0.40µs        ? ?/sec    1.04    144.4±0.46µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs                            1.02    131.9±0.42µs        ? ?/sec    1.00    129.4±1.54µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    183.4±0.53µs        ? ?/sec    1.03    188.4±0.63µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs                               1.00    233.9±0.48µs        ? ?/sec    1.06    248.3±0.70µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs                                 1.00    188.9±0.66µs        ? ?/sec    1.02    193.6±0.57µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00     77.2±0.22µs        ? ?/sec    1.01     77.6±0.34µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    177.9±0.58µs        ? ?/sec    1.07    191.1±0.48µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00     82.6±0.39µs        ? ?/sec    1.00     82.6±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    186.3±0.59µs        ? ?/sec    1.00    185.9±0.53µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    237.0±0.55µs        ? ?/sec    1.05    249.2±1.25µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    191.3±0.64µs        ? ?/sec    1.01    192.3±0.91µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     71.7±0.33µs        ? ?/sec    1.02     73.5±0.24µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs                               1.00    175.3±0.45µs        ? ?/sec    1.07    188.3±1.20µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs                                 1.00     77.2±0.29µs        ? ?/sec    1.00     77.4±0.60µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    108.8±0.25µs        ? ?/sec    1.02    111.5±0.22µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs                          1.05    139.3±0.33µs        ? ?/sec    1.00    132.2±0.29µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    111.7±0.28µs        ? ?/sec    1.02    114.0±0.59µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    161.5±0.44µs        ? ?/sec    1.00    161.3±0.45µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs                               1.05    242.8±0.49µs        ? ?/sec    1.00    231.1±0.85µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs                                 1.00    166.7±0.44µs        ? ?/sec    1.00    167.1±0.45µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.01    204.8±0.48µs        ? ?/sec    1.00    201.9±0.61µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.04    261.0±0.49µs        ? ?/sec    1.00    250.0±0.93µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.02    210.9±0.61µs        ? ?/sec    1.00    207.0±0.34µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    193.6±0.38µs        ? ?/sec    1.00    194.0±0.56µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs                          1.05    254.5±1.28µs        ? ?/sec    1.00    242.2±0.69µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs                            1.01    199.1±0.55µs        ? ?/sec    1.00    198.0±0.41µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00    100.1±0.65µs        ? ?/sec    1.10    110.1±1.44µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs                               1.05    207.3±0.68µs        ? ?/sec    1.00    197.2±0.50µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs                                 1.00    107.5±0.97µs        ? ?/sec    1.11    119.6±1.25µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                                      1.02    101.5±0.38µs        ? ?/sec    1.00     99.4±0.20µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs                                     1.00    112.6±0.45µs        ? ?/sec    1.03    115.5±0.29µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                                       1.02    104.0±0.22µs        ? ?/sec    1.00    101.8±0.29µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                           1.00    130.4±0.40µs        ? ?/sec    1.05    136.8±0.29µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                          1.00    179.1±0.41µs        ? ?/sec    1.08    192.8±0.66µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                            1.00    135.1±0.37µs        ? ?/sec    1.05    141.2±0.60µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     24.5±0.08µs        ? ?/sec    1.06     26.0±0.29µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs                              1.00    123.4±0.61µs        ? ?/sec    1.10    135.3±0.34µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs                                1.00     28.8±0.09µs        ? ?/sec    1.05     30.3±0.58µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs                                      1.00    134.7±0.33µs        ? ?/sec    1.00    134.5±5.11µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs                                     1.00    183.0±0.35µs        ? ?/sec    1.06    194.6±0.73µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs                                       1.01    139.4±0.32µs        ? ?/sec    1.00    138.1±0.26µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                                           1.01     17.9±0.67µs        ? ?/sec    1.00     17.7±0.44µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                                          1.00    118.3±0.32µs        ? ?/sec    1.10    130.1±0.25µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                                            1.01     23.4±0.43µs        ? ?/sec    1.00     23.2±0.34µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                                      1.00     81.2±0.33µs        ? ?/sec    1.03     83.5±1.07µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs                                     1.04    107.9±0.36µs        ? ?/sec    1.00    103.5±0.29µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                                       1.00     83.3±0.52µs        ? ?/sec    1.05     87.5±0.43µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                           1.00    107.1±0.55µs        ? ?/sec    1.01    108.4±0.59µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                          1.06    179.0±0.71µs        ? ?/sec    1.00    169.0±0.64µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                            1.00    111.5±1.06µs        ? ?/sec    1.00    111.2±0.51µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs                               1.00    149.0±0.40µs        ? ?/sec    1.00    149.5±2.65µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs                              1.06    203.1±0.49µs        ? ?/sec    1.00    192.0±0.46µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs                                1.00    153.7±0.35µs        ? ?/sec    1.00    153.7±0.64µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs                                      1.01    141.1±0.62µs        ? ?/sec    1.00    140.4±0.97µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs                                     1.07    199.9±0.73µs        ? ?/sec    1.00    187.6±0.87µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs                                       1.01    145.3±0.70µs        ? ?/sec    1.00    144.4±0.55µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                                           1.04     42.3±1.77µs        ? ?/sec    1.00     40.7±0.52µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                                          1.07    145.7±0.74µs        ? ?/sec    1.00    135.8±0.53µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                                            1.00     48.6±2.15µs        ? ?/sec    1.01     48.8±0.96µs        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs                                     1.00      7.4±0.04ms        ? ?/sec    1.00      7.4±0.04ms        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs                                       1.01     13.1±0.11ms        ? ?/sec    1.00     13.0±0.15ms        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                     1.22    490.5±2.04µs        ? ?/sec    1.00    400.5±1.32µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                    1.07    703.2±1.93µs        ? ?/sec    1.00    658.1±1.60µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                      1.21    494.6±2.49µs        ? ?/sec    1.00    408.2±1.64µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                          1.00    652.6±2.49µs        ? ?/sec    1.01    661.5±3.60µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                         1.01    818.5±5.65µs        ? ?/sec    1.00    812.1±3.26µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                           1.00    661.0±3.68µs        ? ?/sec    1.02    671.2±3.45µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                1.04    316.9±1.20µs        ? ?/sec    1.00    305.3±0.81µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                               1.01    386.7±1.95µs        ? ?/sec    1.00    384.3±1.12µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                 1.03    321.8±1.09µs        ? ?/sec    1.00    311.0±0.92µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs                                 1.00    227.7±3.35µs        ? ?/sec    1.22    277.8±3.08µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs                                1.00    252.7±0.74µs        ? ?/sec    1.08    272.6±0.63µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs                                  1.00    233.4±2.67µs        ? ?/sec    1.18    275.9±2.60µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                                      1.15    478.1±1.98µs        ? ?/sec    1.00    417.2±1.76µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs                                     1.08    383.6±1.94µs        ? ?/sec    1.00    355.4±1.71µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                                       1.14    485.8±1.50µs        ? ?/sec    1.00    426.7±2.74µs        ? ?/sec

Contributor

@alamb alamb left a comment

Thank you @jp0317 and @etseidl

I reviewed this PR carefully, and I believe that it:

  1. Is better than the current code (detects more errors / avoids panics on bad input)
  2. Is unlikely to impact performance (changes are either added to the metadata parsing code, or called once per page)

However, I would like @tustvold to give it a final look (or say he won't have time to do so) before we merge it

@@ -67,7 +67,7 @@ impl<'a> TCompactSliceInputProtocol<'a> {
         let mut shift = 0;
         loop {
             let byte = self.read_byte()?;
-            in_progress |= ((byte & 0x7F) as u64) << shift;
+            in_progress |= ((byte & 0x7F) as u64).wrapping_shl(shift);
Contributor

What is the purpose of this change?

As I understand it, << panics in debug builds on overflow but wraps in release builds

It seems to me that this change now avoids panicking in debug builds too, which isn't obviously better to me

Contributor Author

@jp0317 jp0317 Dec 18, 2024

Thanks for bringing this up! I'm not sure if << is guaranteed to always have the same behavior as wrapping_shl in release builds. On the other hand, imho this is not really a logic bug, and the only benefit of keeping << is that debug mode can be used to identify invalid inputs (via panics), but I'm not sure if anyone will rely on that in practice.
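
For reference, the explicit shift methods on Rust integers give the same result in every build profile, whereas what `<<` does with an out-of-range shift amount depends on whether overflow checks are enabled. The snippet below is only an illustrative sketch: `shift_variants` is a hypothetical helper, not code from this PR or from the parquet crate.

```rust
/// Hypothetical helper (not part of the parquet crate): applies the three
/// explicit shift methods so their results can be compared side by side.
fn shift_variants(value: u64, shift: u32) -> (u64, Option<u64>, (u64, bool)) {
    (
        // Masks the shift amount to the bit width (70 % 64 == 6), never panics.
        value.wrapping_shl(shift),
        // Returns None when the shift amount is >= 64.
        value.checked_shl(shift),
        // Returns the masked result plus a flag saying the amount was >= 64.
        value.overflowing_shl(shift),
    )
}

fn main() {
    // An out-of-range shift amount: detected by checked_shl, masked by the others.
    let (wrapped, checked, overflowing) = shift_variants(0x7F, 70);
    assert_eq!(wrapped, 0x7F << 6);
    assert_eq!(checked, None);
    assert_eq!(overflowing, (0x7F << 6, true));

    // An in-range shift amount: all three agree, even though high bits of the
    // value itself are shifted out.
    let (wrapped, checked, overflowing) = shift_variants(0x7F, 63);
    assert_eq!(wrapped, 1u64 << 63);
    assert_eq!(checked, Some(1u64 << 63));
    assert_eq!(overflowing, (1u64 << 63, false));
    println!("all shift variants behaved as described");
}
```

Note that `checked_shl` only flags an out-of-range shift amount; it does not report value bits that are shifted out, as the second case shows.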

Contributor

@emkornfield emkornfield Dec 18, 2024

Note the C++ code for this loop actually validates [total bytes decoded](https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/protocol/TCompactProtocol.tcc#L759), which is probably a good idea (I think this prevents the panic that the original checked shift was meant to catch?)
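
A minimal sketch of what validating the number of bytes decoded could look like in Rust follows. Everything here is an assumption for illustration: `read_vlq`, the `String` error type, and the extra check on the 10th byte are hypothetical and are taken neither from this PR nor from the Thrift C++ code.

```rust
/// Hypothetical standalone varint decoder (not the parquet crate's code).
/// Returns the decoded value and the number of bytes consumed.
fn read_vlq(bytes: &[u8]) -> Result<(u64, usize), String> {
    // A 64-bit value needs at most ceil(64 / 7) = 10 bytes of 7-bit payload.
    const MAX_VARINT_BYTES: usize = 10;
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if i >= MAX_VARINT_BYTES {
            return Err("varint is longer than 10 bytes".to_string());
        }
        let payload = (byte & 0x7F) as u64;
        // The 10th byte is shifted by 63, so only its lowest bit fits in a u64.
        if i == MAX_VARINT_BYTES - 1 && payload > 1 {
            return Err("varint does not fit in a u64".to_string());
        }
        // The shift amount is at most 63 here, so `<<` cannot overflow.
        value |= payload << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Ok((value, i + 1));
        }
    }
    Err("unexpected end of input".to_string())
}

fn main() {
    // 300 encodes as 0xAC 0x02 in LEB128-style varints.
    assert_eq!(read_vlq(&[0xAC, 0x02]), Ok((300, 2)));
    // Eleven continuation bytes are rejected instead of overflowing the shift.
    assert!(read_vlq(&[0x80; 11]).is_err());
    println!("varint checks passed");
}
```

With the length bounded up front, the shift amount never reaches 64, so a plain `<<` cannot panic in this loop, and values that do not fit in a `u64` are reported as errors rather than silently truncated.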

Contributor Author

Thanks for the pointer. IIUC the C++ one may still have an overflow on the 10th byte, where it shifts by 63 bits?

Contributor

@emkornfield emkornfield Dec 19, 2024

I might be misunderstanding, but I thought shifting by 63 bits is well defined on u64? So overflow yes, but I wouldn't expect a panic? Also, I think silently passing here when there is an overflow would corrupt data?

Contributor

Maybe we can consider this change independently if we are concerned about perf impacts

Contributor Author

Overflow, yes. I agree we can add a check on the number of bytes decoded separately. For this CL, let's just solve the panic?

@@ -556,7 +556,8 @@ impl<'a> PrimitiveTypeBuilder<'a> {
 }
 }
 PhysicalType::FIXED_LEN_BYTE_ARRAY => {
-let max_precision = (2f64.powi(8 * self.length - 1) - 1f64).log10().floor() as i32;
+let length = self.length.checked_mul(8).unwrap_or(i32::MAX);
Contributor

Should the overflow return an error instead of falling through to max precision?

Contributor

I agree it would be better to return an error.

I double-checked that using `i32::MAX` results in:

Max precision: 2147483647

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=8b510b23f773540e2b348e80f0691b41

Contributor Author

SG, changed to return an error, thanks!
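
For readers following this thread, a minimal sketch of the two behaviors discussed here. It is not the code that landed in the PR: the function names, the `String` error type, and the explicit rejection of non-positive lengths are assumptions made to keep the example self-contained.

```rust
/// Precision formula from the diff, for a width given in bits.
/// Precondition (assumed for this sketch): `bits >= 1`, so `bits - 1` cannot overflow.
fn precision_for_bits(bits: i32) -> i32 {
    (2f64.powi(bits - 1) - 1f64).log10().floor() as i32
}

/// Hypothetical error-returning variant: a byte length whose bit width
/// overflows `i32` is reported instead of being clamped.
fn max_decimal_precision(length_bytes: i32) -> Result<i32, String> {
    // Assumption of this sketch: non-positive lengths are rejected up front.
    if length_bytes <= 0 {
        return Err(format!("Invalid length {} for Decimal", length_bytes));
    }
    let bits = length_bytes
        .checked_mul(8)
        .ok_or_else(|| format!("Invalid length {} for Decimal", length_bytes))?;
    Ok(precision_for_bits(bits))
}
```

Here `precision_for_bits(i32::MAX)` evaluates to `2147483647`, because `powi` yields infinity and the float-to-int cast saturates. That matches the playground result quoted above and is why the `unwrap_or(i32::MAX)` fallback would effectively disable the precision check. By contrast, `max_decimal_precision(16)` returns `Ok(38)` and `max_decimal_precision(i32::MAX)` returns an error.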

@alamb alamb dismissed tustvold’s stale review December 19, 2024 20:41

Feedback has been addressed

@alamb
Contributor

alamb commented Dec 19, 2024

Thank you for all the work here @jp0317 - I think we have bikeshed this PR enough, and I plan to merge it tomorrow unless there are any objections.

FYI @tustvold, I also dismissed your "request changes" review since, from what I can see, the changes you requested have been made. Please let me know if you disagree.
