Member

@samueleresca samueleresca commented Oct 12, 2025

Which issue does this PR close?

Rationale for this change

This change adds try_append_value, a safer version of append_value in ByteViewBuilder that returns a Result instead of panicking. DataFusion will consume the API and handle the Result returned by the function.
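As a rough illustration of the pattern described above (a fallible `try_` method next to the existing panicking one), here is a minimal sketch. `ToyBuilder`, `AppendError`, and the `max_len` limit are hypothetical names invented for this example. This is not arrow's implementation, and whether the real `append_value` delegates to `try_append_value` is an assumption.

```rust
/// Hypothetical error type standing in for `ArrowError` (illustration only).
#[derive(Debug, PartialEq)]
enum AppendError {
    /// The value is longer than the builder's configured limit.
    ValueTooLong { len: usize, max_len: usize },
}

/// Toy builder; this is NOT arrow's `ByteViewBuilder`.
struct ToyBuilder {
    max_len: usize,
    values: Vec<Vec<u8>>,
}

impl ToyBuilder {
    fn new(max_len: usize) -> Self {
        Self { max_len, values: Vec::new() }
    }

    /// Fallible append: reports the problem to the caller instead of panicking.
    fn try_append_value(&mut self, value: &[u8]) -> Result<(), AppendError> {
        if value.len() > self.max_len {
            return Err(AppendError::ValueTooLong {
                len: value.len(),
                max_len: self.max_len,
            });
        }
        self.values.push(value.to_vec());
        Ok(())
    }

    /// Panicking append, unchanged for existing callers; here it simply
    /// delegates to the fallible version (an assumption of this sketch).
    fn append_value(&mut self, value: &[u8]) {
        self.try_append_value(value).expect("append_value failed");
    }
}
```

A caller such as a query engine can then propagate the error with `?` rather than risk a panic, while existing users of `append_value` see no change.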

What changes are included in this PR?

Are these changes tested?

The method is already covered by existing tests.

Are there any user-facing changes?

No breaking changes, as the original append_value method hasn't changed.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Oct 12, 2025
@samueleresca samueleresca force-pushed the safer-appendvalue-bytes-view branch from 73faf99 to 8859ff7 Compare October 12, 2025 19:45
@samueleresca samueleresca marked this pull request as ready for review October 13, 2025 21:03
.map(u32::from_le_bytes)
.ok_or_else(|| {
ArrowError::InvalidArgumentError(
"String must be at least 4 bytes for non-inline view".to_string(),
Contributor

This error is unreachable as we checked that the value is longer than MAX_INLINE_VIEW_LEN (12 bytes) above.
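To make the reviewer's point concrete, the sketch below shows why reading a 4-byte prefix cannot fail once the value is known to be longer than the inline limit. `long_value_prefix` is a hypothetical helper written for this example; only the 12-byte `MAX_INLINE_VIEW_LEN` figure comes from the comment above.

```rust
/// Inline limit quoted in the review comment (12 bytes).
const MAX_INLINE_VIEW_LEN: usize = 12;

/// Hypothetical helper: returns the little-endian 4-byte prefix of a value
/// that is too long to be stored inline, or `None` for inline-sized values.
fn long_value_prefix(v: &[u8]) -> Option<u32> {
    if v.len() <= MAX_INLINE_VIEW_LEN {
        return None;
    }
    // Here v.len() > 12, so indices 0..4 are always in bounds and no
    // "value shorter than 4 bytes" error path is needed.
    Some(u32::from_le_bytes([v[0], v[1], v[2], v[3]]))
}
```

Because the length check comes first, an error branch guarding the prefix read would be dead code, which is what the comment is flagging.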

- let offset = self.in_progress.len() as u32;
+ let offset: u32 = self.in_progress.len().try_into().map_err(|_| {
+     ArrowError::InvalidArgumentError(format!(
+         "In-progress buffer length {} exceeds u32::MAX",
Contributor

  1. I think the method can recover by starting a new in-progress buffer instead of returning an error here.

  2. I am unsure if this error is even reachable.

Contributor

I think a new buffer would be allocated in the line immediately above this. Maybe we should do a checked add in let required_cap = self.in_progress.len() + v.len(); 🤔

To error here, we would need a usize that doesn't fit into a u32. I think all platforms we care about have usize that is at least u32 (aka 32-bit architectures)
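The checked add suggested above could look roughly like the following. `required_capacity` is a hypothetical free function used only for illustration; in the PR the computation lives inline as `self.in_progress.len() + v.len()`.

```rust
/// Hypothetical helper: computes the capacity needed to append a value of
/// `value_len` bytes to a buffer that already holds `in_progress_len` bytes,
/// returning an error message instead of overflowing.
fn required_capacity(in_progress_len: usize, value_len: usize) -> Result<usize, String> {
    in_progress_len.checked_add(value_len).ok_or_else(|| {
        format!(
            "buffer length {} plus value length {} overflows usize",
            in_progress_len, value_len
        )
    })
}
```

With a plain `+`, the same overflow would panic in debug builds and wrap silently in release builds, so `checked_add` gives one consistent, recoverable outcome.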

Member Author

> To error here, we would need a usize that doesn't fit into a u32. I think all platforms we care about have usize that is at least u32 (aka 32-bit architectures)

I think it is the opposite: a usize on a 64-bit architecture may not fit in a u32. Anyway, I will review and update these changes over the weekend.
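A small sketch of the point made here: on a 64-bit target a `usize` can hold values above `u32::MAX`, so a fallible conversion can fail where an `as` cast would silently truncate. `offset_as_u32` is a hypothetical helper; the error text mirrors the message in the diff.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: converts a buffer length to a u32 offset,
/// failing instead of truncating when the length does not fit.
fn offset_as_u32(len: usize) -> Result<u32, String> {
    u32::try_from(len)
        .map_err(|_| format!("In-progress buffer length {} exceeds u32::MAX", len))
}

/// For contrast: an `as` cast keeps only the low 32 bits.
fn offset_truncating(len: usize) -> u32 {
    len as u32
}
```

On a 32-bit target `usize` and `u32` have the same width, so the conversion can never fail there; the error is only reachable on wider targets.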

Contributor

Yes, you are right -- thank you

Contributor

@alamb alamb left a comment

Thanks for this @samueleresca

Contributor

alamb commented Oct 16, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing safer-appendvalue-bytes-view (bfe53a8) to 5a384f4 diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=safer-appendvalue-bytes-view
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thank you @samueleresca and @ctsk -- this change makes sense to me.

I have kicked off some benchmark runs to make sure this doesn't affect performance. Assuming they look good, I think we can merge this one in.

Contributor

alamb commented Oct 16, 2025

🤖: Benchmark completed

Details

group                                             main                                   safer-appendvalue-bytes-view
-----                                             ----                                   ----------------------------
gc view types all without nulls[100000]           1.02      2.5±0.07ms        ? ?/sec    1.00      2.5±0.08ms        ? ?/sec
gc view types all without nulls[8000]             1.01     92.9±2.25µs        ? ?/sec    1.00     92.1±3.18µs        ? ?/sec
gc view types all[100000]                         1.11   458.7±27.66µs        ? ?/sec    1.00   415.1±28.14µs        ? ?/sec
gc view types all[8000]                           1.31     32.7±5.98µs        ? ?/sec    1.00     24.9±6.02µs        ? ?/sec
gc view types slice half without nulls[100000]    1.02   783.5±26.10µs        ? ?/sec    1.00   771.4±38.10µs        ? ?/sec
gc view types slice half without nulls[8000]      1.00     40.6±2.12µs        ? ?/sec    1.01     41.0±1.36µs        ? ?/sec
gc view types slice half[100000]                  1.39    217.9±8.67µs        ? ?/sec    1.00   156.8±34.25µs        ? ?/sec
gc view types slice half[8000]                    1.00     15.7±3.11µs        ? ?/sec    1.02     16.0±1.30µs        ? ?/sec
view types slice                                  1.04  1093.2±43.30ns        ? ?/sec    1.00  1047.9±40.54ns        ? ?/sec
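One way to read the table, assuming the first number in each column is that side's mean time divided by the faster side's mean time (the thread does not say which tool produced the output, so this is an inference from the numbers): the sketch below reproduces the 1.39 shown for "gc view types slice half[100000]". `relative_to_fastest` is a hypothetical helper.

```rust
/// Hypothetical helper: mean time of one side relative to the faster of the
/// two sides, so the faster side always scores 1.00.
fn relative_to_fastest(mean: f64, other_mean: f64) -> f64 {
    mean / mean.min(other_mean)
}
```

For that row, main is 217.9µs against 156.8µs on the branch, giving roughly 1.39 for main and 1.00 for the branch. The ± ranges on the rows with the largest ratios are wide (for example ±34.25µs), so those differences may be mostly noise rather than a real speedup.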

Contributor

alamb commented Oct 16, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubuntu SMP Wed Sep 3 01:55:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing safer-appendvalue-bytes-view (bfe53a8) to 5a384f4 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=safer-appendvalue-bytes-view
Results will be posted here when complete
