Update parquet/src/arrow/array_reader/byte_view_array.rs
Co-authored-by: Andrew Lamb <[email protected]>
XiangpengHao and alamb authored Jul 6, 2024
1 parent 9c5b31c commit 6604216
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion parquet/src/arrow/array_reader/byte_view_array.rs
@@ -347,7 +347,7 @@ impl ByteViewArrayDecoderPlain {
// I.e., the validation cannot validate the buffer in one pass, but instead, validate strings chunk by chunk.
//
// Given the above observations, the goal is to do batch validation as much as possible.
- // The key idea is that if the length is smaller than 128 (99% of the case), then the length bytes are valid utf-8, as reasoned blow:
+ // The key idea is that if the length is smaller than 128 (99% of the case), then the length bytes are valid utf-8, as reasoned below:
// If the length is smaller than 128, its 4-byte encoding are [0, 0, 0, len].
// Each of the byte is a valid ASCII character, so they are valid utf-8.
// Since they are all smaller than 128, the won't break a utf-8 code point (won't mess with later bytes).
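The reasoning in the comment above can be illustrated with a small standalone sketch. This is not the arrow-rs implementation: `encode_plain` and `validate_batched` are hypothetical names, and the sketch assumes a 4-byte little-endian length prefix (the argument is the same for either byte order, since all four bytes are below 128 when the length is). Runs of strings with short lengths are validated in a single `std::str::from_utf8` call that covers the length prefixes as well; a length of 128 or more ends the current run, because its prefix bytes may not be valid UTF-8.

```rust
/// Hypothetical helper: lay out values as [4-byte LE length][bytes]...
fn encode_plain(values: &[&[u8]]) -> Vec<u8> {
    let mut buf = Vec::new();
    for v in values {
        buf.extend_from_slice(&(v.len() as u32).to_le_bytes());
        buf.extend_from_slice(v);
    }
    buf
}

/// Hypothetical sketch: returns true if every string in `buf` is valid UTF-8.
/// Length prefixes below 128 are validated together with the string data.
fn validate_batched(buf: &[u8]) -> bool {
    let mut offset = 0;
    // Start of the current run that can be validated in one call.
    let mut run_start = 0;
    while offset < buf.len() {
        if buf.len() - offset < 4 {
            return false; // truncated length prefix
        }
        let len = u32::from_le_bytes([
            buf[offset],
            buf[offset + 1],
            buf[offset + 2],
            buf[offset + 3],
        ]) as usize;
        let start = offset + 4;
        let end = match start.checked_add(len) {
            Some(end) if end <= buf.len() => end,
            _ => return false, // truncated string data
        };
        if len >= 128 {
            // The prefix may contain non-ASCII bytes: validate the run
            // before it, then start a new run at this string's data.
            if std::str::from_utf8(&buf[run_start..offset]).is_err() {
                return false;
            }
            run_start = start;
        }
        offset = end;
    }
    std::str::from_utf8(&buf[run_start..offset]).is_ok()
}
```

Because each prefix byte in a run is ASCII, it can neither complete a truncated multi-byte sequence at the end of one string nor precede a stray continuation byte at the start of the next, so validating the run as a whole accepts exactly the inputs that per-string validation would accept.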
