Address review comments
pitrou committed Jan 29, 2024
1 parent e989f51 commit 88d3cbd
Showing 2 changed files with 5 additions and 2 deletions.
2 changes: 1 addition & 1 deletion Encodings.md
@@ -335,7 +335,7 @@ Note that, even for FIXED_LEN_BYTE_ARRAY, all lengths are encoded despite the re

### Byte Stream Split: (BYTE_STREAM_SPLIT = 9)

-Supported Types: INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY
+Supported Types: FLOAT, DOUBLE, INT32, INT64, FIXED_LEN_BYTE_ARRAY

This encoding does not reduce the size of the data but can lead to a significantly better
compression ratio and speed when a compression algorithm is used afterwards.
5 changes: 4 additions & 1 deletion src/main/thrift/parquet.thrift
@@ -526,12 +526,15 @@ enum Encoding {
*/
RLE_DICTIONARY = 8;

-/** Encoding for fixed-width data (INT32, INT64, FLOAT, DOUBLE, FIXED_LEN_BYTE_ARRAY).
+/** Encoding for fixed-width data (FLOAT, DOUBLE, INT32, INT64, FIXED_LEN_BYTE_ARRAY).
K byte-streams are created where K is the size in bytes of the data type.
The individual bytes of a value are scattered to the corresponding stream and
the streams are concatenated.
This itself does not reduce the size of the data but can lead to better compression
afterwards.
+Added in 2.8 for FLOAT and DOUBLE.
+Support for INT32, INT64 and FIXED_LEN_BYTE_ARRAY added in 2.11.
*/
BYTE_STREAM_SPLIT = 9;
}
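
The comment on `BYTE_STREAM_SPLIT` describes the transform in words: byte k of every value goes to stream k, and the K streams are concatenated. A minimal Python sketch of that description follows. It is not the reference implementation; the function names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical, and the input is assumed to be a buffer of tightly packed fixed-width values.

```python
import struct


def byte_stream_split_encode(data: bytes, width: int) -> bytes:
    """Scatter byte k of every value into stream k, then concatenate the streams.

    `width` is K, the size in bytes of the data type (hypothetical parameter name).
    """
    if width <= 0 or len(data) % width != 0:
        raise ValueError("data length must be a positive multiple of the value width")
    # data[k::width] selects byte k of each value, which is exactly stream k.
    return b"".join(data[k::width] for k in range(width))


def byte_stream_split_decode(encoded: bytes, width: int) -> bytes:
    """Inverse transform: gather byte k of each value back from stream k."""
    if width <= 0 or len(encoded) % width != 0:
        raise ValueError("encoded length must be a positive multiple of the value width")
    count = len(encoded) // width  # number of values, and length of each stream
    out = bytearray(len(encoded))
    for k in range(width):
        # Stream k occupies the k-th contiguous slice of `count` bytes.
        out[k::width] = encoded[k * count:(k + 1) * count]
    return bytes(out)


# Three little-endian FLOAT values, so K = 4.
raw = struct.pack("<3f", 1.0, 2.0, 3.0)
encoded = byte_stream_split_encode(raw, 4)
print(encoded.hex())  # 0000000000008000403f4040
```

The encoded buffer has the same length as the input, which matches the statement that the encoding does not reduce the size of the data by itself. Any gain comes from a later compressor seeing the similar high-order bytes grouped in one stream.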