-
Notifications
You must be signed in to change notification settings - Fork 839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update parquet encoding docs #5053
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM -- thank you @tustvold
parquet/README.md
Outdated
@@ -55,7 +55,7 @@ The `parquet` crate provides the following features which may be enabled in your | |||
|
|||
## Parquet Feature Status | |||
|
|||
- [x] All encodings supported | |||
- [x] All encodings except BYTE_STREAM_SPLIT supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can leave a link to #4102 to see if we can find someone to help
parquet/src/basic.rs
Outdated
/// Not all encodings are valid for all types. These enums are also used to specify the | ||
/// encoding of definition and repetition levels. | ||
/// | ||
/// Additionally, not all encodings are well supported by the broader ecosystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Additionally, not all encodings are well supported by the broader ecosystem. | |
/// Additionally, not all encodings are well supported by the broader Parquet ecosystem, such as parquet-mr. |
Should we call out parquet-mr? In general it would be great to add some specificity if possible (aka what does "broad" mean, can we give examples of widely used systems that only practically support PLAIN
/ RLE
/ RLE_DICTIONARY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is mainly newer systems like DuckDB that have poor support for these encodings, I have opted to reword this slightly to emphasise that the defaults are supported by everything, as opposed to casting aspersions on the other encodings.
parquet/src/basic.rs
Outdated
@@ -304,6 +316,18 @@ impl FromStr for Encoding { | |||
// Mirrors `parquet::CompressionCodec` | |||
|
|||
/// Supported compression algorithms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Supported compression algorithms. | |
/// Supported block compression algorithms. Defaults to ``Self::UNCOMPRESSED`` |
If we are changing the docs anyway, maybe we can also call out that the default compression is NONE
as I at least have been surprised by that in the past
Which issue does this PR close?
Closes #.
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?