In the Parquet format specification, the section on Plain encoding says that boolean values are encoded using the deprecated bit-packed encoding. However, the section on bit-packed encoding says that it is only used for repetition/definition levels. These two statements appear to contradict each other.
The section on RLE/bit-packed hybrid encoding lists "Boolean values in data pages, as an alternative to PLAIN encoding" as one of its uses. Perhaps we should be more specific and state that this applies only to data page V2?
Also, on the implementation side, I saw that parquet-cpp still encodes booleans as plain 1-bit values, while parquet-mr uses the bit-packed encoding as described in the specification. Perhaps the two implementations should be reconciled.
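To make the ambiguity concrete, here is a minimal Python sketch of two ways to pack booleans at one bit per value. It is not taken from parquet-cpp or parquet-mr, and the function names are hypothetical. It assumes that "bit packed, LSB first" fills each byte starting from its least significant bit, and that the deprecated bit-packed encoding fills each byte starting from its most significant bit; whether either implementation matches one of these layouts would need to be checked against its source.

```python
def pack_bools_lsb_first(values):
    """Pack booleans one bit each, filling every byte from bit 0 upward.

    Assumed to correspond to a "bit packed, LSB first" layout.
    """
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def pack_bools_msb_first(values):
    """Pack booleans one bit each, filling every byte from bit 7 downward.

    Assumed to correspond to the deprecated bit-packed layout.
    """
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


if __name__ == "__main__":
    sample = [True, False, True]
    # The same three values produce different bytes under the two bit orders.
    print(pack_bools_lsb_first(sample).hex())
    print(pack_bools_msb_first(sample).hex())
```

If the two layouts really differ in this way, a reader that assumes the wrong bit order would decode different boolean values from the same page, which is why the specification wording matters.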
Reporter: Chao Sun / @sunchao
Assignee: Chao Sun / @sunchao
Note: This issue was originally created as PARQUET-1249. Please see the migration documentation for further details.