
Clarify encoding schemes for boolean types #347

Open
asfimport opened this issue Mar 17, 2018 · 3 comments

Comments

@asfimport
Collaborator

asfimport commented Mar 17, 2018

In the Parquet format specification, the section on Plain encoding says that boolean values are encoded using the deprecated bit-packed encoding. However, the section on bit-packed encoding says that it is used only for repetition/definition levels. These two statements seem contradictory.

The section for RLE/bit-packed hybrid encoding says "Boolean values in data pages, as an alternative to PLAIN encoding" - perhaps we should be specific and indicate this is only used for data page V2?

Also, on the implementation side, I saw that parquet-cpp still encodes booleans as plain 1-bit values, while parquet-mr uses the bit-packed encoding described in the specification. Perhaps the two implementations should be made consistent.
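
To make the ambiguity concrete, here is a minimal sketch of the two bit orders that a "one bit per boolean" layout could use. It assumes, as I read the format specification, that PLAIN booleans are packed LSB first, while the deprecated bit-packed encoding fills each byte from the most significant bit down. The function names are hypothetical and are not taken from parquet-cpp or parquet-mr.

```python
def pack_bools_lsb_first(values):
    """Pack one bit per boolean, filling each byte from bit 0 upward.

    Hypothetical helper: illustrates the LSB-first order only.
    """
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def pack_bools_msb_first(values):
    """Pack one bit per boolean, filling each byte from bit 7 downward.

    Hypothetical helper: illustrates the MSB-first order only.
    """
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


values = [True, False, True, True, False, False, False, False, True]
print(pack_bools_lsb_first(values).hex())  # 0d01
print(pack_bools_msb_first(values).hex())  # b080
```

The same nine values produce different bytes under the two orders, so a reader that assumes the wrong one would decode incorrect booleans. This is why the specification wording matters here.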

Reporter: Chao Sun / @sunchao
Assignee: Chao Sun / @sunchao

Note: This issue was originally created as PARQUET-1249. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Chao Sun / @sunchao:
I'm trying to make my first contribution here. Can someone add me as a contributor? cc @rdblue, @wesm, @xhochy

@asfimport
Collaborator Author

Wes McKinney / @wesm:
Added you

@asfimport
Collaborator Author

Chao Sun / @sunchao:
Thanks!
