@pitrou
Member

@pitrou pitrou commented Oct 29, 2025

Rationale for this change

The Parquet spec requires the bit-packed runs in RLE/bit-packed streams to be padded to a multiple of 8 values, but some non-compliant encoders (such as the extremely old Impala 1.1.1) might generate a truncated last bit-packed run, which nevertheless contains enough logical values.

What changes are included in this PR?

  1. Compatibility code for non-compliant RLE streams as described above
  2. Guard against zero-size dictionaries to avoid hitting an assertion in DictionaryConverter
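
To illustrate item 1, here is a minimal Python sketch of a lenient RLE/bit-packed hybrid decoder. It is not the code from this PR: the names `read_uleb128` and `decode_rle_hybrid` are hypothetical, and it assumes a bit width between 1 and 32 and a caller that knows how many logical values to expect. A bit-packed run whose bytes are cut short is accepted as long as the bytes that are present still hold every value the caller needs.

```python
def read_uleb128(buf, pos):
    """Read an unsigned LEB128 varint from buf at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_rle_hybrid(buf, bit_width, num_values):
    """Decode num_values values, tolerating a truncated bit-packed run.

    Illustrative only; assumes 1 <= bit_width <= 32.
    """
    out = []
    pos = 0
    value_bytes = (bit_width + 7) // 8
    mask = (1 << bit_width) - 1
    while len(out) < num_values:
        header, pos = read_uleb128(buf, pos)
        if header >> 1 == 0:
            raise ValueError("empty run")
        needed = num_values - len(out)
        if header & 1:
            # Bit-packed run: the header declares groups of 8 values.
            declared = (header >> 1) * 8
            # Slicing silently stops at the end of buf if the run is truncated.
            available = buf[pos:pos + declared * bit_width // 8]
            decodable = len(available) * 8 // bit_width
            if decodable < min(declared, needed):
                raise ValueError("bit-packed run too short for requested values")
            # Values are packed starting from the least significant bit.
            bits = int.from_bytes(available, "little")
            take = min(decodable, declared, needed)
            for i in range(take):
                out.append((bits >> (i * bit_width)) & mask)
            pos += len(available)
        else:
            # RLE run: one value stored in ceil(bit_width / 8) bytes.
            count = header >> 1
            chunk = buf[pos:pos + value_bytes]
            if len(chunk) < value_bytes:
                raise ValueError("RLE run value is truncated")
            pos += value_bytes
            out.extend([int.from_bytes(chunk, "little")] * min(count, needed))
    return out
```

For example, the header byte `0x03` declares one group of 8 three-bit values (3 bytes). If only 2 of those bytes are present, the sketch can still return 5 values but rejects a request for 6.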

Are these changes tested?

Not yet, need to add a unit test.

Are there any user-facing changes?

Just a bugfix for an improbable situation.

@pitrou
Member Author

pitrou commented Oct 29, 2025

@AntoinePrv FYI
