Consider adding BloomFilterHeader to ColumnMetaData #285

Open
asfimport opened this issue Feb 16, 2021 · 0 comments

Currently, ColumnMetaData contains only bloom_filter_offset, which points to a BloomFilterHeader followed by the bloom filter data.

This layout is not optimal for reading, because two IO reads are needed once bloom_filter_offset is known: one to read the header, which contains the size of the bloom filter, and another to read the actual bloom filter into a buffer. Storing the size next to bloom_filter_offset would allow both to be fetched in a single read.
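
To make the difference concrete, here is a rough sketch of the two access patterns. It is only an illustration under simplifying assumptions: the header is modelled as a fixed 4-byte little-endian length instead of the real Thrift-encoded BloomFilterHeader, and `CountingFile`, `read_filter_two_reads`, `read_filter_one_read` and the `total_length` parameter are hypothetical names, not part of any Parquet library.

```python
import io
import struct

# Simplified stand-in for BloomFilterHeader: a 4-byte little-endian bitset length.
# The real header is Thrift-encoded and variable in size (assumption for this sketch).
HEADER_SIZE = 4


class CountingFile:
    """Wraps an in-memory file and counts positioned reads (a proxy for IO round trips)."""

    def __init__(self, data: bytes):
        self._f = io.BytesIO(data)
        self.reads = 0

    def read_at(self, offset: int, size: int) -> bytes:
        self._f.seek(offset)
        self.reads += 1
        return self._f.read(size)


def read_filter_two_reads(f: CountingFile, bloom_filter_offset: int) -> bytes:
    """Current situation: the size is only known after the header has been read."""
    header = f.read_at(bloom_filter_offset, HEADER_SIZE)
    (num_bytes,) = struct.unpack("<i", header)
    return f.read_at(bloom_filter_offset + HEADER_SIZE, num_bytes)


def read_filter_one_read(f: CountingFile, bloom_filter_offset: int, total_length: int) -> bytes:
    """Proposed situation: total_length (header + bitset) is a hypothetical value
    taken from the column metadata, so one read fetches everything."""
    blob = f.read_at(bloom_filter_offset, total_length)
    (num_bytes,) = struct.unpack_from("<i", blob, 0)
    return blob[HEADER_SIZE:HEADER_SIZE + num_bytes]
```

With a remote or high-latency file system, each `read_at` call corresponds to a separate round trip, which is where the saving would come from.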

Storing the algorithm, hash, and compression there could also be useful, because a reader could skip reading the bloom filter entirely if any of those parameters is not supported.
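
A minimal sketch of that check, assuming the three parameters were available from the column metadata as plain strings. `BloomFilterParams`, `can_use_bloom_filter` and the supported sets are hypothetical names for illustration; the values listed are just what this imaginary reader happens to implement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BloomFilterParams:
    """Hypothetical copy of the header parameters, stored in the column metadata."""
    algorithm: str
    hash: str
    compression: str


# What this imaginary reader supports (assumption for the sketch).
SUPPORTED_ALGORITHMS = {"BLOCK"}
SUPPORTED_HASHES = {"XXHASH"}
SUPPORTED_COMPRESSIONS = {"UNCOMPRESSED"}


def can_use_bloom_filter(params: BloomFilterParams) -> bool:
    """Return True only if every parameter is supported, so the reader can
    decide before issuing any IO for the bloom filter itself."""
    return (
        params.algorithm in SUPPORTED_ALGORITHMS
        and params.hash in SUPPORTED_HASHES
        and params.compression in SUPPORTED_COMPRESSIONS
    )
```

A reader would call `can_use_bloom_filter` while planning the scan and simply not schedule the bloom filter read when it returns False.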

Reporter: Csaba Ringhofer / @csringhofer

Note: This issue was originally created as PARQUET-1981. Please see the migration documentation for further details.
