Consider adding BloomFilterHeader to ColumnMetaData #285

asfimport · 2021-02-16T13:45:35Z

Currently ColumnMetaData only contains bloom_filter_offset, which points to BloomFilterHeader followed by the bloom filter data.

This is not optimal for reading, because two IO reads are needed once bloom_filter_offset is known: one to read the header, which contains the size of the bloom filter, and another to read the actual bloom filter into a buffer. Storing the size next to bloom_filter_offset would allow this to be done in a single read.
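To make the cost concrete, here is a minimal sketch of the two access patterns. It is a toy model, not the real format: a 4-byte little-endian length prefix stands in for the Thrift-encoded BloomFilterHeader, and `CountingFile`, `build_file`, `read_two_step`, `read_one_step`, and the `total_length` parameter are hypothetical names used only for illustration.

```python
import io
import struct

# Toy header: a 4-byte little-endian bitset length.
# ASSUMPTION: this is NOT the real Thrift-encoded BloomFilterHeader layout.
HEADER_SIZE = 4


class CountingFile:
    """In-memory file that counts positioned reads (a stand-in for IO requests)."""

    def __init__(self, data: bytes):
        self._f = io.BytesIO(data)
        self.reads = 0

    def read_at(self, offset: int, size: int) -> bytes:
        self._f.seek(offset)
        self.reads += 1
        return self._f.read(size)


def build_file(bitset: bytes, padding: int = 16):
    """Return (file bytes, bloom_filter_offset) for the toy layout."""
    header = struct.pack("<I", len(bitset))
    return b"\x00" * padding + header + bitset, padding


def read_two_step(f: CountingFile, offset: int) -> bytes:
    """Only the offset is known: read the header first, then the bitset."""
    (size,) = struct.unpack("<I", f.read_at(offset, HEADER_SIZE))
    return f.read_at(offset + HEADER_SIZE, size)


def read_one_step(f: CountingFile, offset: int, total_length: int) -> bytes:
    """Offset and total length are both known from metadata: one read suffices."""
    blob = f.read_at(offset, total_length)
    (size,) = struct.unpack("<I", blob[:HEADER_SIZE])
    return blob[HEADER_SIZE:HEADER_SIZE + size]


bitset = bytes(range(32))
data, bloom_filter_offset = build_file(bitset)

two_step_file = CountingFile(data)
two_step_result = read_two_step(two_step_file, bloom_filter_offset)

one_step_file = CountingFile(data)
one_step_result = read_one_step(
    one_step_file, bloom_filter_offset, HEADER_SIZE + len(bitset)
)
```

In this sketch the first path issues two reads and the second issues one, while both return the same bitset. On remote or high-latency storage, each avoided read is a saved round trip.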

Storing the algorithm, hash, and compression in ColumnMetaData as well could also be useful, since a reader could skip reading the bloom filter entirely if any of those parameters is unsupported.
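A reader-side check along these lines might look like the sketch below. It assumes the header parameters were available inline in the column metadata, which is what this issue proposes rather than what the format currently provides. `InlineBloomFilterInfo`, `SUPPORTED`, and `should_read_bloom_filter` are hypothetical names, and the parameters are modeled as plain strings instead of the Thrift types.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: the set of parameters a hypothetical reader implements.
SUPPORTED = {
    "algorithm": {"BLOCK"},
    "hash": {"XXHASH"},
    "compression": {"UNCOMPRESSED"},
}


@dataclass
class InlineBloomFilterInfo:
    """Hypothetical inline copy of the BloomFilterHeader parameters."""
    offset: int
    algorithm: str
    hash: str
    compression: str


def should_read_bloom_filter(info: Optional[InlineBloomFilterInfo]) -> bool:
    """Decide from metadata alone whether fetching the filter is worthwhile."""
    if info is None:
        return False  # the column has no bloom filter
    return (
        info.algorithm in SUPPORTED["algorithm"]
        and info.hash in SUPPORTED["hash"]
        and info.compression in SUPPORTED["compression"]
    )


supported = InlineBloomFilterInfo(1024, "BLOCK", "XXHASH", "UNCOMPRESSED")
unsupported = InlineBloomFilterInfo(1024, "BLOCK", "XXHASH", "SOME_FUTURE_CODEC")

read_supported = should_read_bloom_filter(supported)
read_unsupported = should_read_bloom_filter(unsupported)
read_missing = should_read_bloom_filter(None)
```

With such a check, a filter using an unknown parameter would cost no IO at all, instead of a header read followed by a discard.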

Reporter: Csaba Ringhofer / @csringhofer

Note: This issue was originally created as PARQUET-1981. Please see the migration documentation for further details.

wgtmac removed Component: Format labels Jul 4, 2024
