Currently ColumnMetaData contains only bloom_filter_offset, which points to a BloomFilterHeader followed by the bloom filter data.
This is not optimal for reading, because two IO reads are needed once bloom_filter_offset is known: one to read the header, which contains the size of the bloom filter, and another to read the actual bloom filter into a buffer. Storing the size next to bloom_filter_offset would allow this to be done in a single read.
Storing the algorithm/hash/compression there as well could also be useful, since the reader could skip reading the bloom filter entirely if one of those parameters is not supported.
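
To illustrate the difference in IO pattern, here is a minimal Python sketch rather than Parquet code. It makes several assumptions: the real BloomFilterHeader is Thrift-encoded and variable-length, but the sketch stands in a toy 4-byte little-endian length prefix for it; the `bloom_filter_length` parameter, the `CountingReader` helper, and the `can_use_filter` check are hypothetical names introduced only for this example.

```python
import io
import struct

HEADER_SIZE = 4  # assumption: toy header = 4-byte little-endian bitset length


class CountingReader:
    """Wraps a seekable file object and counts positioned reads."""

    def __init__(self, raw):
        self.raw = raw
        self.reads = 0

    def read_at(self, offset, size):
        self.reads += 1
        self.raw.seek(offset)
        return self.raw.read(size)


def read_filter_two_reads(reader, bloom_filter_offset):
    # Today: the size is only known after the header has been read.
    header = reader.read_at(bloom_filter_offset, HEADER_SIZE)
    (num_bytes,) = struct.unpack("<I", header)
    return reader.read_at(bloom_filter_offset + HEADER_SIZE, num_bytes)


def read_filter_one_read(reader, bloom_filter_offset, bloom_filter_length):
    # Proposed: the metadata carries the total length (header + bitset),
    # so header and bitset arrive in one buffer and are split in memory.
    blob = reader.read_at(bloom_filter_offset, bloom_filter_length)
    (num_bytes,) = struct.unpack_from("<I", blob, 0)
    return blob[HEADER_SIZE:HEADER_SIZE + num_bytes]


def can_use_filter(algorithm, hash_name, compression):
    # Hypothetical pre-check: with these parameters available in the
    # metadata, an unsupported filter can be skipped without any IO.
    return (algorithm, hash_name, compression) == ("BLOCK", "XXHASH", "UNCOMPRESSED")


def demo():
    bitset = bytes(range(32))
    padding = b"\x00" * 10  # stands in for preceding file contents
    file_bytes = padding + struct.pack("<I", len(bitset)) + bitset
    offset = len(padding)

    r1 = CountingReader(io.BytesIO(file_bytes))
    a = read_filter_two_reads(r1, offset)

    r2 = CountingReader(io.BytesIO(file_bytes))
    b = read_filter_one_read(r2, offset, HEADER_SIZE + len(bitset))

    return a == bitset, b == bitset, r1.reads, r2.reads
```

Calling `demo()` returns the same bitset from both paths, with the read counters showing two positioned reads for the current layout and one for the proposed layout.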
Reporter: Csaba Ringhofer / @csringhofer
Note: This issue was originally created as PARQUET-1981. Please see the migration documentation for further details.