PARQUET-2373: Improve I/O performance with bloom_filter_length #1184
Thanks for the change! I have left some comments.
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ColumnChunkMetaData.java
@zhangjiashen This can be rebased to adopt parquet-format 2.10.0.
78e7941 to 9f2738f
@wgtmac I just rebased onto the master branch. Could you please take a look when you get a chance?
FYI, I've updated a BloomFilter with length for testing: apache/parquet-testing#43
@zhangjiashen This needs to be rebased again. |
9f2738f to 83a9777
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestDataIndexBloomEncodingStats.java
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java
c36e236 to 2b1f135
The spec change PARQUET-2257 added the bloom_filter_length field so that a reader can load a bloom filter in a single read. This change makes the reader use bloom_filter_length to load the whole bloom filter (header and bitset) at once, which improves I/O scheduling.
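To illustrate the idea, here is a minimal, self-contained sketch of the two read paths. It is not the code in this PR and does not use parquet-mr classes. Everything in it is an assumption made for illustration: the class `BloomFilterReadSketch`, the `CountingSource` stand-in for a seekable file, and a hypothetical fixed 4-byte header that stores the bitset size (the real header is Thrift-encoded and has a variable size, which is why a reader without the length has to parse the header before it knows how much more to read).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BloomFilterReadSketch {
    // Assumption: a fixed 4-byte header holding the bitset size in bytes.
    // The real Parquet bloom filter header is Thrift-encoded, not fixed-size.
    static final int HEADER_BYTES = 4;

    /** Hypothetical in-memory stand-in for a seekable file; counts read calls. */
    static final class CountingSource {
        final byte[] data;
        int reads = 0;

        CountingSource(byte[] data) {
            this.data = data;
        }

        byte[] read(long offset, int length) {
            reads++;
            return Arrays.copyOfRange(data, (int) offset, (int) offset + length);
        }
    }

    /**
     * Returns the bloom filter bitset stored at the given offset.
     * knownLength is the total size (header plus bitset) from the metadata,
     * or a negative value when the metadata does not carry a length.
     */
    static byte[] readBitset(CountingSource src, long offset, int knownLength) {
        if (knownLength >= 0) {
            // Length known: fetch header and bitset with one read.
            byte[] all = src.read(offset, knownLength);
            return Arrays.copyOfRange(all, HEADER_BYTES, all.length);
        }
        // Length unknown: read the header first, then the bitset it describes.
        byte[] header = src.read(offset, HEADER_BYTES);
        int bitsetBytes = ByteBuffer.wrap(header).getInt();
        return src.read(offset + HEADER_BYTES, bitsetBytes);
    }

    /** Builds a fake file with a bloom filter placed at the given offset. */
    static byte[] buildFile(int offset, int bitsetBytes) {
        ByteBuffer buf = ByteBuffer.allocate(offset + HEADER_BYTES + bitsetBytes);
        buf.position(offset);
        buf.putInt(bitsetBytes);
        for (int i = 0; i < bitsetBytes; i++) {
            buf.put((byte) i);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] file = buildFile(16, 32);

        CountingSource withLength = new CountingSource(file);
        byte[] a = readBitset(withLength, 16, HEADER_BYTES + 32);

        CountingSource withoutLength = new CountingSource(file);
        byte[] b = readBitset(withoutLength, 16, -1);

        System.out.println("same bitset: " + Arrays.equals(a, b));
        System.out.println("reads with length: " + withLength.reads);
        System.out.println("reads without length: " + withoutLength.reads);
    }
}
```

In this sketch both paths return the same bitset, but the path that knows the length issues one read instead of two. On remote or object storage each read can be a separate round trip, which is the I/O cost the description refers to.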