Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix read parquet for different encodings across row groups (facebooki…
…ncubator#9129) Summary: When the parquet file contains a column that uses both plain and dictionary encodings across row groups, the output after filter can be wrong. The root cause is that `returnReaderNulls_` in `SelectiveColumnReader` may not be reset when the column encoding switches from dictionary encoding to plain encoding. Here is the parquet metadata for the test file to reproduce this issue. `n_2` is dictionary encoded in row group 2 and is plain encoded in row group 3. ``` file: file:/home/sparkuser/github/oap-project/velox/velox/dwio/parquet/tests/examples/different_encodings_with_filter.parquet creator: parquet-cpp-arrow version 15.0.0 extra: ARROW:schema = /////9gAAAAQAAAAAAAKAAwABgAFAAgACgAAAAABBAAMAAAACAAIAAAABAAIAAAABAAAAAMAAABwAAAAMAAAAAQAAACs////AAABBRAAAAAYAAAABAAAAAAAAAADAAAAbl8yAAQABAAEAAAA1P///wAAAQIQAAAAFAAAAAQAAAAAAAAAAwAAAG5fMQDE////AAAAASAAAAAQABQACAAGAAcADAAAABAAEAAAAAAAAQIQAAAAHAAAAAQAAAAAAAAAAwAAAG5fMAAIAAwACAAHAAgAAAAAAAABIAAAAAAAAAA= file schema: schema -------------------------------------------------------------------------------- n_0: OPTIONAL INT32 R:0 D:1 n_1: OPTIONAL INT32 R:0 D:1 n_2: OPTIONAL BINARY O:UTF8 R:0 D:1 row group 1: RC:5 TS:185 OFFSET:4 -------------------------------------------------------------------------------- n_0: INT32 SNAPPY DO:0 FPO:4 SZ:72/71/0.99 VC:5 ENC:RLE,PLAIN ST:[min: 1, max: 5, num_nulls: 0] n_1: INT32 SNAPPY DO:0 FPO:143 SZ:65/71/1.09 VC:5 ENC:RLE,PLAIN ST:[min: 1, max: 2, num_nulls: 0] n_2: BINARY SNAPPY DO:0 FPO:276 SZ:45/43/0.96 VC:5 ENC:RLE,PLAIN ST:[min: A, max: B, num_nulls: 3] row group 2: RC:5 TS:202 OFFSET:369 -------------------------------------------------------------------------------- n_0: INT32 SNAPPY DO:0 FPO:369 SZ:72/71/0.99 VC:5 ENC:RLE,PLAIN ST:[min: 1, max: 5, num_nulls: 0] n_1: INT32 SNAPPY DO:0 FPO:509 SZ:65/71/1.09 VC:5 ENC:RLE,PLAIN ST:[min: 1, max: 2, num_nulls: 0] n_2: BINARY SNAPPY DO:642 FPO:668 SZ:64/60/0.94 VC:5 ENC:RLE,RLE_DICTIONARY,PLAIN ST:[min: A, max: B, num_nulls: 3] row group 3: RC:5 TS:195 OFFSET:766 -------------------------------------------------------------------------------- n_0: INT32 SNAPPY DO:0 FPO:766 SZ:73/71/0.97 VC:5 ENC:RLE,PLAIN ST:[min: 6, max: 10, num_nulls: 0] n_1: INT32 SNAPPY DO:0 FPO:907 SZ:72/71/0.99 VC:5 ENC:RLE,PLAIN ST:[min: 1, max: 3, num_nulls: 0] n_2: BINARY SNAPPY DO:0 FPO:1047 SZ:55/53/0.96 VC:5 ENC:RLE,PLAIN ST:[min: F, max: J, num_nulls: 1] ``` Pull Request resolved: facebookincubator#9129 Reviewed By: mbasmanova Differential Revision: D55018054 Pulled By: Yuhta fbshipit-source-id: 83dbab02575321930262628ea60cd4ad3e7b8892
- Loading branch information