In #972 we introduced a row-level filter to improve queries against parquet files with an explicit time range. A rough test on a small parquet file (~60 MB) showed a performance improvement of about 8x. To see how much more potential row filters have, I benchmarked the parquet reader with the following configurations:
- no row filter: all columns are fetched from parquet, and we check every element of the timestamp column, discarding the unrelated rows;
- a plain row filter (`PlainTimestampRowFilter`) that iterates over the timestamp column to find the desired rows; this filter fetches only the timestamp column to evaluate the predicate;
- a fast row filter (`FastTimestampRowFilter`) that evaluates the same range predicate with arrow's SIMD-accelerated comparison kernels (compared against the plain variant in the Summary below).
Only a fragment of the filter's predicate evaluation survives here; the full implementations live in src/storage/src/sst/parquet.rs, lines 390 to 402 and lines 452 to 467 (commit 75b8afe). Reconstructed around the surviving lines, the evaluation looks roughly like this (the enclosing signature and the downcast are assumptions based on parquet's `ArrowPredicate` trait):

```rust
fn evaluate(&mut self, batch: RecordBatch) -> ArrowResult<BooleanArray> {
    // Assumed: the timestamp column is the only projected column.
    let ts_col = batch
        .column(0)
        .as_any()
        .downcast_ref::<TimestampMillisecondArray>()
        .unwrap(); // safety: we've checked the data type of timestamp column.
    // Vectorized kernels evaluate lower_bound <= ts < upper_bound.
    let left = arrow::compute::gt_eq_scalar(ts_col, self.lower_bound)?;
    let right = arrow::compute::lt_scalar(ts_col, self.upper_bound)?;
    arrow::compute::and(&left, &right)
}
```
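For context, here is a hedged sketch (not GreptimeDB's actual code) of how such a predicate is attached to the reader through parquet's `RowFilter` API; the file name, column index, and bounds are illustrative:

```rust
use std::fs::File;

use arrow::array::TimestampMillisecondArray;
use arrow::compute;
use parquet::arrow::arrow_reader::{ArrowPredicateFn, ParquetRecordBatchReaderBuilder, RowFilter};
use parquet::arrow::ProjectionMask;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let file = File::open("bench.parquet")?; // hypothetical benchmark file
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;

    // Project only the timestamp column (assumed to be leaf 0) into the
    // predicate, so the reader decodes just that column during evaluation.
    let ts_mask = ProjectionMask::leaves(builder.parquet_schema(), [0]);
    let predicate = ArrowPredicateFn::new(ts_mask, |batch| {
        let ts = batch
            .column(0)
            .as_any()
            .downcast_ref::<TimestampMillisecondArray>()
            .expect("timestamp column");
        // 1000 <= ts < 2000, matching the fragment above.
        let left = compute::gt_eq_scalar(ts, 1000)?;
        let right = compute::lt_scalar(ts, 2000)?;
        compute::and(&left, &right)
    });

    let reader = builder
        .with_row_filter(RowFilter::new(vec![Box::new(predicate)]))
        .build()?;
    for batch in reader {
        println!("{} matching rows", batch?.num_rows());
    }
    Ok(())
}
```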
## Benchmark
I created an 865 MB parquet file containing 86,400,000 rows; the query condition is a small timestamp range, [1000, 2000), from which 1,000 rows are expected to be retrieved.
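For reproducibility, here is a minimal sketch of how such a file could be generated with parquet's `ArrowWriter`; the schema (one millisecond timestamp plus one float value column) and the file name are assumptions, and the exact file size depends on encoding:

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::array::{Float64Array, TimestampMillisecondArray};
use arrow::datatypes::{DataType, Field, Schema, TimeUnit};
use arrow::record_batch::RecordBatch;
use parquet::arrow::ArrowWriter;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumed schema: one millisecond timestamp plus one value column.
    let schema = Arc::new(Schema::new(vec![
        Field::new("ts", DataType::Timestamp(TimeUnit::Millisecond, None), false),
        Field::new("value", DataType::Float64, false),
    ]));
    let mut writer = ArrowWriter::try_new(File::create("bench.parquet")?, schema.clone(), None)?;

    // 86,400,000 rows = one day of millisecond timestamps, written in
    // 1M-row batches to keep memory bounded.
    const BATCH: i64 = 1_000_000;
    for start in (0..86_400_000).step_by(BATCH as usize) {
        let ts = TimestampMillisecondArray::from_iter_values(start..start + BATCH);
        let values = Float64Array::from_iter_values((start..start + BATCH).map(|v| v as f64));
        writer.write(&RecordBatch::try_new(
            schema.clone(),
            vec![Arc::new(ts), Arc::new(values)],
        )?)?;
    }
    writer.close()?;
    Ok(())
}
```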
## Summary

We can expect roughly a 5x performance improvement when scanning a small time range from a typically sized parquet file (865 MB).
The key to this improvement is that the row filter fetches only the timestamp column for evaluation; after evaluation, only the rows matching the time range are fetched and returned by the parquet reader.
By enabling the SIMD feature, the comparison between the desired time range and the row values is further accelerated: `FastTimestampRowFilter` takes only about 80% of `PlainTimestampRowFilter`'s time to produce the result. The sketch below contrasts the two variants.
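A hedged reconstruction of the difference (not the exact GreptimeDB code): the plain variant walks the timestamp values one by one, while the fast variant delegates to arrow's comparison kernels, which arrow's nightly-only `simd` cargo feature compiled to explicit SIMD in the arrow versions current at the time:

```rust
use arrow::array::{BooleanArray, TimestampMillisecondArray};
use arrow::compute;
use arrow::error::Result as ArrowResult;

// Plain variant: explicit per-element iteration over the timestamp values
// (null handling omitted for brevity).
fn plain_filter(ts: &TimestampMillisecondArray, lo: i64, hi: i64) -> BooleanArray {
    BooleanArray::from(
        ts.values()
            .iter()
            .map(|&v| lo <= v && v < hi)
            .collect::<Vec<bool>>(),
    )
}

// Fast variant: arrow's vectorized comparison kernels evaluate the same
// half-open range [lo, hi) over whole arrays at once.
fn fast_filter(ts: &TimestampMillisecondArray, lo: i64, hi: i64) -> ArrowResult<BooleanArray> {
    let left = compute::gt_eq_scalar(ts, lo)?;
    let right = compute::lt_scalar(ts, hi)?;
    compute::and(&left, &right)
}
```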
Please also note that the current query condition hits one or a few consecutive row groups, so row group pruning may also help a lot when a row filter is not present (see src/storage/src/sst/parquet.rs, lines 260 to 269 at 75b8afe); a pruning sketch follows.
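For illustration, a hedged sketch (not GreptimeDB's implementation) of row group pruning against the footer statistics, using parquet's `with_row_groups`; the file name, timestamp column index, and bounds are assumptions:

```rust
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::file::statistics::Statistics;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let builder = ParquetRecordBatchReaderBuilder::try_new(File::open("bench.parquet")?)?;

    // Keep only row groups whose timestamp min/max overlaps [1000, 2000),
    // based on the footer statistics (the timestamp is assumed to be
    // column 0, stored as physical Int64).
    let keep: Vec<usize> = builder
        .metadata()
        .row_groups()
        .iter()
        .enumerate()
        .filter(|(_, rg)| match rg.column(0).statistics() {
            Some(Statistics::Int64(s)) if s.has_min_max_set() => {
                *s.min() < 2000 && *s.max() >= 1000
            }
            _ => true, // no usable stats: cannot prune, must read the group
        })
        .map(|(i, _)| i)
        .collect();

    let reader = builder.with_row_groups(keep).build()?;
    for batch in reader {
        println!("{} rows", batch?.num_rows());
    }
    Ok(())
}
```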