Utility trait for stats-based skipping logic (#357)
Parquet footer stats enable data skipping in much the same way as Delta file stats. However, Parquet isn't quite as convenient to work with, and arrow-parquet doesn't even try to help. It can't: arrow-compute expressions are opaque, so there's no way to traverse them and rewrite them into stats-based skipping predicates. We implement row group skipping support by traversing the same push-down predicate that delta-kernel already uses to extract a Delta file skipping predicate. But instead of rewriting the expression, we evaluate it bottom-up (no copying, O(n) work where n is the number of nodes in the expression).

This PR does not attempt to actually incorporate the new skipping logic into the default reader. That (plus testing the integration) should be a follow-up PR.
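A minimal sketch of the bottom-up evaluation idea, assuming a toy predicate type (`Pred`), a hypothetical trait (`StatsSkipping`), and integer-only min/max stats. These names are illustrative only and are not delta-kernel's actual API; null semantics are ignored. Implementors supply per-column stats, and a default trait method walks the predicate once, without rewriting or copying it, returning `false` only when the stats prove no row can match:

```rust
use std::collections::HashMap;

// Hypothetical, simplified predicate AST (illustration only;
// delta-kernel's real expression type is richer).
#[derive(Debug)]
enum Pred {
    Lt(String, i64), // column < literal
    Gt(String, i64), // column > literal
    Eq(String, i64), // column = literal
    And(Vec<Pred>),
    Or(Vec<Pred>),
}

// Hypothetical utility trait: implementors only provide min/max stats.
trait StatsSkipping {
    fn min_stat(&self, col: &str) -> Option<i64>;
    fn max_stat(&self, col: &str) -> Option<i64>;

    // Returns false only if the stats prove no row can satisfy `pred`.
    // Children are evaluated before their parent (bottom-up), each node is
    // visited at most once, and the predicate is never rewritten or copied.
    fn might_match(&self, pred: &Pred) -> bool {
        match pred {
            // Missing stats cannot rule anything out, so assume a match.
            Pred::Lt(c, v) => self.min_stat(c).map_or(true, |min| min < *v),
            Pred::Gt(c, v) => self.max_stat(c).map_or(true, |max| max > *v),
            Pred::Eq(c, v) => {
                let lo_ok = self.min_stat(c).map_or(true, |min| min <= *v);
                let hi_ok = self.max_stat(c).map_or(true, |max| max >= *v);
                lo_ok && hi_ok
            }
            // AND can match only if every child can; OR if any child can.
            Pred::And(children) => children.iter().all(|p| self.might_match(p)),
            Pred::Or(children) => children.iter().any(|p| self.might_match(p)),
        }
    }
}

// Hypothetical stand-in for the footer stats of one Parquet row group.
struct RowGroupStats {
    min_max: HashMap<String, (i64, i64)>,
}

impl StatsSkipping for RowGroupStats {
    fn min_stat(&self, col: &str) -> Option<i64> {
        self.min_max.get(col).map(|&(min, _)| min)
    }
    fn max_stat(&self, col: &str) -> Option<i64> {
        self.min_max.get(col).map(|&(_, max)| max)
    }
}

fn main() {
    let stats = RowGroupStats {
        min_max: HashMap::from([("x".to_string(), (10, 20))]),
    };
    // x < 5 cannot match when min(x) = 10, so the row group can be skipped.
    let skippable = Pred::Lt("x".into(), 5);
    // x > 25 is ruled out (max(x) = 20), but y has no stats, so keep the group.
    let kept = Pred::Or(vec![Pred::Gt("x".into(), 25), Pred::Eq("y".into(), 3)]);
    println!("x < 5 might match: {}", stats.might_match(&skippable));
    println!("x > 25 OR y = 3 might match: {}", stats.might_match(&kept));
}
```

The trade-off versus rewriting into a new stats predicate is that nothing is allocated per row group: the same push-down predicate is reused and simply re-evaluated against each group's stats.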