Better document parquet pushdown (apache#5491)
tustvold authored Mar 11, 2024
1 parent aad42b5 commit 51bcadb
Showing 2 changed files with 16 additions and 0 deletions.
4 changes: 4 additions & 0 deletions parquet/src/arrow/arrow_reader/filter.rs
@@ -96,6 +96,10 @@ where
/// leaves 99% of the rows, it may be better to not filter the data from parquet and
/// apply the filter after the RecordBatch has been fully decoded.
///
/// Additionally, even if a predicate eliminates a moderate number of rows, it may still be faster
/// to filter the data after the RecordBatch has been fully decoded, if the eliminated rows are
/// not contiguous.
///
/// [`RowSelection`]: crate::arrow::arrow_reader::RowSelection
pub struct RowFilter {
/// A list of [`ArrowPredicate`]
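
The note about non-contiguous eliminated rows can be illustrated with a small standalone sketch. It does not use the parquet crate: `Run` and `mask_to_runs` are hypothetical names that model a selection as run-length skip/select runs. The assumption is that per-run overhead is what makes a scattered selection expensive, so the sketch only counts runs and does not measure decode time.

```rust
/// Hypothetical stand-in for a run-length row selector: a run of
/// `row_count` consecutive rows that are either all skipped or all selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Collapse a per-row keep/drop mask into alternating skip/select runs.
fn mask_to_runs(keep: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &k in keep {
        let skip = !k;
        // Extend the previous run when the row has the same skip/select state.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

fn main() {
    // Both masks eliminate exactly half of 1000 rows.
    let contiguous: Vec<bool> = (0..1000).map(|i| i >= 500).collect();
    let scattered: Vec<bool> = (0..1000).map(|i| i % 2 == 0).collect();

    println!("contiguous: {} runs", mask_to_runs(&contiguous).len());
    println!("scattered:  {} runs", mask_to_runs(&scattered).len());
}
```

Both masks drop the same number of rows, but the scattered one needs one run per row, which is the situation in which filtering after decoding may be the cheaper option.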
12 changes: 12 additions & 0 deletions parquet/src/arrow/arrow_reader/mod.rs
@@ -141,6 +144,9 @@ impl<T> ArrowReaderBuilder<T> {
/// An example use case of this would be applying a selection determined by
/// evaluating predicates against the [`Index`]
///
/// It is recommended to enable reading the page index if using this functionality, to allow
/// more efficient skipping over data pages. See [`ArrowReaderOptions::with_page_index`].
///
/// [`Index`]: crate::file::page_index::index::Index
pub fn with_row_selection(self, selection: RowSelection) -> Self {
Self {
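
Why the page index helps with skipping can be sketched without the parquet crate. The model below assumes the reader knows how many rows each data page holds (the kind of information a page index provides) and uses that to find pages that contain no selected rows. `pages_to_read` and its parameters are hypothetical names for illustration, not part of the library's API.

```rust
use std::ops::Range;

/// Hypothetical model: given the row count of each data page and the selected
/// row ranges, return the indices of pages that contain at least one selected
/// row. All other pages can be skipped without being decoded.
fn pages_to_read(page_row_counts: &[usize], selected: &[Range<usize>]) -> Vec<usize> {
    let mut needed = Vec::new();
    let mut page_start = 0;
    for (idx, &count) in page_row_counts.iter().enumerate() {
        let page_end = page_start + count;
        // A page is needed when any non-empty selected range overlaps its rows.
        let overlaps = count > 0
            && selected
                .iter()
                .any(|r| !r.is_empty() && r.start < page_end && r.end > page_start);
        if overlaps {
            needed.push(idx);
        }
        page_start = page_end;
    }
    needed
}

fn main() {
    // Four pages of 100 rows each; only rows 150..160 and 390..400 are selected.
    let pages = [100, 100, 100, 100];
    let selected = [150..160, 390..400];
    println!("pages to read: {:?}", pages_to_read(&pages, &selected));
}
```

Without per-page row counts, a reader in this model would have no way to tell that pages 0 and 2 are irrelevant before looking inside them.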
@@ -152,6 +155,9 @@ impl<T> ArrowReaderBuilder<T> {
/// Provide a [`RowFilter`] to skip decoding rows
///
/// Row filters are applied after row group selection and row selection
///
/// It is recommended to enable reading the page index if using this functionality, to allow
/// more efficient skipping over data pages. See [`ArrowReaderOptions::with_page_index`].
pub fn with_row_filter(self, filter: RowFilter) -> Self {
Self {
filter: Some(filter),
@@ -163,6 +169,9 @@ impl<T> ArrowReaderBuilder<T> {
///
/// The limit will be applied after any [`Self::with_row_selection`] and [`Self::with_row_filter`]
/// allowing it to limit the final set of rows decoded after any pushed down predicates
///
/// It is recommended to enable reading the page index if using this functionality, to allow
/// more efficient skipping over data pages. See [`ArrowReaderOptions::with_page_index`].
pub fn with_limit(self, limit: usize) -> Self {
Self {
limit: Some(limit),
@@ -174,6 +183,9 @@ impl<T> ArrowReaderBuilder<T> {
///
/// The offset will be applied after any [`Self::with_row_selection`] and [`Self::with_row_filter`]
/// allowing it to skip rows after any pushed down predicates
///
/// It is recommended to enable reading the page index if using this functionality, to allow
/// more efficient skipping over data pages. See [`ArrowReaderOptions::with_page_index`].
pub fn with_offset(self, offset: usize) -> Self {
Self {
offset: Some(offset),
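
The ordering described in these doc comments (row selection, then row filter, then offset and limit) can be modelled on a plain slice. This is a minimal sketch that does not use the parquet crate: `read_rows` is a hypothetical helper, and applying the offset before the limit is an assumption, since the comments above only state that both come after selection and filtering.

```rust
/// Hypothetical model of the documented ordering on an in-memory column:
/// 1. row selection, 2. row filter (predicate), 3. offset, 4. limit.
/// Applying offset before limit is an assumption of this sketch.
fn read_rows(
    rows: &[i64],
    selection: &[bool],
    predicate: impl Fn(i64) -> bool,
    offset: usize,
    limit: usize,
) -> Vec<i64> {
    rows.iter()
        .copied()
        .zip(selection.iter().copied())
        .filter(|&(_, selected)| selected) // row selection
        .map(|(value, _)| value)
        .filter(|&value| predicate(value)) // pushed-down predicate
        .skip(offset) // offset counts rows that survived the steps above
        .take(limit) // limit caps the final number of rows
        .collect()
}

fn main() {
    let rows: Vec<i64> = (0..10).collect();
    // The selection drops the first two rows; the predicate keeps even values.
    let selection: Vec<bool> = (0..10).map(|i| i >= 2).collect();
    let out = read_rows(&rows, &selection, |v| v % 2 == 0, 1, 2);
    println!("{:?}", out);
}
```

Because the offset and limit count only rows that survive the earlier steps, changing the predicate changes which rows they apply to.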