Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs #17090

Open: wants to merge 80 commits into base: main

Commits
2a194f1
Merge branch 'main' into non-inner-join-filter-pushdown-16973
kosiew Aug 8, 2025
ab81469
kosiew Aug 7, 2025
15c8897
Refactor join logic in test file to improve filter application and ex…
kosiew Aug 8, 2025
c960bda
Fix filter application order in RightAnti join execution plan
kosiew Aug 8, 2025
1076e8b
Enhance dynamic filter pushdown documentation for joins pushdown
kosiew Aug 12, 2025
fb1c85c
Implement dynamic filter pushdown for various join types in HashJoinE…
kosiew Aug 12, 2025
adf1207
Implement dynamic filter handling for various join types in HashJoinExec
kosiew Aug 12, 2025
081bed0
Refactor static filter pushdown test to include predicate in DataSour…
kosiew Aug 12, 2025
4675cbc
Add predicate information to explain tree output for parquet format
kosiew Aug 12, 2025
78b9f37
Implement deduplication for join key pairs and partition expressions;…
kosiew Aug 9, 2025
94e3492
Merge branch 'main' into non-inner-join-filter-pushdown-16973
kosiew Aug 12, 2025
6527085
Refactor: Remove inferred predicate alias handling and related tests
kosiew Aug 12, 2025
444ed6d
Merge branch 'pushdown-16973' into non-inner-join-filter-pushdown-16973
kosiew Aug 12, 2025
b258ead
fix fmt errors
kosiew Aug 12, 2025
ea4776a
Merge branch 'main' into non-inner-join-filter-pushdown-16973
kosiew Aug 13, 2025
4a497ff
docs: improve documentation for dynamic filter pushdown configuration
kosiew Aug 13, 2025
f78b8a5
test: add dynamic filter pushdown tests for right semi and right anti…
kosiew Aug 13, 2025
e336ad9
Enhance documentation for dynamic filters in joins
kosiew Aug 13, 2025
d24cd4f
Add tests for disabled dynamic filter pushdown on hash joins
kosiew Aug 13, 2025
9f4411a
Refactor dynamic filtering in HashJoinExec
kosiew Aug 13, 2025
e5bf484
Update explain_tree.slt to reflect changes in predicate representatio…
kosiew Aug 13, 2025
1eba0d0
Refactor dynamic filter pushdown documentation for clarity and comple…
kosiew Aug 13, 2025
c48e8c1
Improve changelog entry for dynamic filter pushdown in version 50.0.0…
kosiew Aug 13, 2025
41f2e1e
Enhance upgrading documentation to include details on dynamic filter …
kosiew Aug 13, 2025
4d507c4
Update dynamic filter pushdown description for clarity and detail
kosiew Aug 13, 2025
25254ab
Enhance documentation for dynamic filter pushdown, clarifying join ty…
kosiew Aug 13, 2025
7ec2553
Update predicate display in TestSource to show '<none>' for absent pr…
kosiew Aug 13, 2025
25b4dfb
Add tests for dynamic filter pushdown in HashJoinExec, including null…
kosiew Aug 13, 2025
6d5b6ce
Update dynamic filter pushdown documentation and configuration descri…
kosiew Aug 13, 2025
9c2ba0b
Enhance dynamic filter pushdown documentation to include support for …
kosiew Aug 13, 2025
051c310
Update predicate display in explain_tree.slt to show 'true' instead o…
kosiew Aug 13, 2025
feeb231
Update dynamic filter pushdown description to clarify requirements fo…
kosiew Aug 13, 2025
f9c3cd0
Enhance dynamic filter functionality by adding key count tracking and…
kosiew Aug 13, 2025
47c2491
Refactor dynamic filter handling in HashJoinExec to clarify probe sid…
kosiew Aug 13, 2025
2c32098
Enforce non-empty ON clause requirement in HashJoinExec and enhance d…
kosiew Aug 13, 2025
ec47ed7
Add join preservation utilities and refactor related functions for cl…
kosiew Aug 13, 2025
5eabb36
Refactor dynamic filter display logic in HashJoinExec for clarity and…
kosiew Aug 13, 2025
014bed8
Refactor dynamic filter handling in joins and update related document…
kosiew Aug 13, 2025
92ed50d
Enhance dynamic filter observability and error handling. Add inline c…
kosiew Aug 13, 2025
853fa6c
Enhance dynamic filter pushdown tests and output formatting. Update a…
kosiew Aug 13, 2025
5d2b60c
Refactor dynamic filter handling and enhance documentation. Clean up …
kosiew Aug 13, 2025
51bcf4a
Enhance dynamic filter pushdown tests and improve output clarity. Add…
kosiew Aug 14, 2025
4ceb8dd
Enhance join execution details in tests. Update expected output to in…
kosiew Aug 14, 2025
10c2171
Enhance multi_hash_joins test output. Include probe side and probe ke…
kosiew Aug 14, 2025
02596fc
Fix probe side and keys in multi_hash_joins test output. Adjust forma…
kosiew Aug 14, 2025
24f6367
Fix probe side determination in multi_hash_joins test output. Adjust …
kosiew Aug 14, 2025
0b1fc31
Implement snap changes to enhance functionality and improve performance
kosiew Aug 14, 2025
910a7ed
Enhance test output for filter pushdown and projection pushdown. Incl…
kosiew Aug 14, 2025
f64d072
Fix probe side determination in HashJoinExec output. Adjust formattin…
kosiew Aug 14, 2025
40d36d7
Fix HashJoinExec output to include probe side and keys for improved c…
kosiew Aug 14, 2025
4f9f94a
Fix HashJoinExec output to include probe side and keys for improved c…
kosiew Aug 14, 2025
be9d8de
Fix assertions in hash join dynamic filter pushdown tests to correctl…
kosiew Aug 14, 2025
0cc204e
Add probe side and keys information to HashJoinExec output in join se…
kosiew Aug 14, 2025
6958ffd
Merge branch 'pushdown-16973' into non-inner-join-filter-pushdown-16973
kosiew Aug 14, 2025
b38b0b9
Merge branch 'main' into non-inner-join-filter-pushdown-16973
kosiew Aug 14, 2025
b8f5d5c
Remove redundant tests for right semi and right anti dynamic filter p…
kosiew Aug 14, 2025
8b1bb07
Enhance dynamic filter pushdown tests for right semi and right anti j…
kosiew Aug 14, 2025
5c03033
Fix fmt errors
kosiew Aug 14, 2025
a49749b
Update documentation for dynamic filters in joins to clarify filter t…
kosiew Aug 14, 2025
f81a9ab
Fix clippy error
kosiew Aug 14, 2025
d7e36cb
prettier config docs
kosiew Aug 14, 2025
146fc28
fix(tests): update snapshot for topk dynamic filter pushdown test
kosiew Aug 14, 2025
f197d1b
fix(tests): update assertions for dynamic filter pushdown tests
kosiew Aug 14, 2025
3aa1659
feat(config): add option to enable dynamic filter pushdown in optimizer
kosiew Aug 14, 2025
49cafb3
fix(docs): enhance description for dynamic filter pushdown in optimizer
kosiew Aug 14, 2025
90c15f4
Bless SLT outputs via CI (--complete)
github-actions[bot] Aug 14, 2025
d07afea
feat(join): add preservation methods for join types and remove unused…
kosiew Aug 15, 2025
8f63f08
Merge branch 'main' into non-inner-join-filter-pushdown-16973
kosiew Aug 15, 2025
01c366f
docs(tests): enhance comments for join type handling and dynamic filt…
kosiew Aug 15, 2025
bafcf29
test: enhance dynamic filter pushdown tests for hash joins
kosiew Aug 15, 2025
1395b49
refactor(tests): remove redundant hash join parent filter pushdown test
kosiew Aug 15, 2025
4799c01
refactor(tests): consolidate tests
kosiew Aug 15, 2025
b2cc0a5
refactor(tests): clean up imports and remove unused join type referen…
kosiew Aug 15, 2025
7a3c6db
Rearrange tests to minimize diff
kosiew Aug 15, 2025
08a0e29
docs(tests): add issue reference for dynamic filter pushdown test
kosiew Aug 15, 2025
7b6f778
refactor(tests): add dynamic filter location assertion and predicate …
kosiew Aug 15, 2025
3023d88
fix tests
kosiew Aug 15, 2025
4beb2f7
refactor(tests): remove unused predicate accessor from TestSource
kosiew Aug 15, 2025
f999914
test: add async test for hash join with probe filter
kosiew Aug 15, 2025
704aba7
Merge branch 'main' into non-inner-join-filter-pushdown-16973
kosiew Aug 15, 2025
42 changes: 37 additions & 5 deletions datafusion/common/src/config.rs
@@ -725,11 +725,43 @@ config_namespace! {
         /// during aggregations, if possible
         pub enable_topk_aggregation: bool, default = true

-        /// When set to true attempts to push down dynamic filters generated by operators into the file scan phase.
-        /// For example, for a query such as `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer
-        /// will attempt to push down the current top 10 timestamps that the TopK operator references into the file scans.
-        /// This means that if we already have 10 timestamps in the year 2025
-        /// any files that only have timestamps in the year 2024 can be skipped / pruned at various stages in the scan.
+        /// When set to true attempts to push down dynamic filters generated by operators
+        /// into the file scan phase. For example, for a query such as
+        /// `SELECT * FROM t ORDER BY timestamp DESC LIMIT 10`, the optimizer will attempt
+        /// to push down the current top 10 timestamps that the TopK operator references
+        /// into the file scans. This means that if we already have 10 timestamps in the
+        /// year 2025 any files that only have timestamps in the year 2024 can be skipped /
+        /// pruned at various stages in the scan.
+        ///
+        /// Dynamic filters are also produced by joins. At runtime, DataFusion applies
+        /// the filter to one input to prune work. `HashJoinExec` builds from its left
+        /// input and probes with its right input, but the dynamic filter target (the
+        /// side we prune) depends on the join type:
+        ///
+        /// | Join type                | Dynamic filter target |
+        /// |--------------------------|-----------------------|
+        /// | `Inner`, `Left`          | Right input           |
+        /// | `Right`                  | Left input            |
+        /// | `LeftSemi`, `LeftAnti`   | Left input            |
+        /// | `RightSemi`, `RightAnti`| Right input           |
+        /// | `LeftMark`               | Right input           |
+        /// | `RightMark`              | Left input            |
+        /// | `Full`                   | Not supported         |
+        ///
+        /// Non-equi join predicates do **not** generate dynamic filters; they require
+        /// range analysis and cross-conjunct reasoning (future work). Composite
+        /// predicates only derive filters from their equi-conjuncts, and rows with
+        /// `NULL` join keys (see [`crate::NullEquality::NullEqualsNothing`]) do not contribute
+        /// filter values. Enabling `optimizer.filter_null_join_keys` can remove such
+        /// rows up front.
+        ///
+        /// Pushdown is effective only when the file source supports predicate pushdown
+        /// (e.g. Parquet) and `execution.parquet.pushdown_filters` is `true`; formats
+        /// without predicate pushdown (CSV/JSON) see no benefit. See the upgrade guide
+        /// for additional details and examples. For example,
+        /// `SELECT * FROM fact LEFT JOIN dim ON fact.id = dim.id WHERE dim.region = 'US'`
+        /// will only read `fact` rows whose `id` values match `dim` rows where
+        /// `region = 'US'`.
         pub enable_dynamic_filter_pushdown: bool, default = true

         /// When set to true, the optimizer will insert filters before a join between
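
To make the join-type table and the NULL-key rule in the doc comment above concrete, here is a minimal, std-only Rust sketch. `JoinKind`, `FilterTarget`, `dynamic_filter_target`, `key_bounds`, and `can_skip` are hypothetical names for illustration, not DataFusion APIs. The sketch also assumes the dynamic filter takes the form of a min/max range over the join keys, which this diff does not state.

```rust
// Hypothetical stand-ins for illustration only; these are not DataFusion types.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinKind {
    Inner, Left, Right, Full,
    LeftSemi, LeftAnti, RightSemi, RightAnti,
    LeftMark, RightMark,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum FilterTarget {
    LeftInput,
    RightInput,
}

/// Encodes the table from the doc comment: which input the dynamic filter prunes.
/// `None` means the join type is not supported.
fn dynamic_filter_target(kind: JoinKind) -> Option<FilterTarget> {
    use JoinKind::*;
    match kind {
        Inner | Left | RightSemi | RightAnti | LeftMark => Some(FilterTarget::RightInput),
        Right | LeftSemi | LeftAnti | RightMark => Some(FilterTarget::LeftInput),
        Full => None,
    }
}

/// Min/max bounds over the join keys of the side that produces the filter.
/// `None` keys model SQL NULLs and contribute nothing to the bounds.
fn key_bounds(keys: &[Option<i64>]) -> Option<(i64, i64)> {
    keys.iter().flatten().fold(None, |acc, &k| match acc {
        None => Some((k, k)),
        Some((lo, hi)) => Some((lo.min(k), hi.max(k))),
    })
}

/// A file (or row group) can be skipped when its [min, max] statistics cannot
/// overlap the bounds. Without bounds, stay conservative and skip nothing.
fn can_skip(file_min: i64, file_max: i64, bounds: Option<(i64, i64)>) -> bool {
    match bounds {
        Some((lo, hi)) => file_max < lo || file_min > hi,
        None => false,
    }
}

fn main() {
    let keys = [Some(10), None, Some(42), Some(17)];
    let bounds = key_bounds(&keys);
    println!("target for Left join: {:?}", dynamic_filter_target(JoinKind::Left));
    println!("bounds from non-NULL keys: {:?}", bounds);
    println!("skip file with stats [100, 200]: {}", can_skip(100, 200, bounds));
}
```

Returning `Option<FilterTarget>` keeps the unsupported `Full` case explicit instead of picking an arbitrary side.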
57 changes: 57 additions & 0 deletions datafusion/common/src/join_type.rs
@@ -111,6 +111,63 @@ impl JoinType {
                 | JoinType::RightAnti
         )
     }
+    /// Returns true if the left side of this join preserves its input rows
+    /// for filters applied *after* the join.
+    #[inline]
Review comment (Contributor):

I see a lot of use of #[inline]. My understanding is that, without specific evidence that it helps performance, it can sometimes hurt it, so it's best not to throw it around unless the benefit is very obvious or can be proven.

+    pub const fn preserves_left_for_output_filters(self) -> bool {
Review comment (Contributor):

I arrived at this same refactor in #17153 - I think it's a good one. Can we pull this out into its own PR?

+        matches!(
+            self,
+            JoinType::Inner
+                | JoinType::Left
+                | JoinType::LeftSemi
+                | JoinType::LeftAnti
+                | JoinType::LeftMark,
+        )
+    }
+
+    /// Returns true if the right side of this join preserves its input rows
+    /// for filters applied *after* the join.
+    #[inline]
+    pub const fn preserves_right_for_output_filters(self) -> bool {
+        matches!(
+            self,
+            JoinType::Inner
+                | JoinType::Right
+                | JoinType::RightSemi
+                | JoinType::RightAnti
+                | JoinType::RightMark,
+        )
+    }
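
As an illustration of what "preserves its input rows for filters applied after the join" means, the sketch below checks a left outer join on tiny in-memory tables. It is a std-only toy, not DataFusion code: `Row`, `Joined`, `left_join`, `scan`, `keep_left`, and `keep_right` are hypothetical helpers. For `Left`, a post-join filter on left columns gives the same rows whether it runs after the join or is pushed into the left input, while pushing a filter on right columns changes the result.

```rust
type Row = (i32, i32); // (join key, payload)
type Joined = (Row, Option<Row>); // right side is None when NULL-padded

/// Left outer join on equal keys; unmatched left rows are padded with `None`.
fn left_join(left: &[Row], right: &[Row]) -> Vec<Joined> {
    let mut out = Vec::new();
    for &l in left {
        let matches: Vec<Row> = right.iter().copied().filter(|r| r.0 == l.0).collect();
        if matches.is_empty() {
            out.push((l, None));
        } else {
            out.extend(matches.into_iter().map(|r| (l, Some(r))));
        }
    }
    out
}

/// Filter applied to an input before the join (the "pushed down" form).
fn scan(input: &[Row], pred: impl Fn(&Row) -> bool) -> Vec<Row> {
    input.iter().copied().filter(|r| pred(r)).collect()
}

/// Post-join filter over left columns.
fn keep_left(rows: Vec<Joined>, pred: impl Fn(&Row) -> bool) -> Vec<Joined> {
    rows.into_iter().filter(|(l, _)| pred(l)).collect()
}

/// Post-join filter over right columns; a NULL-padded row fails it, as in SQL.
fn keep_right(rows: Vec<Joined>, pred: impl Fn(&Row) -> bool) -> Vec<Joined> {
    rows.into_iter()
        .filter(|(_, r)| r.as_ref().map_or(false, |r| pred(r)))
        .collect()
}

fn main() {
    let left = [(1, 5), (2, 20), (3, 30)];
    let right = [(1, 100), (2, 7)];
    let big = |r: &Row| r.1 > 10;

    // Left side is preserved: filtering before or after the join agrees.
    let after = keep_left(left_join(&left, &right), big);
    let before = left_join(&scan(&left, big), &right);
    println!("left filter pushdown safe: {}", after == before);

    // Right side is not preserved: pushing the filter revives NULL-padded rows.
    let after = keep_right(left_join(&left, &right), big);
    let before = left_join(&left, &scan(&right, big));
    println!("right filter pushdown safe: {}", after == before);
}
```

With this data the first comparison holds and the second does not, which matches `Left` appearing in `preserves_left_for_output_filters` but not in `preserves_right_for_output_filters`.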

+    /// Returns true if the left side of this join preserves its input rows
+    /// for filters in the join condition (ON-clause filters).
+    #[inline]
+    pub const fn preserves_left_for_on_filters(self) -> bool {
+        matches!(
+            self,
+            JoinType::Inner
+                | JoinType::Right
+                | JoinType::LeftSemi
+                | JoinType::RightSemi
+                | JoinType::RightAnti
+                | JoinType::RightMark,
+        )
+    }
+
+    /// Returns true if the right side of this join preserves its input rows
+    /// for filters in the join condition (ON-clause filters).
+    #[inline]
+    pub const fn preserves_right_for_on_filters(self) -> bool {
+        matches!(
+            self,
+            JoinType::Inner
+                | JoinType::Left
+                | JoinType::LeftSemi
+                | JoinType::RightSemi
+                | JoinType::LeftAnti
+                | JoinType::LeftMark,
+        )
+    }
 }

 impl Display for JoinType {
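
The ON-clause variants can be illustrated the same way. The sketch below is again a std-only toy with hypothetical helpers (`Row`, `Joined`, `left_join_on`, `scan`), not DataFusion code. For a left outer join, an ON predicate that reads only right columns can move into the right input without changing the result, whereas an ON predicate that reads only left columns cannot move into the left input, because left rows failing it must still appear NULL-padded.

```rust
type Row = (i32, i32); // (join key, payload)
type Joined = (Row, Option<Row>); // right side is None when NULL-padded

/// Left outer join on equal keys plus an extra ON-clause predicate over the pair.
fn left_join_on(left: &[Row], right: &[Row], on: impl Fn(&Row, &Row) -> bool) -> Vec<Joined> {
    let mut out = Vec::new();
    for &l in left {
        let mut matched = false;
        for &r in right {
            if l.0 == r.0 && on(&l, &r) {
                out.push((l, Some(r)));
                matched = true;
            }
        }
        if !matched {
            // Left rows always survive a left join, padded with NULL.
            out.push((l, None));
        }
    }
    out
}

/// Filter applied to an input before the join (the "pushed down" form).
fn scan(input: &[Row], pred: impl Fn(&Row) -> bool) -> Vec<Row> {
    input.iter().copied().filter(|r| pred(r)).collect()
}

fn main() {
    let left = [(1, 5), (2, 20), (3, 30)];
    let right = [(1, 100), (2, 7)];
    let no_extra = |_: &Row, _: &Row| true;

    // ON predicate over right columns: moving it into the right scan agrees.
    let in_on = left_join_on(&left, &right, |_, r| r.1 > 10);
    let pushed = left_join_on(&left, &scan(&right, |r| r.1 > 10), no_extra);
    println!("right-side ON filter pushdown safe: {}", in_on == pushed);

    // ON predicate over left columns: moving it into the left scan drops rows.
    let in_on = left_join_on(&left, &right, |l, _| l.1 > 10);
    let pushed = left_join_on(&scan(&left, |l| l.1 > 10), &right, no_extra);
    println!("left-side ON filter pushdown safe: {}", in_on == pushed);
}
```

This mirrors `Left` being listed in `preserves_right_for_on_filters` and absent from `preserves_left_for_on_filters`.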