Improve Filter Handling in Join Optimization: Retain Inferred Predicates as Join Filters #17090

Draft: wants to merge 9 commits into main from non-inner-join-filter-pushdown-16973

@kosiew (Contributor) commented Aug 8, 2025

Which issue does this PR close?

Rationale for this change

Previously, inferred predicates that couldn't be pushed down through a join (because of join type restrictions) were discarded, which ruled out applying dynamic filter pushdown to them later in the optimization pipeline. Retaining such predicates as join filters keeps them available to later optimization steps and can improve query performance.

What changes are included in this PR?

  • Updated push_down_all_join logic to retain inferred predicates as join filters when they can't be pushed down to either side.

  • Adjusted plan formatting to reflect join filters.

  • Added new test cases to cover:

    • Dynamic filter pushdown in left and right joins.
    • Correct handling of null-filtering predicates.
    • Ensuring proper formatting of optimized logical plans.
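The retention rule described above can be sketched with a toy model. This is not DataFusion's `push_down_all_join` code: the types, the `pushable` table, and `place_inferred` are all hypothetical names, and the rule "a predicate may only be pushed to a preserved side" is a simplifying assumption used for illustration.

```rust
// Toy model only. None of these types or functions are DataFusion APIs.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Side {
    Left,
    Right,
}

/// An inferred predicate that references columns from exactly one join input.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Predicate {
    text: String,
    side: Side,
}

/// Where each inferred predicate ends up after the rewrite.
#[derive(Debug, Default, PartialEq, Eq)]
struct Placement {
    push_left: Vec<Predicate>,
    push_right: Vec<Predicate>,
    join_filters: Vec<Predicate>,
}

/// Assumed rule: a predicate may be pushed below the join only on a side
/// whose rows are all preserved by the join type.
fn pushable(join_type: JoinType, side: Side) -> bool {
    match (join_type, side) {
        (JoinType::Inner, _) => true,
        (JoinType::Left, Side::Left) => true,
        (JoinType::Right, Side::Right) => true,
        _ => false,
    }
}

/// Push what can be pushed; keep the rest as join filters instead of
/// dropping it (the behavior this PR describes).
fn place_inferred(join_type: JoinType, inferred: Vec<Predicate>) -> Placement {
    let mut placement = Placement::default();
    for p in inferred {
        match (pushable(join_type, p.side), p.side) {
            (true, Side::Left) => placement.push_left.push(p),
            (true, Side::Right) => placement.push_right.push(p),
            (false, _) => placement.join_filters.push(p),
        }
    }
    placement
}
```

Under these assumptions, a predicate on the left input of a right join cannot be pushed down, so it lands in `join_filters`; with an inner join the same predicate would go to `push_left`.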

Are these changes tested?

Yes, this PR includes multiple unit tests to verify the correctness of the optimizer behavior for:

  • Left and right joins with pushable inferred predicates
  • Joins with predicates that allow nulls (ensuring dynamic filters are not incorrectly generated)
  • Plan formatting with join filters
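The difference between a null-filtering predicate and one that allows nulls can be illustrated with SQL-style three-valued logic. This is a minimal sketch over a single nullable integer column; the helper names (`le`, `is_null`, `or`, `filters_nulls`) are hypothetical and this is not how DataFusion itself analyzes expressions.

```rust
// Illustrative only: models SQL NULL as `None` for one integer column.

/// SQL-style `value <= bound`: a NULL input yields NULL.
fn le(value: Option<i64>, bound: i64) -> Option<bool> {
    value.map(|v| v <= bound)
}

/// SQL-style `value IS NULL`: always TRUE or FALSE, never NULL.
fn is_null(value: Option<i64>) -> Option<bool> {
    Some(value.is_none())
}

/// SQL-style three-valued OR.
fn or(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(true), _) | (_, Some(true)) => Some(true),
        (Some(false), Some(false)) => Some(false),
        _ => None,
    }
}

/// A predicate filters nulls if a NULL input does not evaluate to TRUE,
/// so rows with a NULL in that column are rejected by the filter.
fn filters_nulls(pred: impl Fn(Option<i64>) -> Option<bool>) -> bool {
    pred(None) != Some(true)
}
```

Here `a <= 1` filters nulls because a NULL input evaluates to NULL, while `a IS NULL OR a <= 1` lets NULL rows through and so does not.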

Are there any user-facing changes?

Yes:

  • The optimized logical plan now includes inferred join filters that were previously dropped.
  • This results in improved visibility into the logical plan and enables downstream dynamic filter optimizations.

These changes are internal to query optimization and do not alter public APIs, but users may observe better query performance and more comprehensive filter handling in EXPLAIN plans.

kosiew added 2 commits August 8, 2025 19:18
feat: enhance predicate handling for join optimization

- Retain inferred predicates that cannot be pushed through joins as join filters for dynamic filter pushdown.
- Update join filter assertions in tests to reflect new logic.
- Add tests for dynamic filter pushdown scenarios, including:
  - Left join with a filter on the preserved side.
  - Right join with a filter on the preserved side.
  - Handling filters that do not restrict nulls.
@github-actions github-actions bot added the optimizer Optimizer rules label Aug 8, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 8, 2025
@github-actions github-actions bot added the logical-expr Logical plan and expressions label Aug 9, 2025
@kosiew kosiew force-pushed the non-inner-join-filter-pushdown-16973 branch from 6dddfc6 to 8337f80 Compare August 9, 2025 05:32
@github-actions github-actions bot added the core Core DataFusion crate label Aug 9, 2025
@@ -2730,7 +2787,7 @@ mod tests {
 assert_optimized_plan_equal!(
 plan,
 @r"
-Right Join: Using test.a = test2.a
+Right Join: Using test.a = test2.a Filter: test.a <= Int64(1)

These filters seem redundant (always true)?

@github-actions github-actions bot removed logical-expr Logical plan and expressions core Core DataFusion crate labels Aug 9, 2025
@kosiew kosiew force-pushed the non-inner-join-filter-pushdown-16973 branch from 9f9f0ac to 5e4472d Compare August 9, 2025 10:15
@github-actions github-actions bot added logical-expr Logical plan and expressions core Core DataFusion crate labels Aug 9, 2025
@kosiew kosiew force-pushed the non-inner-join-filter-pushdown-16973 branch from f8310ec to 5ae7863 Compare August 9, 2025 12:21
@adriangb (Contributor) commented

Does this actually close #16973? I don't see that it changes filter pushdown at the physical plan level at all which is what #16973 is talking about. The example in that issue involves a TopK and HashJoin operator.

Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Enable dynamic filter pushdown for non-inner joins
3 participants