Add dynamic filter (bounds) pushdown to HashJoinExec #16445

Open · wants to merge 11 commits into main from hash-join-pushdown

Conversation

adriangb (Contributor):

Part of #7955.

My goal here is to lay the groundwork for pushing down joins.
I am only implementing bounds pushdown because I am confident it is cheap, and it will probably be quite effective in many cases. It should also be relatively easy to push down a reference to the whole hash table in a follow-up PR.

Another follow-up would be to enable parent filter pushdown through HashJoinExec. As with FilterExec, this requires adjusting parent filters for the join's projection, but we also need to check which columns each filter refers to so it can be pushed into the correct child (or not pushed at all if it refers to columns from both children and cannot be split).
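
To make the bounds pushdown concrete, here is a minimal sketch (assuming a single join key; `bounds_predicate` is an illustrative helper, not an API from this PR) of the kind of predicate the join can publish once the build side is complete:

```rust
// Sketch only: once the build side finishes, compute min/max for the join
// key and publish `key >= min AND key <= max` through a dynamic filter that
// scans below the probe side can observe.
use std::sync::Arc;

use datafusion_common::ScalarValue;
use datafusion_expr::Operator;
use datafusion_physical_expr::expressions::{BinaryExpr, Literal};
use datafusion_physical_expr::PhysicalExpr;

/// Build `key >= min AND key <= max` for one join key column.
fn bounds_predicate(
    key: Arc<dyn PhysicalExpr>,
    min: ScalarValue,
    max: ScalarValue,
) -> Arc<dyn PhysicalExpr> {
    let ge = Arc::new(BinaryExpr::new(
        Arc::clone(&key),
        Operator::GtEq,
        Arc::new(Literal::new(min)),
    ));
    let le = Arc::new(BinaryExpr::new(
        key,
        Operator::LtEq,
        Arc::new(Literal::new(max)),
    ));
    Arc::new(BinaryExpr::new(ge, Operator::And, le))
}
```

A predicate of this shape is cheap to evaluate per batch, and because it is a plain range filter it can also drive statistics-based pruning in the scans, which comes up later in this thread.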

@github-actions github-actions bot added core Core DataFusion crate physical-plan Changes to the physical-plan crate labels Jun 18, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jun 18, 2025
}

// Compute min/max using ScalarValue's utilities
let mut min_val = ScalarValue::try_from_array(array, 0)?;
Contributor:

I think we should use an arrow kernel for this (this is slow).
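
For reference, a minimal sketch of what using arrow's aggregate kernels could look like, assuming a primitive Int32 join key purely for illustration:

```rust
// arrow's specialized min/max kernels scan the native values directly
// instead of materializing a ScalarValue per row.
use arrow::array::Int32Array;
use arrow::compute::{max, min};

fn int32_bounds(array: &Int32Array) -> Option<(i32, i32)> {
    // Both kernels return None for an empty or all-null array.
    Some((min(array)?, max(array)?))
}
```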

adriangb (author):

Maybe we re-use datafusion/functions-aggregate/src/min_max.rs? Seems like there's a lot of complexity there related to the types that we wouldn't want to re-implement.

adriangb (author):

Sadly min_batch is not public, and either way functions-aggregate is not a dependency of physical-plan.

adriangb (author):

Could we move these functions into functions-aggregate-common?

Contributor:

Makes sense.

Dandandan (Contributor):

I think we should also consider a heuristic for not evaluating the filter when it's not useful.

Also, I think doing only the lookup is preferable to also computing / checking the bounds; the latter might create more overhead.

@Dandandan Dandandan closed this Jun 18, 2025
@Dandandan Dandandan reopened this Jun 18, 2025
Dandandan (Contributor):

Sorry, misclicked a button.

adriangb (author) commented Jun 18, 2025:

> I think doing only the lookup is preferable to also computing / checking the bounds; the latter might create more overhead

My thought was that in some cases the bounds checks are going to be quite effective at pruning, and they should always be cheap to compute and cheap to apply. I'm surprised you say that they might create a lot of overhead?

Dandandan (Contributor) commented Jun 18, 2025:

> I think doing only the lookup is preferable to also computing / checking the bounds; the latter might create more overhead

> My thought was that in some cases the bounds checks are going to be quite effective at pruning, and they should always be cheap to compute and cheap to apply. I'm surprised you say that they might create a lot of overhead?

Maybe I should articulate it a bit more.

  • If we are only filtering based on statistics, min/max might make sense to quickly filter out large chunks of rows.
  • If we are filtering on values (e.g. filter pushdown), I think it makes sense to only filter on the shared hashmap and not bother with the min/max values: creating hashes and doing a single table lookup is quite fast, so we want to avoid also evaluating the min/max expression (at least for all rows).

I think it also makes sense to think about a heuristic for using this pushdown only when we expect it to be useful - e.g. the left side is much smaller than the right side, or we know (based on column statistics) it will filter out rows.
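
Purely to illustrate the shape of such a heuristic (the thresholds and signature here are made up, not part of the PR):

```rust
// Hypothetical heuristic: only push the dynamic filter down when the build
// side looks small relative to the probe side, i.e. when it is likely to
// prune something.
fn should_push_down_dynamic_filter(build_rows: usize, probe_rows: Option<usize>) -> bool {
    match probe_rows {
        // With a known probe-side row count, require the build side to be
        // at least an order of magnitude smaller.
        Some(probe) => build_rows.saturating_mul(10) <= probe,
        // Without statistics, fall back to a fixed build-side cap.
        None => build_rows <= 1_000_000,
    }
}
```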

adriangb (author):

> I think it makes sense to only filter on the shared hashmap and not bother with the min/max values: creating hashes and doing a single table lookup is quite fast, so we want to avoid also evaluating the min/max expression (at least for all rows)

I'm surprised that the hash table lookup, even if O(1), has such a small constant factor that it's comparable to a couple of binary comparisons. That said, a reason to still do both is stats and filter caching: simple filters like col >= 123 AND col <= 456 can be used for stats pruning and can easily be cached (for example, for filter-cache-based indexing). So even if performance is not strictly better, there is still something to be said for including a simple filter in addition to the hash table lookup.

@xudong963 xudong963 self-requested a review June 19, 2025 10:14
adriangb (author):

> I think it also makes sense to think about a heuristic for using this pushdown only when we expect it to be useful - e.g. the left side is much smaller than the right side, or we know (based on column statistics) it will filter out rows

DataFusion is generally not great at these things: we often don't have enough stats / info to make decisions like this.

Dandandan (Contributor):

> I think it makes sense to only filter on the shared hashmap and not bother with the min/max values: creating hashes and doing a single table lookup is quite fast, so we want to avoid also evaluating the min/max expression (at least for all rows)

> I'm surprised that the hash table lookup, even if O(1), has such a small constant factor that it's comparable to a couple of binary comparisons. That said, a reason to still do both is stats and filter caching: simple filters like col >= 123 AND col <= 456 can be used for stats pruning and can easily be cached (for example, for filter-cache-based indexing). So even if performance is not strictly better, there is still something to be said for including a simple filter in addition to the hash table lookup.

It's hard to say in general, but a hash table lookup on a u64 key that fits into cache can be really fast.

adriangb (author):

> It's hard to say in general, but a hash table lookup on a u64 key that fits into cache can be really fast.

I guess only benchmarks can tell. But I still think the scalar bounds are worth keeping for stats pruning reasons.

xudong963 (Member) left a comment:

Do we have any metrics to record how much data is filtered by the dynamic join filter?
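
A sketch of what such metrics might look like using DataFusion's standard metrics machinery (the metric names and struct here are illustrative, not from this PR):

```rust
use datafusion_physical_plan::metrics::{Count, ExecutionPlanMetricsSet, MetricBuilder};

// Counting rows before and after the dynamic filter is applied would let
// EXPLAIN ANALYZE report its selectivity.
struct DynamicFilterMetrics {
    input_rows: Count,
    output_rows: Count,
}

impl DynamicFilterMetrics {
    fn new(metrics: &ExecutionPlanMetricsSet, partition: usize) -> Self {
        Self {
            input_rows: MetricBuilder::new(metrics).counter("dyn_filter_input_rows", partition),
            output_rows: MetricBuilder::new(metrics).counter("dyn_filter_output_rows", partition),
        }
    }
}
```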

@@ -433,6 +433,117 @@ async fn test_topk_dynamic_filter_pushdown() {
);
}

#[tokio::test]
async fn test_hashjoin_dynamic_filter_pushdown() {
xudong963 (Member):

Can we add some tests for multiple joins? Such as

        Join (t1.a = t2.b)
        /              \
      t1        Join (t2.c = t3.d)
                    /        \
                  t3          t2

xudong963 (Member):

Such a test can check that:

  1. dynamic filters are pushed down to the right scan node
  2. dynamic filters aren't missed during pushdown

adriangb (author):

I've added a test that I think matches your suggestion

@adriangb adriangb force-pushed the hash-join-pushdown branch from 49d1636 to 04efcc1 Compare June 24, 2025 19:08
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Jun 24, 2025
@@ -353,6 +353,18 @@ impl FilterDescription {
}
}

pub fn with_child_pushdown(
adriangb (author):

More APIs 🤮. I really need to circle back to doing some whiteboard design for these. It's complex and won't be pretty but I'm sure it can be better than it is right now.

Dandandan (Contributor):

To share some experience: we recently added a similar pushdown for HashJoinExec (at Coralogix) by sharing an Arc<JoinLeftData> and comparing column hashes, and so far it seems very effective with predicate pushdown enabled.
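
A rough sketch of that idea (not Coralogix's actual code): share the build side's key hashes and filter probe batches by membership, assuming the same RandomState is used to hash both sides:

```rust
use std::collections::HashSet;
use std::sync::Arc;

use ahash::RandomState;
use arrow::array::{ArrayRef, BooleanArray};
use datafusion_common::hash_utils::create_hashes;
use datafusion_common::Result;

fn probe_membership_mask(
    probe_keys: &[ArrayRef],
    build_hashes: &Arc<HashSet<u64>>, // shared from the build side
    random_state: &RandomState,
) -> Result<BooleanArray> {
    let num_rows = probe_keys[0].len();
    let mut hashes = vec![0u64; num_rows];
    create_hashes(probe_keys, random_state, &mut hashes)?;
    // Keep only rows whose key hash appears on the build side; hash
    // collisions cause false positives, which the join itself resolves.
    Ok(hashes.iter().map(|h| Some(build_hashes.contains(h))).collect())
}
```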

adriangb (author) commented Jun 24, 2025:

I was originally planning on keeping this PR smaller but it's been growing so I might as well add the Arc<LeftData> :)

Dandandan (Contributor):

> I was originally planning on keeping this PR smaller but it's been growing so I might as well add the Arc<LeftData> :)

Feel free to PR it however you like ;)

adriangb (author):

@Dandandan any chance you'd be willing to contribute your implementation of sharing Arc<LeftData>, so we use something we know works and I don't have to re-invent the wheel? I think you can just push it to this branch.

adriangb (author):

@alamb I'd be interested to see what benchmarks say if you don't mind kicking them off?

xudong963 (Member):

> @alamb I'd be interested to see what benchmarks say if you don't mind kicking them off?

IIRC, this optimization should speed up the TPC-H benchmark, so we could run that directly. Or we could construct a small table probing a big table to see the effect.
