-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix wrong join_rel size estimates for anti join. #934
base: main
Are you sure you want to change the base?
Conversation
In Postgres planner, we set join_rel selectivity in calc_joinrel_size_estimate() for anti join type. Maybe it's codes historical reasons, we have another place to set anti join selectivity in clauselist_selectivity_ext(). This can lead to wrong join_rel size estimates for anti join type. We can remove the selectivity computation in clauselist_selectivity_ext(). So we can make codes be consistent with upstream.
Good catch, thank you for your contribution! It looks like the GPDB introduced these codes in the initial commit for specific reasons at that time. However, given the developments in PostgreSQL, it might be worth reevaluating their relevance today.
As cost is more accurate, the plan adjusted the join order. LASJ is a special outer join, and it also cloud be used as normal outer join to follow the join order optimization. The codes change LASJ as well as anti join, so I guess there might be plan change for query like: NOT EXISTS. Are there observations? |
Yeah, I also checked the git commit history; in the initial commit, only in clauselist_selectivity_ext(), we do "s1 = 1 - s1", which would be ok. It maybe do the merge upstream codes, that introduce "s1 = 1 -s1" in calc_joinrel_size_estimate(), but someone forgot to remove the same code in clauselist_selectivity_ext().
I only found a plan diff in regression test, which is notin test, but not anti join. I did the TPCDS benchmark; I found query 69, which includes "not exists"; there is a plan change.
with this PR, its plan look like as below:
without this PR, its plan looks like as below:
The execution time is 60s VS 67s. |
By the way, I run the make installcheck-good and make installcheck-cbdb-parallel on my local machine. I didn't have a failed case like github ci reporting. In my two commits, the failed cases are different. Is the CI not stable? |
Thank you for your verification; I believe that's the one. I'll do further research later. |
The failure in this CI run indicates a resource limitation on the CI environment, which has been a persistent issue for us. Similarly, the failure in this other run is also due to limited CI resources. We attempted a parallel refresh of a materialized view, but it proves ineffective with such constrained resources. Previously, there was a workaround mentioned in #892, but it was not an ideal solution and has since been closed. |
I found $SUBJECT when I did TPCH benchmark. In query 16, join rel size estimate has big gap with output of explain analyze.
Details are as below:
The 16th query is:
In Postgres planner, its plan looks like below:
Let's look at the hashjoin part of above plan,
The Hash Left Anti Semi(Not-In) Join size estimate has big gap with real execution. In this case rows = 8 vs rows = 2669156.
For anti join, the selectivity is determined by the fraction of tuples that do not match. But in current planner codes, we have two places that calculate anti join selectivity.
The first place at the calc_joinrel_size_estimate(), we have below codes:
And the second place at the end of clauselist_selectivity_ext(), we have below codes:
I think we should keep the first place just like upstream does, otherwise, the selectivity of anti join is same with inner join.
Because in clauselist_selectivity_ext() we do "s1 = 1 - s1", return from clauselist_selectivity_ext(), in calc_joinrel_size_estimate(),
we do "(1.0 - fkselec*jselec)" again.
In query 16 case, the anti join clause length is 1, so we return directly in below code:
But for pselec, we are not lucky, in this case, pselec is null, so it will run the end of clauselist_selectivity_ext(), and return s1=0.
So the join rel size is very small.
With this pr, query 16 plan looks like below:
As you can see, this time the estimate is much more accurate. And the execution time is also 2 seconds faster. I'm guessing the performance gap will be even bigger if the s option changes bigger. In this case, I choose s = 10(e.g. 10GB).