WIP: ORCA: Fix eliminate self comparison #722
Open: fanfuxiaoran wants to merge 2 commits into main from fix_orca
fanfuxiaoran changed the title from "ORCA: Fix eliminate self comparison" to "WIP: ORCA: Fix eliminate self comparison" on Nov 21, 2024
Working on adding tests for it!
Issue: Orca tries to eliminate self comparisons during preprocessing, but this early optimization misleads the later expression preprocessing of LOJ. This PR avoids the self-comparison check on the WHERE clause predicate when the SELECT's logical child is an LOJ.

NOTE: In the Postgres executor, a restriction placed in the ON clause is processed before the join, while a restriction placed in the WHERE clause is processed after the join. That does not matter with inner joins, but it matters a lot with outer joins.

Setup:

    CREATE TABLE t2(c0 int, c1 int not null);
    INSERT INTO t2 values(1, 2),(3,4),(5,6),(7,8);
    CREATE TABLE t3(c0 int not null, c1 int, c2 int);

    SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
     c1
    ----
    (0 rows)

    explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
                                            QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1324032.07 rows=1 width=4)
       -> Nested Loop (cost=0.00..1324032.07 rows=1 width=4)
            Join Filter: true
            -> Seq Scan on t2 (cost=0.00..431.00 rows=1 width=4)
                 Filter: (true IS NULL)
            -> Materialize (cost=0.00..431.00 rows=1 width=1)
                 -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..431.00 rows=1 width=1)
                      -> Seq Scan on t3 (cost=0.00..431.00 rows=1 width=1)
                           Filter: c1 > c2
     Optimizer: Pivotal Optimizer (GPORCA)
    (10 rows)

    set optimizer=off;
    SET

    SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
     c1
    ----
      4
      8
      2
      6
    (4 rows)

    explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
                                               QUERY PLAN
    ---------------------------------------------------------------------------------------------------------
     Gather Motion 3:1 (slice1; segments: 3) (cost=10000000000.00..10044448648.78 rows=1117865000 width=4)
       -> Nested Loop Left Join (cost=10000000000.00..10029543782.11 rows=372621667 width=4)
            Filter: ((t3.c0 = t3.c0) IS NULL)
            -> Seq Scan on t2 (cost=0.00..321.00 rows=28700 width=4)
            -> Materialize (cost=0.00..834.64 rows=25967 width=4)
                 -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..704.81 rows=25967 width=4)
                      -> Seq Scan on t3 (cost=0.00..358.58 rows=8656 width=4)
                           Filter: (c1 > c2)
     Optimizer: Postgres query optimizer
    (8 rows)

After Fix:

    SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
     c1
    ----
      6
      4
      8
      2
    (4 rows)

    explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
                                               QUERY PLAN
    ---------------------------------------------------------------------------------------------------------
     Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..1324032.37 rows=1 width=4)
       -> Result (cost=0.00..1324032.37 rows=1 width=4)
            Filter: ((t3.c0 = t3.c0) IS NULL)
            -> Nested Loop Left Join (cost=0.00..1324032.37 rows=1 width=8)
                 Join Filter: true
                 -> Seq Scan on t2 (cost=0.00..431.00 rows=1 width=4)
                 -> Materialize (cost=0.00..431.00 rows=1 width=4)
                      -> Broadcast Motion 3:3 (slice2; segments: 3) (cost=0.00..431.00 rows=1 width=4)
                           -> Seq Scan on t3 (cost=0.00..431.00 rows=1 width=4)
                                Filter: (c1 > c2)
     Optimizer: Pivotal Optimizer (GPORCA)

(cherry picked from gpdb commit d3dd98c1a8daf04fbf6cb91fc4afa6f91b317e93)
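To make the wrong result above easier to see, here is a minimal Python sketch of the same query under SQL three-valued logic. It is only a toy model, not ORCA or executor code: `sql_eq`, `sql_gt`, `is_null` and `left_outer_join` are hypothetical helpers, `None` stands in for SQL NULL, and `t3` is assumed empty as in the setup. Because every `t2` row is null-extended by the left outer join, `(t3.c0 = t3.c0)` evaluates to NULL after the join, even though `t3.c0` is declared `not null`.

```python
def sql_eq(a, b):
    """SQL '=': the result is NULL (None) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def sql_gt(a, b):
    """SQL '>': the result is NULL (None) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a > b


def is_null(value):
    """SQL 'IS NULL': always a definite True or False."""
    return value is None


def left_outer_join(outer_rows, inner_rows, on):
    """Keep every outer row; null-extend it when no inner row matches."""
    joined = []
    for o in outer_rows:
        matches = [i for i in inner_rows if on(o, i) is True]
        if matches:
            joined.extend((o, i) for i in matches)
        else:
            joined.append((o, (None, None, None)))
    return joined


t2 = [(1, 2), (3, 4), (5, 6), (7, 8)]  # columns (c0, c1)
t3 = []                                # columns (c0, c1, c2), empty as in the setup

# ON t3.c1 > t3.c2 is evaluated as part of the join.
joined = left_outer_join(t2, t3, lambda o, i: sql_gt(i[1], i[2]))

# WHERE (t3.c0 = t3.c0) IS NULL, evaluated after the join on null-extended rows.
correct = sorted(o[1] for o, i in joined if is_null(sql_eq(i[0], i[0])))

# The same WHERE clause after the self comparison was folded to constant true.
folded_constant = True
folded = sorted(o[1] for o, i in joined if is_null(folded_constant))

print(correct)  # the four t2.c1 values
print(folded)   # no rows, matching the wrong "(0 rows)" result
```

The `folded` list corresponds to the `Filter: (true IS NULL)` line in the faulty plan, and `correct` corresponds to the four rows returned with `optimizer=off` and after the fix.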
'PexprEliminateSelfComparison' only uses the 'pcrsNotNull' from the topmost expression to filter out nullable columns. As a result, PexprEliminateSelfComparison cannot be applied properly to a subquery.

    create table t1(a int not null, b int not null);
    create table t2(like t1);
    create table t3(like t1);
    select * from t2 left join (select t2.a , t2.b from t1, t2 where t1.a < t1.a) as t on t2.a = t.a;

The plan Orca produces for it is:

    Gather Motion 3:1 (slice1; segments: 3)
      -> Hash Left Join
           Hash Cond: (t2.a = t2_1.a)
           -> Seq Scan on t2
           -> Hash
                -> Nested Loop
                     Join Filter: true
                     -> Seq Scan on t2 t2_1
                     -> Materialize
                          -> Broadcast Motion 3:3 (slice2; segments: 3)
                               -> Seq Scan on t1
                                    Filter: (a < a)

The self comparison in the subquery is not eliminated. This commit optimizes that case by fetching 'pcrsNotNull' from the current logical expression and applying it to that expression's child scalar expressions.
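The difference between using the topmost not-null set and a per-expression one can be sketched with a toy expression tree in Python. Everything here is hypothetical and only illustrates the idea in the commit message: `Node`, `not_null_cols` and `count_folds` are made-up names, and the derivation rule (a left join exposes only its outer child's not-null columns, other operators expose the union of their children's) is an assumption of this sketch rather than ORCA's actual property derivation.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class Node:
    kind: str                                   # "get", "inner_join" or "left_join"
    declared_not_null: frozenset = frozenset()  # only meaningful for "get"
    children: list = field(default_factory=list)
    predicate: Optional[Tuple[str, str, str]] = None  # (op, left_col, right_col)


def not_null_cols(node):
    """Columns guaranteed non-NULL in this node's output (toy rule, see lead-in)."""
    if node.kind == "get":
        return node.declared_not_null
    if node.kind == "left_join":
        # Inner-side columns may be null-extended, so only the outer side counts.
        return not_null_cols(node.children[0])
    result = frozenset()
    for child in node.children:
        result |= not_null_cols(child)
    return result


def foldable(predicate, cols):
    """A self comparison is foldable only on a column known to be non-NULL."""
    _op, left, right = predicate
    return left == right and left in cols


def count_folds(node, inherited=None):
    """Count foldable self comparisons.

    With `inherited` set, every node reuses that one set (the old behaviour
    described above); otherwise each node derives its own set.
    """
    cols = inherited if inherited is not None else not_null_cols(node)
    here = 1 if node.predicate and foldable(node.predicate, cols) else 0
    return here + sum(count_folds(child, inherited) for child in node.children)


# select * from t2 left join (select ... from t1, t2 where t1.a < t1.a) as t on ...
t1 = Node("get", frozenset({"t1.a", "t1.b"}))
t2_inner = Node("get", frozenset({"t2_1.a", "t2_1.b"}))
subquery = Node("inner_join", children=[t1, t2_inner], predicate=("<", "t1.a", "t1.a"))
t2_outer = Node("get", frozenset({"t2.a", "t2.b"}))
top = Node("left_join", children=[t2_outer, subquery], predicate=("=", "t2.a", "t2_1.a"))

top_cols = not_null_cols(top)
print(count_folds(top, inherited=top_cols))  # topmost set only
print(count_folds(top))                      # per-expression sets
```

In this model the topmost set contains only `t2.a` and `t2.b`, so `t1.a < t1.a` inside the subquery is never recognised as foldable, while the per-expression walk sees `t1.a` as non-NULL at the inner join and folds it.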
fanfuxiaoran force-pushed the fix_orca branch from 8902941 to ffb8b2e on November 21, 2024 08:14
Found that gpdb has a similar commit: greenplum-db/gpdb-archive@d3dd98c
For the below query
orca generates a wrong plan:
The root cause is that '(t1.b < t1.b)' has been transformed into 'CScalarConst (0)' by 'PexprEliminateSelfComparison'. When checking whether the self comparison can be simplified, the function FSelfComparison checks the CColRef's IsNullable only from the column definition; it does not check whether the column comes from an outer join.

To fix it, before simplifying the scalar expression, we first get 'pcrsNotNull' from its parent expression. 'pcrsNotNull' records the nullability of the output columns. If the column is not in 'pcrsNotNull', the self comparison cannot be transformed into a constant true or false.
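The rule described above can be written down as a small Python sketch. This is not the ORCA implementation: `fold_self_comparison` and its `not_null_cols` argument are hypothetical stand-ins for the check against 'pcrsNotNull', and the operator sets are an assumption based on ordinary comparison semantics (only the `<` case folding to constant false is confirmed by the description above).

```python
# Operators whose result is fixed when both sides are the same non-NULL value.
TRUE_WHEN_EQUAL = {"=", "<=", ">="}
FALSE_WHEN_EQUAL = {"<", ">", "<>"}


def fold_self_comparison(op, left, right, not_null_cols):
    """Return True/False when `left op right` can be folded to a constant.

    Returns None when the comparison must stay as it is: either the operands
    differ, or the column is not known to be non-NULL in the parent
    expression's output (for example, it comes from the nullable side of an
    outer join), or the operator is not one this sketch knows about.
    """
    if left != right or left not in not_null_cols:
        return None
    if op in TRUE_WHEN_EQUAL:
        return True
    if op in FALSE_WHEN_EQUAL:
        return False
    return None


# Column known non-NULL in the parent's output: safe to fold to constant false.
print(fold_self_comparison("<", "t1.b", "t1.b", frozenset({"t1.b"})))

# Same column but absent from the parent's not-null set: leave it unchanged.
print(fold_self_comparison("<", "t1.b", "t1.b", frozenset()))
```

Returning `None` rather than a constant is the important branch: it keeps the original predicate so that it can still evaluate to NULL on null-extended rows.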
Fixes #594
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel