WIP: ORCA: Fix eliminate self comparison #722

Open
wants to merge 2 commits into
base: main

Commits on Nov 21, 2024

  1. Wrong results by ORCA when NULL TEST on LOJ (#15358)

    Issue:
        Orca tries to eliminate self comparisons during preprocessing, but this early optimization
        misleads the later expression preprocessing of the LOJ. This PR avoids the self comparison
        check on the WHERE clause predicate when the SELECT's logical child is an LOJ.
    
    NOTE:
    In the Postgres executor, a restriction placed in the ON clause is processed before the join,
    while a restriction placed in the WHERE clause is processed after the join.
    That does not matter for inner joins, but it matters a lot for outer joins.
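
The ON versus WHERE distinction in the note above can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than Greenplum, and the tables `l` and `r` are hypothetical, chosen only for illustration; they are not the tables from this commit.

```python
import sqlite3

# Hypothetical tables for illustration only (not the t2/t3 tables of this commit).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE l(id INTEGER);
    CREATE TABLE r(id INTEGER, v INTEGER);
    INSERT INTO l VALUES (1), (2), (3);
    INSERT INTO r VALUES (1, 5), (2, 50);
""")

# Restriction in ON: applied while matching rows, so left rows without a
# qualifying match survive and are padded with NULLs.
on_rows = conn.execute(
    "SELECT l.id, r.v FROM l LEFT JOIN r ON l.id = r.id AND r.v > 10 ORDER BY l.id"
).fetchall()

# Same restriction in WHERE: applied after the join, so NULL-padded rows
# (for which r.v > 10 is unknown) and non-qualifying matches are filtered out.
where_rows = conn.execute(
    "SELECT l.id, r.v FROM l LEFT JOIN r ON l.id = r.id WHERE r.v > 10 ORDER BY l.id"
).fetchall()

print(on_rows)     # [(1, None), (2, 50), (3, None)]
print(where_rows)  # [(2, 50)]
```

With an inner join both queries would return the same single row, which is why the placement only matters for outer joins.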
    
    Setup:
    CREATE TABLE t2(c0 int, c1 int not null);
    INSERT INTO t2 values(1, 2),(3,4),(5,6),(7,8);
    CREATE TABLE t3(c0 int not null, c1 int, c2 int);
    
    Before fix (ORCA, wrong result):
    SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
     c1
    ----
    (0 rows)
    
    explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
    QUERY PLAN
    ---------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1324032.07 rows=1 width=4)
       ->  Nested Loop  (cost=0.00..1324032.07 rows=1 width=4)
             Join Filter: true
             ->  Seq Scan on t2  (cost=0.00..431.00 rows=1 width=4)
                   Filter: (true IS NULL)
             ->  Materialize  (cost=0.00..431.00 rows=1 width=1)
                   ->  Broadcast Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=1)
                         ->  Seq Scan on t3  (cost=0.00..431.00 rows=1 width=1)
                               Filter: c1 > c2
     Optimizer: Pivotal Optimizer (GPORCA)
    (10 rows)
    set optimizer=off;
    SET
    SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
     c1
    ----
      4
      8
      2
      6
    (4 rows)
    explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
                                                   QUERY PLAN
    ---------------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice1; segments: 3)  (cost=10000000000.00..10044448648.78 rows=1117865000 width=4)
       ->  Nested Loop Left Join  (cost=10000000000.00..10029543782.11 rows=372621667 width=4)
             Filter: ((t3.c0 = t3.c0) IS NULL)
             ->  Seq Scan on t2  (cost=0.00..321.00 rows=28700 width=4)
             ->  Materialize  (cost=0.00..834.64 rows=25967 width=4)
                   ->  Broadcast Motion 3:3  (slice2; segments: 3)  (cost=0.00..704.81 rows=25967 width=4)
                         ->  Seq Scan on t3  (cost=0.00..358.58 rows=8656 width=4)
                               Filter: (c1 > c2)
     Optimizer: Postgres query optimizer
    (8 rows)
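
The four-row result follows from SQL's three-valued logic: t3 is empty, so every t2 row is NULL-extended, and `NULL = NULL` is unknown rather than true. The sketch below is a plain Python model of that evaluation, not ORCA or Postgres code; the helpers `sql_eq`, `sql_gt`, and `left_outer_join` are hypothetical names introduced for illustration.

```python
from typing import Optional

def sql_eq(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL '=': yields None (unknown) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_gt(a: Optional[int], b: Optional[int]) -> Optional[bool]:
    """SQL '>': yields None (unknown) when either operand is NULL."""
    if a is None or b is None:
        return None
    return a > b

def left_outer_join(left, right, on):
    """Nested-loop left outer join; unmatched left rows get a NULL-padded right row."""
    out = []
    for l in left:
        matches = [r for r in right if on(l, r) is True]
        if matches:
            out.extend((l, r) for r in matches)
        else:
            out.append((l, {"c0": None, "c1": None, "c2": None}))
    return out

t2 = [{"c0": 1, "c1": 2}, {"c0": 3, "c1": 4}, {"c0": 5, "c1": 6}, {"c0": 7, "c1": 8}]
t3 = []  # empty, as in the setup above

joined = left_outer_join(t2, t3, lambda l, r: sql_gt(r["c1"], r["c2"]))

# WHERE (t3.c0 = t3.c0) IS NULL, evaluated after the join on NULL-extended rows.
correct = sorted(l["c1"] for l, r in joined if sql_eq(r["c0"], r["c0"]) is None)

# Folding t3.c0 = t3.c0 to true because c0 is declared NOT NULL turns the
# predicate into "true IS NULL", which rejects every row.
folded = True
wrong = [l["c1"] for l, r in joined if folded is None]

print(correct)  # [2, 4, 6, 8]
print(wrong)    # []
```

The NOT NULL constraint on t3.c0 holds for rows stored in t3, but not for the NULL-extended rows that the outer join produces, which is why the early fold is unsafe above an LOJ.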
    
    After Fix:
    SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
     c1
    ----
      6
      4
      8
      2
    (4 rows)
    explain SELECT t2.c1 FROM t2 LEFT OUTER JOIN t3 ON t3.c1 > t3.c2 WHERE (t3.c0=t3.c0) IS NULL;
                                                   QUERY PLAN
    ---------------------------------------------------------------------------------------------------------
     Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..1324032.37 rows=1 width=4)
       ->  Result  (cost=0.00..1324032.37 rows=1 width=4)
             Filter: ((t3.c0 = t3.c0) IS NULL)
             ->  Nested Loop Left Join  (cost=0.00..1324032.37 rows=1 width=8)
                   Join Filter: true
                   ->  Seq Scan on t2  (cost=0.00..431.00 rows=1 width=4)
                   ->  Materialize  (cost=0.00..431.00 rows=1 width=4)
                         ->  Broadcast Motion 3:3  (slice2; segments: 3)  (cost=0.00..431.00 rows=1 width=4)
                               ->  Seq Scan on t3  (cost=0.00..431.00 rows=1 width=4)
                                     Filter: (c1 > c2)
     Optimizer: Pivotal Optimizer (GPORCA)
    (cherry picked from gpdb commit d3dd98c1a8daf04fbf6cb91fc4afa6f91b317e93)
    pobbatihari authored and fanfuxiaoran committed Nov 21, 2024
    b628a60
  2. [ORCA] optimize eliminate self comparison

    'PexprEliminateSelfComparison' only uses the 'pcrsNotNull'
    from the topmost expression to filter out nullable columns.
    As a result, PexprEliminateSelfComparison cannot be applied
    properly to subqueries.
    
    create table t1(a int not null, b int not null);
    create table t2(like t1);
    create table t3(like t1);
    select * from t2 left join (select t2.a, t2.b from t1, t2 where t1.a < t1.a) as t
        on t2.a = t.a;
    
    The plan ORCA generates for it is:
     Gather Motion 3:1  (slice1; segments: 3)
       ->  Hash Left Join
             Hash Cond: (t2.a = t2_1.a)
             ->  Seq Scan on t2
             ->  Hash
                   ->  Nested Loop
                         Join Filter: true
                         ->  Seq Scan on t2 t2_1
                         ->  Materialize
                               ->  Broadcast Motion 3:3  (slice2; segments: 3)
                                     ->  Seq Scan on t1
                                           Filter: (a < a)
    The self comparison in the subquery is not eliminated.
    
    This commit improves this by fetching 'pcrsNotNull' from
    the current logical expression and applying it to that
    expression's child scalar expressions.
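
The difference between the two strategies can be sketched with a toy expression tree. This is an illustrative Python model under stated assumptions, not ORCA's C++ implementation: the classes `Cmp` and `Logical`, the `not_null` field (a stand-in for 'pcrsNotNull'), and both traversal functions are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Cmp:
    """Scalar comparison between two columns: left <op> right."""
    op: str
    left: str
    right: str

@dataclass
class Logical:
    """Logical operator with its own set of columns known to be non-NULL."""
    name: str
    not_null: frozenset          # stand-in for pcrsNotNull of this expression
    predicate: object = None     # Cmp, bool, or None
    children: list = field(default_factory=list)

def fold_self_cmp(pred, not_null):
    """Fold col <op> col to a constant only when col is known non-NULL."""
    if isinstance(pred, Cmp) and pred.left == pred.right and pred.left in not_null:
        return pred.op in ("=", "<=", ">=")  # e.g. a = a -> True, a < a -> False
    return pred

def eliminate_top_only(expr, root_not_null):
    """Old behaviour (as described): the root's not-null set is used everywhere."""
    return Logical(expr.name, expr.not_null,
                   fold_self_cmp(expr.predicate, root_not_null),
                   [eliminate_top_only(c, root_not_null) for c in expr.children])

def eliminate_per_expr(expr):
    """New behaviour (as described): each expression uses its own not-null set."""
    return Logical(expr.name, expr.not_null,
                   fold_self_cmp(expr.predicate, expr.not_null),
                   [eliminate_per_expr(c) for c in expr.children])

# Subquery: t1 joined with t2 (aliased t2_1) under the predicate t1.a < t1.a.
sub = Logical("InnerJoin", frozenset({"t1.a", "t1.b", "t2_1.a", "t2_1.b"}),
              Cmp("<", "t1.a", "t1.a"))
# Root: t2 LEFT JOIN subquery; inner-side columns become nullable in its output.
root = Logical("LeftOuterJoin", frozenset({"t2.a", "t2.b"}),
               Cmp("=", "t2.a", "t2_1.a"), [sub])

old = eliminate_top_only(root, root.not_null)
new = eliminate_per_expr(root)
print(old.children[0].predicate)  # Cmp(op='<', left='t1.a', right='t1.a')
print(new.children[0].predicate)  # False
```

In this model, t1.a is not in the root's not-null set because the subquery sits on the nullable side of the left join, so the top-only traversal leaves `t1.a < t1.a` in place, while the per-expression traversal folds it to false.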
    fanfuxiaoran committed Nov 21, 2024
    ffb8b2e