Fix wrong join_rel size estimates for anti join. #934

tenderwg · 2025-02-17T08:38:39Z

I found $SUBJECT when I did TPCH benchmark. In query 16, join rel size estimate has big gap with output of explain analyze.
Details are as below:
The 16th query is:

select
    p_brand,
    p_type,
    p_size,
    count(distinct ps_suppkey) as supplier_cnt
from
    partsupp,
    part
where
    p_partkey = ps_partkey
    and p_brand <> 'Brand#44'
    and p_type not like 'SMALL BURNISHED%'
    and p_size in (36, 27, 34, 45, 11, 6, 25, 16)
    and ps_suppkey not in (
        select
            s_suppkey
        from
            supplier
        where
            s_comment like '%Customer%Complaints%'
    )
group by
    p_brand,
    p_type,
    p_size
order by
    supplier_cnt desc,
    p_brand,
    p_type,
    p_size;

In Postgres planner, its plan looks like below:

                                                                                          QUERY PLAN                                                                                          
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=68817.76..68818.05 rows=20 width=44) (actual time=19394.798..19437.429 rows=27839 loops=1)
   Merge Key: (count(DISTINCT partsupp.ps_suppkey)), part.p_brand, part.p_type, part.p_size
   ->  Sort  (cost=68817.76..68817.78 rows=7 width=44) (actual time=18325.335..18328.372 rows=9410 loops=1)
         Sort Key: (count(DISTINCT partsupp.ps_suppkey)) DESC, part.p_brand, part.p_type, part.p_size
         Sort Method:  quicksort  Memory: 3570kB
         ->  GroupAggregate  (cost=68817.52..68817.67 rows=7 width=44) (actual time=17517.702..18287.814 rows=9410 loops=1)
               Group Key: part.p_brand, part.p_type, part.p_size
               ->  Sort  (cost=68817.52..68817.54 rows=7 width=40) (actual time=17517.529..17822.904 rows=400936 loops=1)
                     Sort Key: part.p_brand, part.p_type, part.p_size
                     Sort Method:  external merge  Disk: 60448kB
                     ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=48369.10..68817.43 rows=7 width=40) (actual time=7132.199..15967.319 rows=400936 loops=1)
                           Hash Key: part.p_brand, part.p_type, part.p_size
                           ->  Hash Join  (cost=48369.10..68817.29 rows=7 width=40) (actual time=9108.262..11288.461 rows=396839 loops=1)
                                 Hash Cond: (part.p_partkey = partsupp.ps_partkey)
                                 Extra Text: (seg2)   Initial batch 0:
 (seg2)     Wrote 36480K bytes to inner workfile.
 (seg2)     Wrote 7456K bytes to outer workfile.
 (seg2)   Overflow batch 1:
 (seg2)     Read 36484K bytes from inner workfile.
 (seg2)     Read 7473K bytes from outer workfile.
 (seg2)   Hash chain length 5.4 avg, 28 max, using 493953 of 1048576 buckets.
                                 ->  Seq Scan on part  (cost=0.00..20051.67 rows=105720 width=40) (actual time=0.266..1208.562 rows=99260 loops=1)
                                       Filter: ((p_brand <> 'Brand#44'::bpchar) AND ((p_type)::text !~~ 'SMALL BURNISHED%'::text) AND (p_size = ANY ('{36,27,34,45,11,6,25,16}'::integer[])))
                                       Rows Removed by Filter: 567371
                                 ->  Hash  (cost=48369.01..48369.01 rows=8 width=8) (actual time=6876.148..6876.169 rows=2669156 loops=1)
                                       Buckets: 524288 (originally 262144)  Batches: 2 (originally 1)  Memory Usage: 59755kB
                                       ->  Hash Left Anti Semi (Not-In) Join  (cost=600.93..48369.01 rows=8 width=8) (actual time=39.783..5232.001 rows=2669156 loops=1)
                                             Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
                                             Extra Text: (seg2)   Hash chain length 1.0 avg, 1 max, using 56 of 262144 buckets.
                                             ->  Seq Scan on partsupp  (cost=0.00..41101.13 rows=2666513 width=8) (actual time=0.217..3245.678 rows=2670680 loops=1)
                                             ->  Hash  (cost=600.80..600.80 rows=10 width=4) (actual time=4.142..4.148 rows=56 loops=1)
                                                   Buckets: 262144  Batches: 1  Memory Usage: 2050kB
                                                   ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..600.80 rows=10 width=4) (actual time=0.032..4.086 rows=56 loops=1)
                                                         ->  Seq Scan on supplier  (cost=0.00..600.67 rows=3 width=4) (actual time=8.156..55.336 rows=22 loops=1)
                                                               Filter: ((s_comment)::text ~~ '%Customer%Complaints%'::text)
                                                               Rows Removed by Filter: 33189
 Planning Time: 16.820 ms
   (slice0)    Executor memory: 180K bytes.
   (slice1)    Executor memory: 52339K bytes avg x 3x(0) workers, 52346K bytes max (seg1).  Work_mem: 42602K bytes max.
 * (slice2)    Executor memory: 66128K bytes avg x 3x(0) workers, 66128K bytes max (seg0).  Work_mem: 59755K bytes max, 108360K bytes wanted.
   (slice3)    Executor memory: 113K bytes avg x 3x(0) workers, 113K bytes max (seg0).
 Memory used:  128000kB
 Memory wanted:  542599kB
 Optimizer: Postgres query optimizer
 Execution Time: 19475.738 ms

Let's look at the hashjoin part of above plan,

>  Hash Join  (cost=48369.10..68817.29 rows=7 width=40) (actual time=9108.262..11288.461 rows=396839 loops=1)
    Hash Cond: (part.p_partkey = partsupp.ps_partkey)
    ->  Seq Scan on part  (cost=0.00..20051.67 rows=105720 width=40) (actual time=0.266..1208.562 rows=99260 loops=1)
        Filter: ((p_brand <> 'Brand#44'::bpchar) AND ((p_type)::text !~~ 'SMALL BURNISHED%'::text) AND (p_size = ANY ('{36,27,34,45,11,6,25,16}'::integer[])))
        Rows Removed by Filter: 567371
    ->  Hash  (cost=48369.01..48369.01 rows=8 width=8) (actual time=6876.148..6876.169 rows=2669156 loops=1)
        Buckets: 524288 (originally 262144)  Batches: 2 (originally 1)  Memory Usage: 59755kB
        ->  Hash Left Anti Semi (Not-In) Join  (cost=600.93..48369.01 rows=8 width=8) (actual time=39.783..5232.001 rows=2669156 loops=1)
            Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
            ->  Seq Scan on partsupp  (cost=0.00..41101.13 rows=2666513 width=8) (actual time=0.217..3245.678 rows=2670680 loops=1)
            ->  Hash  (cost=600.80..600.80 rows=10 width=4) (actual time=4.142..4.148 rows=56 loops=1)
                Buckets: 262144  Batches: 1  Memory Usage: 2050kB
                ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..600.80 rows=10 width=4) (actual time=0.032..4.086 rows=56 loops=1)
                    ->  Seq Scan on supplier  (cost=0.00..600.67 rows=3 width=4) (actual time=8.156..55.336 rows=22 loops=1)
                        Filter: ((s_comment)::text ~~ '%Customer%Complaints%'::text)
                        Rows Removed by Filter: 33189

The Hash Left Anti Semi(Not-In) Join size estimate has big gap with real execution. In this case rows = 8 vs rows = 2669156.
For anti join, the selectivity is determined by the fraction of tuples that do not match. But in current planner codes, we have two places that calculate anti join selectivity.
The first place at the calc_joinrel_size_estimate(), we have below codes:

		case JOIN_ANTI:
		case JOIN_LASJ_NOTIN:
			nrows = outer_rows * (1.0 - fkselec * jselec);
			nrows *= pselec;
			break;

And the second place at the end of clauselist_selectivity_ext(), we have below codes:

       /* 
	 * For Anti Semi Join, selectivity is determined by the fraction of 
	 * tuples that do no match 
	 */
	if (JOIN_ANTI == jointype || JOIN_LASJ_NOTIN == jointype)
	{
		s1 = (1 - s1);
	}

I think we should keep the first place just like upstream does, otherwise, the selectivity of anti join is same with inner join.
Because in clauselist_selectivity_ext() we do "s1 = 1 - s1", return from clauselist_selectivity_ext(), in calc_joinrel_size_estimate(),
we do "(1.0 - fkselec*jselec)" again.

In query 16 case, the anti join clause length is 1, so we return directly in below code:

if (list_length(clauses) == 1)
		return clause_selectivity_ext(root, (Node *) linitial(clauses),
									  varRelid, jointype, sjinfo,
									  use_extended_stats, use_damping);

But for pselec, we are not lucky, in this case, pselec is null, so it will run the end of clauselist_selectivity_ext(), and return s1=0.
So the join rel size is very small.
With this pr, query 16 plan looks like below:

                                                                                                QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=142534.89..144770.04 rows=157775 width=44) (actual time=17604.002..17644.780 rows=27839 loops=1)
   Merge Key: (count(DISTINCT partsupp.ps_suppkey)), part.p_brand, part.p_type, part.p_size
   ->  Sort  (cost=142534.89..142666.37 rows=52592 width=44) (actual time=17511.103..17514.690 rows=9410 loops=1)
         Sort Key: (count(DISTINCT partsupp.ps_suppkey)) DESC, part.p_brand, part.p_type, part.p_size
         Sort Method:  quicksort  Memory: 3570kB
         ->  GroupAggregate  (cost=132599.94..138411.04 rows=52592 width=44) (actual time=16559.956..17459.772 rows=9410 loops=1)
               Group Key: part.p_brand, part.p_type, part.p_size
               ->  Sort  (cost=132599.94..133656.98 rows=422814 width=40) (actual time=16559.757..16910.262 rows=400936 loops=1)
                     Sort Key: part.p_brand, part.p_type, part.p_size
                     Sort Method:  external merge  Disk: 60480kB
                     ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=21974.09..93088.65 rows=422814 width=40) (actual time=1111.295..14910.234 rows=400936 loops=1)
                           Hash Key: part.p_brand, part.p_type, part.p_size
                           ->  Hash Left Anti Semi (Not-In) Join  (cost=21974.09..84632.36 rows=422814 width=40) (actual time=1307.675..11466.605 rows=396839 loops=1)
                                 Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)
                                 Extra Text: (seg0)   Hash chain length 1.0 avg, 1 max, using 56 of 131072 buckets.
                                 ->  Hash Join  (cost=21373.17..78746.05 rows=422857 width=40) (actual time=1175.142..11022.790 rows=397040 loops=1)
                                       Hash Cond: (partsupp.ps_partkey = part.p_partkey)
                                       Extra Text: (seg0)   Hash chain length 1.4 avg, 7 max, using 69507 of 131072 buckets.
                                       ->  Seq Scan on partsupp  (cost=0.00..41101.13 rows=2666513 width=8) (actual time=3.526..8470.135 rows=2670680 loops=1)
                                       ->  Hash  (cost=20051.67..20051.67 rows=105720 width=40) (actual time=1168.854..1168.869 rows=99260 loops=1)
                                             Buckets: 131072  Batches: 1  Memory Usage: 8204kB
                                             ->  Seq Scan on part  (cost=0.00..20051.67 rows=105720 width=40) (actual time=0.346..1026.829 rows=99260 loops=1)
                                                   Filter: ((p_brand <> 'Brand#44'::bpchar) AND ((p_type)::text !~~ 'SMALL BURNISHED%'::text) AND (p_size = ANY ('{36,27,34,45,11,6,25,16}'::integer[])))
                                                   Rows Removed by Filter: 567371
                                 ->  Hash  (cost=600.80..600.80 rows=10 width=4) (actual time=84.655..84.656 rows=56 loops=1)
                                       Buckets: 131072  Batches: 1  Memory Usage: 1026kB
                                       ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..600.80 rows=10 width=4) (actual time=8.873..84.547 rows=56 loops=1)
                                             ->  Seq Scan on supplier  (cost=0.00..600.67 rows=3 width=4) (actual time=51.387..126.987 rows=22 loops=1)
                                                   Filter: ((s_comment)::text ~~ '%Customer%Complaints%'::text)
                                                   Rows Removed by Filter: 33189
 Planning Time: 26.042 ms
   (slice0)    Executor memory: 181K bytes.
   (slice1)    Executor memory: 52182K bytes avg x 3x(0) workers, 52189K bytes max (seg1).  Work_mem: 42474K bytes max.
   (slice2)    Executor memory: 9546K bytes avg x 3x(0) workers, 9578K bytes max (seg0).  Work_mem: 8204K bytes max.
   (slice3)    Executor memory: 113K bytes avg x 3x(0) workers, 113K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 17701.394 ms

As you can see, this time the estimate is much more accurate. And the execution time is also 2 seconds faster. I'm guessing the performance gap will be even bigger if the s option changes bigger. In this case, I choose s = 10(e.g. 10GB).

In Postgres planner, we set join_rel selectivity in calc_joinrel_size_estimate() for anti join type. Maybe it's codes historical reasons, we have another place to set anti join selectivity in clauselist_selectivity_ext(). This can lead to wrong join_rel size estimates for anti join type. We can remove the selectivity computation in clauselist_selectivity_ext(). So we can make codes be consistent with upstream.

avamingli · 2025-02-18T03:10:31Z

As you can see, this time the estimate is much more accurate.

Good catch, thank you for your contribution!

It looks like the GPDB introduced these codes in the initial commit for specific reasons at that time. However, given the developments in PostgreSQL, it might be worth reevaluating their relevance today.

And the execution time is also 2 seconds faster. I'm guessing the performance gap will be even bigger if the s option changes bigger. In this case, I choose s = 10(e.g. 10GB).

As cost is more accurate, the plan adjusted the join order. LASJ is a special outer join, and it also cloud be used as normal outer join to follow the join order optimization.

The codes change LASJ as well as anti join, so I guess there might be plan change for query like: NOT EXISTS. Are there observations?

tenderwg · 2025-02-18T14:05:14Z

As you can see, this time the estimate is much more accurate.

Good catch, thank you for your contribution!

It looks like the GPDB introduced these codes in the initial commit for specific reasons at that time. However, given the developments in PostgreSQL, it might be worth reevaluating their relevance today.

Yeah, I also checked the git commit history; in the initial commit, only in clauselist_selectivity_ext(), we do "s1 = 1 - s1", which would be ok. It maybe do the merge upstream codes, that introduce "s1 = 1 -s1" in calc_joinrel_size_estimate(), but someone forgot to remove the same code in clauselist_selectivity_ext().

And the execution time is also 2 seconds faster. I'm guessing the performance gap will be even bigger if the s option changes bigger. In this case, I choose s = 10(e.g. 10GB).

As cost is more accurate, the plan adjusted the join order. LASJ is a special outer join, and it also cloud be used as normal outer join to follow the join order optimization.

The codes change LASJ as well as anti join, so I guess there might be plan change for query like: NOT EXISTS. Are there observations?

I only found a plan diff in regression test, which is notin test, but not anti join. I did the TPCDS benchmark; I found query 69, which includes "not exists"; there is a plan change.

tpcds=# explain analyze select  
  cd_gender,
  cd_marital_status,
  cd_education_status,
  count(*) cnt1,
  cd_purchase_estimate,
  count(*) cnt2,
  cd_credit_rating,
  count(*) cnt3
 from
  customer c,customer_address ca,customer_demographics
 where
  c.c_current_addr_sk = ca.ca_address_sk and
  ca_state in ('CO','IL','MN') and
  cd_demo_sk = c.c_current_cdemo_sk and 
  exists (select *
          from store_sales,date_dim
          where c.c_customer_sk = ss_customer_sk and
                ss_sold_date_sk = d_date_sk and
                d_year = 1999 and
                d_moy between 1 and 1+2) and
   (not exists (select *
            from web_sales,date_dim
            where c.c_customer_sk = ws_bill_customer_sk and
                  ws_sold_date_sk = d_date_sk and
                  d_year = 1999 and
                  d_moy between 1 and 1+2) and
    not exists (select * 
            from catalog_sales,date_dim
            where c.c_customer_sk = cs_ship_customer_sk and
                  cs_sold_date_sk = d_date_sk and
                  d_year = 1999 and
                  d_moy between 1 and 1+2))
 group by cd_gender,
          cd_marital_status,
          cd_education_status,
          cd_purchase_estimate,
          cd_credit_rating
 order by cd_gender,
          cd_marital_status,
          cd_education_status,
          cd_purchase_estimate,
          cd_credit_rating
 limit 100;

with this PR, its plan look like as below:

                                                                                                               QUERY PLAN                                                                                                    
             
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------
Limit  (cost=328084.44..328089.19 rows=100 width=64) (actual time=60631.738..60632.119 rows=100 loops=1)
  ->  Finalize GroupAggregate  (cost=328084.44..328292.24 rows=4375 width=64) (actual time=60631.732..60632.035 rows=100 loops=1)
        Group Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
        ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=328084.44..328182.88 rows=4375 width=48) (actual time=60631.692..60631.843 rows=124 loops=1)
              Merge Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
              ->  Partial GroupAggregate  (cost=328084.44..328124.55 rows=1458 width=48) (actual time=60580.889..60582.351 rows=813 loops=1)
                    Group Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
                    ->  Sort  (cost=328084.44..328088.09 rows=1458 width=40) (actual time=60580.849..60581.022 rows=934 loops=1)
                          Sort Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
                          Sort Method:  quicksort  Memory: 538kB
                          ->  Hash Join  (cost=317385.56..328007.81 rows=1458 width=40) (actual time=59143.763..60574.039 rows=1552 loops=1)
                                Hash Cond: (customer_demographics.cd_demo_sk = c.c_current_cdemo_sk)
                                Extra Text: (seg0)   Hash chain length 1.0 avg, 2 max, using 1535 of 65536 buckets.
                                ->  Seq Scan on customer_demographics  (cost=0.00..8206.67 rows=640267 width=44) (actual time=2.228..1257.710 rows=641104 loops=1)
                                ->  Hash  (cost=317366.68..317366.68 rows=1511 width=4) (actual time=59139.608..59139.614 rows=1552 loops=1)
                                      Buckets: 65536  Batches: 1  Memory Usage: 567kB
                                      ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=313805.81..317366.68 rows=1511 width=4) (actual time=58828.043..59138.928 rows=1720 loops=1)
                                            Hash Key: c.c_current_cdemo_sk
                                            ->  Hash Anti Join  (cost=313805.81..317336.47 rows=1511 width=4) (actual time=58076.704..59083.686 rows=1595 loops=1)
                                                  Hash Cond: (c.c_customer_sk = web_sales.ws_bill_customer_sk)
                                                  Extra Text: (seg0)   Hash chain length 12.7 avg, 49 max, using 5733 of 65536 buckets.
                                                  ->  Hash Anti Join  (cost=265626.25..269137.62 rows=1567 width=8) (actual time=48428.206..49420.346 rows=1651 loops=1)
                                                        Hash Cond: (c.c_customer_sk = catalog_sales.cs_ship_customer_sk)
                                                        Extra Text: (seg0)   Hash chain length 10.4 avg, 51 max, using 13003 of 65536 buckets.
                                                        ->  Hash Semi Join  (cost=170946.96..174438.11 rows=1686 width=8) (actual time=33859.998..34838.469 rows=1811 loops=1)
                                                              Hash Cond: (c.c_customer_sk = store_sales.ss_customer_sk)
                                                              Extra Text: (seg0)   Hash chain length 14.2 avg, 65 max, using 19106 of 65536 buckets.
                                                              ->  Hash Join  (cost=2096.43..5534.84 rows=12944 width=8) (actual time=211.223..1208.829 rows=13203 loops=1)
                                                                    Hash Cond: (c.c_current_addr_sk = ca.ca_address_sk)
                                                                    Extra Text: (seg1)   Hash chain length 1.2 avg, 5 max, using 17037 of 65536 buckets.
                                                                    ->  Seq Scan on customer c  (cost=0.00..2840.67 rows=166667 width=12) (actual time=3.287..881.714 rows=166874 loops=1)
                                                                    ->  Hash  (cost=1853.72..1853.72 rows=19417 width=4) (actual time=207.179..207.188 rows=19679 loops=1)
                                                                          Buckets: 65536  Batches: 1  Memory Usage: 1204kB
                                                                          ->  Broadcast Motion 3:3  (slice3; segments: 3)  (cost=0.00..1853.72 rows=19417 width=4) (actual time=0.276..198.109 rows=19679 loops=1)
                                                                                ->  Seq Scan on customer_address ca  (cost=0.00..1594.83 rows=6472 width=4) (actual time=0.397..299.926 rows=6715 loops=1)
                                                                                      Filter: (ca_state = ANY ('{CO,IL,MN}'::bpchar[]))
                                                                                      Rows Removed by Filter: 76651
                                                              ->  Hash  (cost=168566.32..168566.32 rows=22737 width=4) (actual time=33648.074..33648.075 rows=271539 loops=1)
                                                                    Buckets: 65536  Batches: 1  Memory Usage: 10059kB
                                                                    ->  Redistribute Motion 3:3  (slice4; segments: 3)  (cost=1630.63..168566.32 rows=22737 width=4) (actual time=0.229..33542.873 rows=291145 loops=1)
                                                                          Hash Key: store_sales.ss_customer_sk
                                                                          ->  Hash Join  (cost=1630.63..168111.58 rows=22737 width=4) (actual time=153.411..34734.913 rows=278197 loops=1)
                                                                                Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
                                                                                Extra Text: (seg1)   Hash chain length 1.0 avg, 2 max, using 89 of 65536 buckets.
                                                                                ->  Seq Scan on store_sales  (cost=0.00..141042.79 rows=9600479 width=8) (actual time=6.716..32570.244 rows=9601560 loops=1)
                                                                                ->  Hash  (cost=1628.36..1628.36 rows=181 width=4) (actual time=234.437..234.453 rows=90 loops=1)
                                                                                      Buckets: 65536  Batches: 1  Memory Usage: 516kB
                                                                                      ->  Seq Scan on date_dim  (cost=0.00..1628.36 rows=181 width=4) (actual time=133.576..234.286 rows=90 loops=1)
                                                                                            Filter: ((d_moy >= 1) AND (d_moy <= 3) AND (d_year = 1999))
                                                                                            Rows Removed by Filter: 72959
                                                        ->  Hash  (cost=94530.99..94530.99 rows=11864 width=4) (actual time=14567.445..14567.447 rows=135738 loops=1)
                                                              Buckets: 65536  Batches: 1  Memory Usage: 5285kB
                                                              ->  Redistribute Motion 3:3  (slice5; segments: 3)  (cost=1630.63..94530.99 rows=11864 width=4) (actual time=0.253..14478.242 rows=136691 loops=1)
                                                                    Hash Key: catalog_sales.cs_ship_customer_sk
                                                                    ->  Hash Join  (cost=1630.63..94293.71 rows=11864 width=4) (actual time=4975.077..18371.695 rows=134965 loops=1)
                                                                          Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_2.d_date_sk)
                                                                          Extra Text: (seg1)   Hash chain length 1.0 avg, 2 max, using 89 of 65536 buckets.
                                                                          ->  Seq Scan on catalog_sales  (cost=0.00..79942.42 rows=4800442 width=8) (actual time=0.231..17961.146 rows=4800991 loops=1)
                                                                          ->  Hash  (cost=1628.36..1628.36 rows=181 width=4) (actual time=240.752..240.768 rows=90 loops=1)
                                                                                Buckets: 65536  Batches: 1  Memory Usage: 516kB
                                                                                ->  Seq Scan on date_dim date_dim_2  (cost=0.00..1628.36 rows=181 width=4) (actual time=145.921..240.615 rows=90 loops=1)
                                                                                      Filter: ((d_moy >= 1) AND (d_moy <= 3) AND (d_year = 1999))
                                                                                      Rows Removed by Filter: 72959
                                                  ->  Hash  (cost=48105.08..48105.08 rows=5958 width=4) (actual time=9644.638..9644.640 rows=72933 loops=1)
                                                        Buckets: 65536  Batches: 1  Memory Usage: 3077kB
                                                        ->  Redistribute Motion 3:3  (slice6; segments: 3)  (cost=1630.63..48105.08 rows=5958 width=4) (actual time=220.069..9613.018 rows=72961 loops=1)
                                                              Hash Key: web_sales.ws_bill_customer_sk
                                                              ->  Hash Join  (cost=1630.63..47985.91 rows=5958 width=4) (actual time=244.746..9383.130 rows=72345 loops=1)
                                                                    Hash Cond: (web_sales.ws_sold_date_sk = date_dim_1.d_date_sk)
                                                                    Extra Text: (seg0)   Hash chain length 1.0 avg, 2 max, using 89 of 65536 buckets.
                                                                    ->  Seq Scan on web_sales  (cost=0.00..39995.54 rows=2399154 width=8) (actual time=2.048..8538.197 rows=2399927 loops=1)
                                                                    ->  Hash  (cost=1628.36..1628.36 rows=181 width=4) (actual time=217.104..217.111 rows=90 loops=1)
                                                                          Buckets: 65536  Batches: 1  Memory Usage: 516kB
                                                                          ->  Seq Scan on date_dim date_dim_1  (cost=0.00..1628.36 rows=181 width=4) (actual time=146.144..216.987 rows=90 loops=1)
                                                                                Filter: ((d_moy >= 1) AND (d_moy <= 3) AND (d_year = 1999))
                                                                                Rows Removed by Filter: 72959
Planning Time: 13.085 ms
  (slice0)    Executor memory: 332K bytes.
  (slice1)    Executor memory: 641K bytes avg x 3x(0) workers, 642K bytes max (seg0).  Work_mem: 567K bytes max.
  (slice2)    Executor memory: 21688K bytes avg x 3x(0) workers, 21795K bytes max (seg0).  Work_mem: 10059K bytes max.
  (slice3)    Executor memory: 117K bytes avg x 3x(0) workers, 117K bytes max (seg0).
  (slice4)    Executor memory: 592K bytes avg x 3x(0) workers, 592K bytes max (seg0).  Work_mem: 516K bytes max.
  (slice5)    Executor memory: 592K bytes avg x 3x(0) workers, 592K bytes max (seg0).  Work_mem: 516K bytes max.
  (slice6)    Executor memory: 592K bytes avg x 3x(0) workers, 592K bytes max (seg0).  Work_mem: 516K bytes max.
Memory used:  128000kB
Optimizer: Postgres query optimizer
Execution Time: 60705.822 ms

without this PR, its plan looks like as below:

                                                                                                                    QUERY PLAN                                                                                                 
                    
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------
 Limit  (cost=327278.47..327278.94 rows=21 width=64) (actual time=67033.415..67033.638 rows=100 loops=1)
   ->  Gather Motion 3:1  (slice1; segments: 3)  (cost=327278.47..327278.94 rows=21 width=64) (actual time=67033.408..67033.543 rows=100 loops=1)
         Merge Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
         ->  Limit  (cost=327278.47..327278.67 rows=7 width=64) (actual time=66974.506..66974.922 rows=100 loops=1)
               ->  GroupAggregate  (cost=327278.47..327278.67 rows=7 width=64) (actual time=66974.496..66974.841 rows=100 loops=1)
                     Group Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
                     ->  Sort  (cost=327278.47..327278.49 rows=7 width=40) (actual time=66983.972..66983.996 rows=155 loops=1)
                           Sort Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
                           Sort Method:  quicksort  Memory: 538kB
                           ->  Redistribute Motion 3:3  (slice2; segments: 3)  (cost=316670.50..327278.38 rows=7 width=40) (actual time=65840.559..66961.990 rows=1536 loops=1)
                                 Hash Key: customer_demographics.cd_gender, customer_demographics.cd_marital_status, customer_demographics.cd_education_status, customer_demographics.cd_purchase_estimate, customer_demographics.cd_credit_rating
                                 ->  Hash Join  (cost=316670.50..327278.24 rows=7 width=40) (actual time=65560.100..66947.218 rows=1552 loops=1)
                                       Hash Cond: (customer_demographics.cd_demo_sk = c.c_current_cdemo_sk)
                                       Extra Text: (seg0)   Hash chain length 1.0 avg, 2 max, using 1535 of 65536 buckets.
                                       ->  Seq Scan on customer_demographics  (cost=0.00..8206.67 rows=640267 width=44) (actual time=1.595..1186.315 rows=641104 loops=1)
                                       ->  Hash  (cost=316670.44..316670.44 rows=5 width=4) (actual time=65533.021..65533.028 rows=1552 loops=1)
                                             Buckets: 65536  Batches: 1  Memory Usage: 567kB
                                             ->  Redistribute Motion 3:3  (slice3; segments: 3)  (cost=313385.12..316670.44 rows=5 width=4) (actual time=65327.338..65531.669 rows=1720 loops=1)
                                                   Hash Key: c.c_current_cdemo_sk
                                                   ->  Hash Join  (cost=313385.12..316670.35 rows=5 width=4) (actual time=64603.161..65569.004 rows=1605 loops=1)
                                                         Hash Cond: (c.c_current_addr_sk = ca.ca_address_sk)
                                                         Extra Text: (seg2)   Hash chain length 1.1 avg, 3 max, using 6094 of 65536 buckets.
                                                         ->  Redistribute Motion 3:3  (slice4; segments: 3)  (cost=311709.38..314994.55 rows=5 width=8) (actual time=64222.096..65170.794 rows=19965 loops=1)
                                                               Hash Key: c.c_current_addr_sk
                                                               ->  Hash Semi Join  (cost=311709.38..314994.44 rows=5 width=8) (actual time=64532.423..65508.199 rows=20020 loops=1)
                                                                     Hash Cond: (c.c_customer_sk = store_sales.ss_customer_sk)
                                                                     Extra Text: (seg2)   Hash chain length 14.2 avg, 70 max, using 19139 of 65536 buckets.
                                                                     ->  Hash Anti Join  (cost=142858.85..146143.84 rows=5 width=12) (actual time=25788.748..26645.087 rows=147307 loops=1)
                                                                           Hash Cond: (c.c_customer_sk = catalog_sales.cs_ship_customer_sk)
                                                                           Extra Text: (seg2)   Hash chain length 10.4 avg, 54 max, using 12857 of 65536 buckets.
                                                                           ->  Hash Anti Join  (cost=48179.56..51464.49 rows=6 width=12) (actual time=10919.503..11664.183 rows=161110 loops=1)
                                                                                 Hash Cond: (c.c_customer_sk = web_sales.ws_bill_customer_sk)
                                                                                 Extra Text: (seg2)   Hash chain length 12.8 avg, 47 max, using 5521 of 65536 buckets.
                                                                                 ->  Seq Scan on customer c  (cost=0.00..2840.67 rows=166667 width=12) (actual time=2.664..634.176 rows=166874 loops=1)
                                                                                 ->  Hash  (cost=48105.08..48105.08 rows=5958 width=4) (actual time=10916.233..10916.240 rows=72933 loops=1)
                                                                                       Buckets: 65536  Batches: 1  Memory Usage: 3077kB
                                                                                       ->  Redistribute Motion 3:3  (slice5; segments: 3)  (cost=1630.63..48105.08 rows=5958 width=4) (actual time=0.183..10887.458 rows=72961 loops=1)
                                                                                             Hash Key: web_sales.ws_bill_customer_sk
                                                                                             ->  Hash Join  (cost=1630.63..47985.91 rows=5958 width=4) (actual time=149.670..13783.116 rows=72345 loops=1)
                                                                                                   Hash Cond: (web_sales.ws_sold_date_sk = date_dim_1.d_date_sk)
                                                                                                   Extra Text: (seg0)   Hash chain length 1.0 avg, 2 max, using 89 of 65536 buckets.
                                                                                                   ->  Seq Scan on web_sales  (cost=0.00..39995.54 rows=2399154 width=8) (actual time=3.926..13095.786 rows=2399927 loops=1)
                                                                                                   ->  Hash  (cost=1628.36..1628.36 rows=181 width=4) (actual time=129.407..129.414 rows=90 loops=1)
                                                                                                         Buckets: 65536  Batches: 1  Memory Usage: 516kB
                                                                                                         ->  Seq Scan on date_dim date_dim_1  (cost=0.00..1628.36 rows=181 width=4) (actual time=73.261..129.265 rows=90 loops=1)
                                                                                                               Filter: ((d_moy >= 1) AND (d_moy <= 3) AND (d_year = 1999))
                                                                                                               Rows Removed by Filter: 72959
                                                                           ->  Hash  (cost=94530.99..94530.99 rows=11864 width=4) (actual time=14868.527..14868.528 rows=135738 loops=1)
                                                                                 Buckets: 65536  Batches: 1  Memory Usage: 5285kB
                                                                                 ->  Redistribute Motion 3:3  (slice6; segments: 3)  (cost=1630.63..94530.99 rows=11864 width=4) (actual time=0.185..14798.149 rows=136691 loops=1)
                                                                                       Hash Key: catalog_sales.cs_ship_customer_sk
                                                                                       ->  Hash Join  (cost=1630.63..94293.71 rows=11864 width=4) (actual time=8799.963..23497.911 rows=134965 loops=1)
                                                                                             Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim_2.d_date_sk)
                                                                                             Extra Text: (seg1)   Hash chain length 1.0 avg, 2 max, using 89 of 65536 buckets.
                                                                                             ->  Seq Scan on catalog_sales  (cost=0.00..79942.42 rows=4800442 width=8) (actual time=2.342..22060.191 rows=4800991 loops=1)
                                                                                             ->  Hash  (cost=1628.36..1628.36 rows=181 width=4) (actual time=141.206..141.213 rows=90 loops=1)
                                                                                                   Buckets: 65536  Batches: 1  Memory Usage: 516kB
                                                                                                   ->  Seq Scan on date_dim date_dim_2  (cost=0.00..1628.36 rows=181 width=4) (actual time=74.353..141.085 rows=90 loops=1)
                                                                                                         Filter: ((d_moy >= 1) AND (d_moy <= 3) AND (d_year = 1999))
                                                                                                         Rows Removed by Filter: 72959
                                                                     ->  Hash  (cost=168566.32..168566.32 rows=22737 width=4) (actual time=38711.522..38711.523 rows=271539 loops=1)
                                                                           Buckets: 65536  Batches: 1  Memory Usage: 10059kB
                                                                           ->  Redistribute Motion 3:3  (slice7; segments: 3)  (cost=1630.63..168566.32 rows=22737 width=4) (actual time=373.350..38599.613 rows=291145 loops=1)
                                                                                 Hash Key: store_sales.ss_customer_sk
                                                                                 ->  Hash Join  (cost=1630.63..168111.58 rows=22737 width=4) (actual time=273.440..37468.344 rows=278197 loops=1)
                                                                                       Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
                                                                                       Extra Text: (seg1)   Hash chain length 1.0 avg, 2 max, using 89 of 65536 buckets.
                                                                                       ->  Seq Scan on store_sales  (cost=0.00..141042.79 rows=9600479 width=8) (actual time=7.207..32871.917 rows=9601560 loops=1)
                                                                                       ->  Hash  (cost=1628.36..1628.36 rows=181 width=4) (actual time=127.486..127.492 rows=90 loops=1)
                                                                                             Buckets: 65536  Batches: 1  Memory Usage: 516kB
                                                                                             ->  Seq Scan on date_dim  (cost=0.00..1628.36 rows=181 width=4) (actual time=60.732..127.370 rows=90 loops=1)
                                                                                                   Filter: ((d_moy >= 1) AND (d_moy <= 3) AND (d_year = 1999))
                                                                                                   Rows Removed by Filter: 72959
                                                         ->  Hash  (cost=1594.83..1594.83 rows=6472 width=4) (actual time=386.403..386.409 rows=6715 loops=1)
                                                               Buckets: 65536  Batches: 1  Memory Usage: 749kB
                                                               ->  Seq Scan on customer_address ca  (cost=0.00..1594.83 rows=6472 width=4) (actual time=5.309..369.984 rows=6715 loops=1)
                                                                     Filter: (ca_state = ANY ('{CO,IL,MN}'::bpchar[]))
                                                                     Rows Removed by Filter: 76651
 Planning Time: 48.201 ms
   (slice0)    Executor memory: 317K bytes.
   (slice1)    Executor memory: 367K bytes avg x 3x(0) workers, 368K bytes max (seg2).  Work_mem: 96K bytes max.
   (slice2)    Executor memory: 634K bytes avg x 3x(0) workers, 634K bytes max (seg0).  Work_mem: 567K bytes max.
   (slice3)    Executor memory: 837K bytes avg x 3x(0) workers, 858K bytes max (seg0).  Work_mem: 749K bytes max.
   (slice4)    Executor memory: 20354K bytes avg x 3x(0) workers, 20461K bytes max (seg0).  Work_mem: 10059K bytes max.
   (slice5)    Executor memory: 593K bytes avg x 3x(0) workers, 593K bytes max (seg0).  Work_mem: 516K bytes max.
   (slice6)    Executor memory: 593K bytes avg x 3x(0) workers, 593K bytes max (seg0).  Work_mem: 516K bytes max.
   (slice7)    Executor memory: 593K bytes avg x 3x(0) workers, 593K bytes max (seg0).  Work_mem: 516K bytes max.
 Memory used:  128000kB
 Optimizer: Postgres query optimizer
 Execution Time: 67127.038 ms

The execution time is 60s VS 67s.

tenderwg · 2025-02-18T14:08:45Z

By the way, I run the make installcheck-good and make installcheck-cbdb-parallel on my local machine. I didn't have a failed case like github ci reporting. In my two commits, the failed cases are different. Is the CI not stable?

avamingli · 2025-02-19T02:02:42Z

I only found a plan diff in regression test, which is notin test, but not anti join. I did the TPCDS benchmark; I found query 69, which includes "not exists"; there is a plan change

Thank you for your verification; I believe that's the one. I'll do further research later.

avamingli · 2025-02-19T02:12:27Z

By the way, I run the make installcheck-good and make installcheck-cbdb-parallel on my local machine. I didn't have a failed case like github ci reporting. In my two commits, the failed cases are different. Is the CI not stable?

The failure in this CI run indicates a resource limitation on the CI environment, which has been a persistent issue for us.

Similarly, the failure in this other run is also due to limited CI resources. We attempted a parallel refresh of a materialized view, but it proves ineffective with such constrained resources.

Previously, there was a workaround mentioned in #892, but it was not an ideal solution and has since been closed.

src/test/regress/expected/join_gp.out

my-ship-it requested a review from avamingli February 17, 2025 08:41

Fix notin plan diff.

f2429e6

avamingli reviewed Feb 19, 2025

View reviewed changes

src/test/regress/expected/join_gp.out Show resolved Hide resolved

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix wrong join_rel size estimates for anti join. #934

Fix wrong join_rel size estimates for anti join. #934

tenderwg commented Feb 17, 2025

avamingli commented Feb 18, 2025

tenderwg commented Feb 18, 2025 •

edited

Loading

tenderwg commented Feb 18, 2025

avamingli commented Feb 19, 2025

avamingli commented Feb 19, 2025

Fix wrong join_rel size estimates for anti join. #934

Are you sure you want to change the base?

Fix wrong join_rel size estimates for anti join. #934

Conversation

tenderwg commented Feb 17, 2025

avamingli commented Feb 18, 2025

tenderwg commented Feb 18, 2025 • edited Loading

tenderwg commented Feb 18, 2025

avamingli commented Feb 19, 2025

avamingli commented Feb 19, 2025

tenderwg commented Feb 18, 2025 •

edited

Loading