Open
Description
I tested Q19a query of JOB benchmark in the latest PG17beta1 version. I found the following problems which indicate that hints cannot totally force the decisions of optimizer and optimizer will not follow the hints in some cases.
- Leading hint only: work correctly!
/*+ Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn)))) */
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
MIN(t.title) AS voiced_movie
FROM aka_name AS an,
char_name AS chn,
cast_info AS ci,
company_name AS cn,
info_type AS it,
movie_companies AS mc,
movie_info AS mi,
name AS n,
role_type AS rt,
title AS t
WHERE ci.note IN ('(voice)',
'(voice: Japanese version)',
'(voice) (uncredited)',
'(voice: English version)')
AND cn.country_code ='[us]'
AND it.info = 'release dates'
AND mc.note IS NOT NULL
AND (mc.note LIKE '%(USA)%'
OR mc.note LIKE '%(worldwide)%')
AND mi.info IS NOT NULL
AND (mi.info LIKE 'Japan:%200%'
OR mi.info LIKE 'USA:%200%')
AND n.gender ='f'
AND n.name LIKE '%Ang%'
AND rt.role ='actress'
AND t.production_year BETWEEN 2005 AND 2009
AND t.id = mi.movie_id
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND mc.movie_id = ci.movie_id
AND mc.movie_id = mi.movie_id
AND mi.movie_id = ci.movie_id
AND cn.id = mc.company_id
AND it.id = mi.info_type_id
AND n.id = ci.person_id
AND rt.id = ci.role_id
AND n.id = an.person_id
AND ci.person_id = an.person_id
AND chn.id = ci.person_role_id;
Execution plan is:
Aggregate (cost=1129471.94..1129471.95 rows=1 width=64)
-> Nested Loop (cost=1031727.69..1129471.94 rows=1 width=32)
Join Filter: (rt.id = ci.role_id)
-> Seq Scan on role_type rt (cost=0.00..18.88 rows=4 width=4)
Filter: ((role)::text = 'actress'::text)
-> Materialize (cost=1031727.69..1129453.01 rows=1 width=36)
-> Nested Loop (cost=1031727.69..1129453.00 rows=1 width=36)
Join Filter: (it.id = mi.info_type_id)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4)
Filter: ((info)::text = 'release dates'::text)
-> Nested Loop (cost=1031727.69..1129449.85 rows=59 width=40)
-> Hash Join (cost=1031727.27..1129253.56 rows=163 width=44)
Hash Cond: (n.id = ci.person_id)
-> Gather (cost=1000.00..83655.92 rows=9516 width=19)
Workers Planned: 2
-> Parallel Seq Scan on name n (cost=0.00..81704.32 rows=3965 width=19)
Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
-> Hash (cost=1029835.88..1029835.88 rows=71311 width=37)
-> Merge Join (cost=928724.12..1029835.88 rows=71311 width=37)
Merge Cond: (chn.id = ci.person_role_id)
-> Index Only Scan using char_name_pkey on char_name chn (cost=0.43..92048.75 rows=3140288 width=4)
-> Materialize (cost=928688.51..929418.41 rows=145979 width=41)
-> Sort (cost=928688.51..929053.46 rows=145979 width=41)
Sort Key: ci.person_role_id
-> Gather (cost=861411.88..911672.87 rows=145979 width=41)
Workers Planned: 2
-> Parallel Hash Join (cost=860411.88..896074.97 rows=60825 width=41)
Hash Cond: (mc.movie_id = t.id)
-> Parallel Seq Scan on movie_companies mc (cost=0.00..35131.06 rows=128227 width=8)
Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text))
)
-> Parallel Hash (cost=859150.63..859150.63 rows=100900 width=49)
-> Parallel Hash Join (cost=601038.73..859150.63 rows=100900 width=49)
Hash Cond: (mi.movie_id = t.id)
-> Parallel Seq Scan on movie_info mi (cost=0.00..253676.43 rows=226580 width=8)
Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
-> Parallel Hash (cost=597052.41..597052.41 rows=187225 width=41)
-> Parallel Hash Join (cost=525612.09..597052.41 rows=187225 width=41)
Hash Cond: (t.id = ci.movie_id)
-> Parallel Seq Scan on title t (cost=0.00..51808.72 rows=235421 width=21)
Filter: ((production_year >= 2005) AND (production_year <= 2009))
-> Parallel Hash (cost=510231.45..510231.45 rows=837731 width=20)
-> Parallel Hash Join (cost=21334.10..510231.45 rows=837731 width=20
)
Hash Cond: (ci.person_id = an.person_id)
-> Parallel Seq Scan on cast_info ci (cost=0.00..479269.70 row
s=345362 width=16)
Filter: (note = ANY ('{(voice),"(voice: Japanese version)"
,"(voice) (uncredited)","(voice: English version)"}'::text[]))
-> Parallel Hash (cost=15171.60..15171.60 rows=375560 width=4)
-> Parallel Seq Scan on aka_name an (cost=0.00..15171.60
rows=375560 width=4)
-> Index Scan using company_name_pkey on company_name cn (cost=0.42..1.20 rows=1 width=4)
Index Cond: (id = mc.company_id)
Filter: ((country_code)::text = '[us]'::text)
- Leading hint + join methods hint: wrong results.
/*+
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
MIN(t.title) AS voiced_movie
FROM aka_name AS an,
char_name AS chn,
cast_info AS ci,
company_name AS cn,
info_type AS it,
movie_companies AS mc,
movie_info AS mi,
name AS n,
role_type AS rt,
title AS t
WHERE ci.note IN ('(voice)',
'(voice: Japanese version)',
'(voice) (uncredited)',
'(voice: English version)')
AND cn.country_code ='[us]'
AND it.info = 'release dates'
AND mc.note IS NOT NULL
AND (mc.note LIKE '%(USA)%'
OR mc.note LIKE '%(worldwide)%')
AND mi.info IS NOT NULL
AND (mi.info LIKE 'Japan:%200%'
OR mi.info LIKE 'USA:%200%')
AND n.gender ='f'
AND n.name LIKE '%Ang%'
AND rt.role ='actress'
AND t.production_year BETWEEN 2005 AND 2009
AND t.id = mi.movie_id
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND mc.movie_id = ci.movie_id
AND mc.movie_id = mi.movie_id
AND mi.movie_id = ci.movie_id
AND cn.id = mc.company_id
AND it.id = mi.info_type_id
AND n.id = ci.person_id
AND rt.id = ci.role_id
AND n.id = an.person_id
AND ci.person_id = an.person_id
AND chn.id = ci.person_role_id;
Execution plan is:
Aggregate (cost=27678801700.75..27678801700.76 rows=1 width=64)
-> Nested Loop (cost=20000031697.50..27678801700.75 rows=1 width=32)
Join Filter: (rt.id = ci.role_id)
-> Seq Scan on role_type rt (cost=0.00..18.88 rows=4 width=4)
Filter: ((role)::text = 'actress'::text)
-> Materialize (cost=20000031697.50..27678801681.81 rows=1 width=36)
-> Nested Loop (cost=20000031697.50..27678801681.81 rows=1 width=36)
Join Filter: (it.id = mi.info_type_id)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4)
Filter: ((info)::text = 'release dates'::text)
-> Nested Loop (cost=20000031697.50..27678801678.66 rows=59 width=40)
-> Nested Loop (cost=20000031697.08..27678801482.37 rows=163 width=44)
Join Filter: (n.id = ci.person_id)
-> Seq Scan on name n (cost=0.00..118171.96 rows=9516 width=19)
Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
-> Materialize (cost=20000031697.08..27663195186.54 rows=71311 width=37)
-> Nested Loop (cost=20000031697.08..27663194271.99 rows=71311 width=37)
-> Nested Loop (cost=10000031696.65..17663060878.15 rows=145979 width=41)
Join Filter: (t.id = mc.movie_id)
-> Seq Scan on movie_companies mc (cost=0.00..57960.93 rows=307745 width=8)
Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
-> Materialize (cost=10000031696.65..15817340640.79 rows=242159 width=49)
-> Nested Loop (cost=10000031696.65..15817337065.00 rows=242159 width=49)
Join Filter: (t.id = mi.movie_id)
-> Seq Scan on movie_info mi (cost=0.00..382516.23 rows=543793 width=8)
Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
-> Materialize (cost=10000031696.65..10003749915.92 rows=449341 width=41)
-> Nested Loop (cost=10000031696.65..10003743719.21 rows=449341 width=41)
-> Hash Join (cost=31696.22..852385.40 rows=2010555 width=20)
Hash Cond: (ci.person_id = an.person_id)
-> Seq Scan on cast_info ci (cost=0.00..796439.28 rows=828870 width=16)
Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voice) (uncredited)",
"(voice: English version)"}'::text[]))
-> Hash (cost=20429.43..20429.43 rows=901343 width=4)
-> Seq Scan on aka_name an (cost=0.00..20429.43 rows=901343 width=4)
-> Index Scan using title_pkey on title t (cost=0.43..1.44 rows=1 width=21)
Index Cond: (id = ci.movie_id)
Filter: ((production_year >= 2005) AND (production_year <= 2009))
-> Index Only Scan using char_name_pkey on char_name chn (cost=0.43..0.91 rows=1 width=4)
Index Cond: (id = ci.person_role_id)
-> Index Scan using company_name_pkey on company_name cn (cost=0.42..1.20 rows=1 width=4)
Index Cond: (id = mc.company_id)
Filter: ((country_code)::text = '[us]'::text)
Actual join order and join algorithms:
(rt (it ((n ((mc (mi ((ci an) t))) chn)) cn)))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
We could see that:
- both table
t
andchn
are not following the leading hint((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
. - both table
t
andchn
use index scan, so they are used as the inner tables.
- Leading hint + join methods hint + access methods hint: work correctly!.
If we add hintsSeqScan(t) SeqScan(chn)
to force the access methods, the join order will become right.
/*+
SeqScan(t) SeqScan(chn)
Leading((rt (it ((n (chn (mc (mi (t (ci an)))))) cn))))
HashJoin(ci an)
NestLoop(ci an t)
NestLoop(ci an t mi)
NestLoop(ci an t mi mc)
NestLoop(ci an t mi mc chn)
NestLoop(ci an t mi mc chn n)
NestLoop(ci an t mi mc chn n cn)
NestLoop(ci an t mi mc chn n cn it)
NestLoop(ci an t mi mc chn n cn it rt)
*/
EXPLAIN (FORMAT TEXT)
SELECT MIN(n.name) AS voicing_actress,
MIN(t.title) AS voiced_movie
FROM aka_name AS an,
char_name AS chn,
cast_info AS ci,
company_name AS cn,
info_type AS it,
movie_companies AS mc,
movie_info AS mi,
name AS n,
role_type AS rt,
title AS t
WHERE ci.note IN ('(voice)',
'(voice: Japanese version)',
'(voice) (uncredited)',
'(voice: English version)')
AND cn.country_code ='[us]'
AND it.info = 'release dates'
AND mc.note IS NOT NULL
AND (mc.note LIKE '%(USA)%'
OR mc.note LIKE '%(worldwide)%')
AND mi.info IS NOT NULL
AND (mi.info LIKE 'Japan:%200%'
OR mi.info LIKE 'USA:%200%')
AND n.gender ='f'
AND n.name LIKE '%Ang%'
AND rt.role ='actress'
AND t.production_year BETWEEN 2005 AND 2009
AND t.id = mi.movie_id
AND t.id = mc.movie_id
AND t.id = ci.movie_id
AND mc.movie_id = ci.movie_id
AND mc.movie_id = mi.movie_id
AND mi.movie_id = ci.movie_id
AND cn.id = mc.company_id
AND it.id = mi.info_type_id
AND n.id = ci.person_id
AND rt.id = ci.role_id
AND n.id = an.person_id
AND ci.person_id = an.person_id
AND chn.id = ci.person_role_id;
Execution plan is:
Aggregate (cost=42280242141.33..42280242141.34 rows=1 width=64)
-> Nested Loop (cost=32696.64..42280242141.32 rows=1 width=32)
Join Filter: (rt.id = ci.role_id)
-> Seq Scan on role_type rt (cost=0.00..18.88 rows=4 width=4)
Filter: ((role)::text = 'actress'::text)
-> Materialize (cost=32696.64..42280242122.39 rows=1 width=36)
-> Nested Loop (cost=32696.64..42280242122.39 rows=1 width=36)
Join Filter: (it.id = mi.info_type_id)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4)
Filter: ((info)::text = 'release dates'::text)
-> Nested Loop (cost=32696.64..42280242119.24 rows=59 width=40)
-> Nested Loop (cost=32696.22..42280241922.95 rows=163 width=44)
Join Filter: (n.id = ci.person_id)
-> Seq Scan on name n (cost=0.00..118171.96 rows=9516 width=19)
Filter: ((name ~~ '%Ang%'::text) AND ((gender)::text = 'f'::text))
-> Materialize (cost=32696.22..42264635627.12 rows=71311 width=37)
-> Nested Loop (cost=32696.22..42264634712.57 rows=71311 width=37)
Join Filter: (chn.id = ci.person_role_id)
-> Seq Scan on char_name chn (cost=0.00..67850.88 rows=3140288 width=4)
-> Materialize (cost=32696.22..31356197189.35 rows=145979 width=41)
-> Nested Loop (cost=32696.22..31356195175.46 rows=145979 width=41)
Join Filter: (t.id = mc.movie_id)
-> Seq Scan on movie_companies mc (cost=0.00..57960.93 rows=307745 width=8)
Filter: ((note IS NOT NULL) AND ((note ~~ '%(USA)%'::text) OR (note ~~ '%(worldwide)%'::text)))
-> Materialize (cost=32696.22..29510474938.10 rows=242159 width=49)
-> Nested Loop (cost=32696.22..29510471362.30 rows=242159 width=49)
Join Filter: (t.id = mi.movie_id)
-> Seq Scan on movie_info mi (cost=0.00..382516.23 rows=543793 width=8)
Filter: ((info ~~ 'Japan:%200%'::text) OR (info ~~ 'USA:%200%'::text))
-> Materialize (cost=32696.22..23696884213.22 rows=449341 width=41)
-> Nested Loop (cost=32696.22..23696878016.52 rows=449341 width=41)
Join Filter: (t.id = ci.movie_id)
-> Seq Scan on title t (cost=0.00..73929.74 rows=565011 width=21)
Filter: ((production_year >= 2005) AND (production_year <= 2009))
-> Materialize (cost=32696.22..640936.59 rows=2010555 width=20)
-> Hash Join (cost=32696.22..619102.82 rows=2010555 width=20)
Hash Cond: (ci.person_id = an.person_id)
-> Gather (cost=1000.00..563156.70 rows=828870 width=16)
Workers Planned: 2
-> Parallel Seq Scan on cast_info ci (cost=0.00..479269.70 rows=3453
62 width=16)
Filter: (note = ANY ('{(voice),"(voice: Japanese version)","(voi
ce) (uncredited)","(voice: English version)"}'::text[]))
-> Hash (cost=20429.43..20429.43 rows=901343 width=4)
-> Seq Scan on aka_name an (cost=0.00..20429.43 rows=901343 width=4)
-> Index Scan using company_name_pkey on company_name cn (cost=0.42..1.20 rows=1 width=4)
Index Cond: (id = mc.company_id)
Filter: ((country_code)::text = '[us]'::text)
Based on the above results, we can get the following conclusions:
- If we only use the leading hint, the optimizer will choose the appropriate join methods to follow the specified join order.
- If we use both leading hint and join methods hint, the join order will not strictly follow the join order hint in some cases. For example, in our example the optimizer choose to use the index nested loop join rather than the normal nested loop join. So it put
t
andchn
as the inner table in the nest loop join. - Use only leading hint means the optimizer has some freedom to choose the join methods and access methods.
- The pg_hint_plan has bugs that optimizer's decisions cannot be totally forced and controlled by hints when some hints combinations are used. This is a bug of the hint and need to be fixed. There is a similar issue on github about this problem.
Recommendation: fix this bug by completely forcing the optimizer following the hints such as a combination hints with join order and join algorithms!