[Bug] Optimizer could choose HashAggregate to dedup based on cost. #659
Comments
For the Postgres planner, in function
Do we have SQL steps to reproduce this?
Please check the description of this issue above. You can create a very simple table and insert some data. Then you can use the above SQL statements to reproduce this issue.
Hi all, I tried to reproduce this with some data, but got the opposite result: with_cte is slower than the normal agg.
create table ao(id int) using ao_row;
insert into ao select i from generate_series(1, 1000000) i;
analyze;
Normal AGG:
explain(analyze) select count(distinct id) from ao ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=4268.72..4268.73 rows=1 width=8) (actual time=235.218..235.221 rows=1 loops=1)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=4268.67..4268.72 rows=3 width=8) (actual time=188.897..235.179 rows=3 loops=1)
-> Partial Aggregate (cost=4268.67..4268.68 rows=1 width=8) (actual time=234.069..234.071 rows=1 loops=1)
-> Seq Scan on ao (cost=0.00..3435.33 rows=333333 width=4) (actual time=0.366..98.303 rows=334042 loops=1)
Planning Time: 0.716 ms
(slice0) Executor memory: 114K bytes.
(slice1) Executor memory: 12290K bytes avg x 3x(0) workers, 12290K bytes max (seg0).
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 236.235 ms
(10 rows)
Use a CTE SQL:
explain(analyze) WITH UniqueIDs AS (
SELECT DISTINCT id
FROM ao
)
SELECT COUNT(*)
FROM UniqueIDs;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=8435.39..8435.40 rows=1 width=8) (actual time=674.920..674.923 rows=1 loops=1)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=8435.33..8435.38 rows=3 width=8) (actual time=633.443..674.883 rows=3 loops=1)
-> Partial Aggregate (cost=8435.33..8435.34 rows=1 width=8) (actual time=623.181..623.184 rows=1 loops=1)
-> HashAggregate (cost=4268.67..7602.00 rows=333333 width=4) (actual time=419.478..583.788 rows=334042 loops=1)
Group Key: ao.id
-> Seq Scan on ao (cost=0.00..3435.33 rows=333333 width=4) (actual time=0.534..117.273 rows=334042 loops=1)
Planning Time: 0.888 ms
(slice0) Executor memory: 12333K bytes.
(slice1) Executor memory: 22960K bytes avg x 3x(0) workers, 22960K bytes max (seg0). Work_mem: 30737K bytes max.
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 696.665 ms
(12 rows)
Same for ORCA:
explain(analyze) select count(distinct id) from ao ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=0.00..440.60 rows=1 width=8) (actual time=326.314..326.317 rows=1 loops=1)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..440.60 rows=1 width=8) (actual time=236.042..326.277 rows=3 loops=1)
-> Partial Aggregate (cost=0.00..440.60 rows=1 width=8) (actual time=234.961..234.964 rows=1 loops=1)
-> Seq Scan on ao (cost=0.00..437.97 rows=333334 width=4) (actual time=0.354..105.905 rows=334042 loops=1)
Planning Time: 29.959 ms
(slice0) Executor memory: 114K bytes.
(slice1) Executor memory: 12290K bytes avg x 3x(0) workers, 12290K bytes max (seg0).
Memory used: 128000kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution Time: 327.857 ms
(10 rows)
Use CTE SQL:
explain(analyze) WITH UniqueIDs AS (
SELECT DISTINCT id
FROM ao
)
SELECT COUNT(*)
FROM UniqueIDs;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=0.00..480.67 rows=1 width=8) (actual time=1068.652..1068.655 rows=1 loops=1)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..480.67 rows=1 width=8) (actual time=789.968..1068.613 rows=3 loops=1)
-> Partial Aggregate (cost=0.00..480.67 rows=1 width=8) (actual time=785.682..785.685 rows=1 loops=1)
-> HashAggregate (cost=0.00..480.67 rows=333334 width=1) (actual time=435.946..748.697 rows=334042 loops=1)
Group Key: id
-> Seq Scan on ao (cost=0.00..437.97 rows=333334 width=4) (actual time=0.534..120.923 rows=334042 loops=1)
Planning Time: 41.497 ms
(slice0) Executor memory: 12333K bytes.
(slice1) Executor memory: 22960K bytes avg x 3x(0) workers, 22960K bytes max (seg0). Work_mem: 30737K bytes max.
Memory used: 128000kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution Time: 1089.742 ms
(12 rows)
I think you have reproduced the problem: we generate two different plans for those two SQL statements, which are functionally equivalent. One uses the GroupAggregate node to perform deduplication, while the other uses HashAggregate. You can easily create a case where HashAggregate for deduplication is more efficient than GroupAggregate. Based on the table you created, try this:
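(The statements suggested here did not survive in this copy of the thread. A plausible reconstruction, judging from the roughly 4 million scanned rows and 1000 distinct ids visible in the plans quoted later, is to reload the table with heavily duplicated values:)
-- plausible reconstruction only; the original suggestion was not captured in this copy of the thread
truncate ao;
insert into ao select i % 1000 from generate_series(1, 4000000) i;
analyze ao;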
And then run the tests again.
Got it. Now in this case, the CTE works better.
I debugged this for a while. For the CTE SQL, the HashAgg on top of the SeqScan comes from the CTE path, and the planner does a count agg on the CTE subquery, so the plan has an additional dedup HashAgg node, and it must be that one. For the plain SQL, the cost of using HashAggregate to dedup may need to be adjusted.
Yes, in fact dedup + count is much faster than count(distinct) in some cases. We need to adjust the cost estimation very carefully.
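(For readers skimming the thread: the "dedup + count" rewrite is the same shape as the CTE query above; a subquery spelling of it, for reference:)
-- equivalent to the WITH UniqueIDs form shown earlier in this thread
select count(*) from (select distinct id from ao) dedup;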
Correction: after some debugging, I found there is no such path.
explain(analyze) select count(distinct id) from ao ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=20917.17..20917.18 rows=1 width=8) (actual time=1237.827..1237.830 rows=1 loops=1)
-> Gather Motion 3:1 (slice1; segments: 3) (cost=20898.00..20914.67 rows=1000 width=4) (actual time=1131.395..1237.586 rows=1000 loops=1)
-> HashAggregate (cost=20898.00..20901.33 rows=333 width=4) (actual time=1212.114..1212.211 rows=340 loops=1)
Group Key: id
-> Seq Scan on ao (cost=0.00..14071.33 rows=1365333 width=4) (actual time=1.418..646.267 rows=1392640 loops=1)
Planning Time: 0.761 ms
(slice0) Executor memory: 114K bytes.
(slice1) Executor memory: 286K bytes avg x 3x(0) workers, 286K bytes max (seg2). Work_mem: 61K bytes max.
Memory used: 128000kB
Optimizer: Postgres query optimizer
Execution Time: 1238.826 ms
(11 rows)
That's not what we expected: it gathers all the live tuples to do the agg on the master.
Hi all,
explain(costs off)
select count(distinct a) from t_issue_659;
QUERY PLAN
-------------------------------------------------
Finalize Aggregate
-> Gather Motion 3:1 (slice1; segments: 3)
-> Partial Aggregate
-> HashAggregate
Group Key: a
-> Seq Scan on t_issue_659
Optimizer: Postgres query optimizer
(7 rows)
The planner may choose the new plan based on cost.
set gp_eager_distinct_dedup = on;
Setting it to true will make the planner use our new plan.
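(A minimal session sketch for anyone trying this locally, assuming the t_issue_659 table from the plan above already exists and has been analyzed:)
-- sketch only; t_issue_659 is assumed to exist and to have been analyzed
set gp_eager_distinct_dedup = on;
explain (costs off) select count(distinct a) from t_issue_659;
-- expected shape per the plan quoted above: a dedup HashAggregate under a two-phase count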
Does it only work when the distinct key is the same as the distribution key, or does it not matter?
GPDB already has a 3-phase agg plan for when the distinct key is not the same as the distribution key or group key. After #676, it should work for both cases, whether or not the distinct key is the same as the distribution key.
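(To make the two cases concrete, here is a hypothetical illustration; the table and data are made up, not from this thread:)
-- hypothetical table for illustration only
create table t_dist (a int, b int) distributed by (a);
insert into t_dist select i, i % 100 from generate_series(1, 100000) i;
analyze t_dist;
set gp_eager_distinct_dedup = on;
-- distinct key (b) differs from the distribution key (a)
explain (costs off) select count(distinct b) from t_dist;
-- distinct key (a) matches the distribution key
explain (costs off) select count(distinct a) from t_dist;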
Closing, as this has already been fixed.
Hi Team, Eddie provided us with a test RPM package containing the single fix for issue #676 (already cherry-picked and built into v1.6.0): We conducted some real-data testing, hoping for better performance with optimizer=on (needed for our typical workload) and gp_eager_distinct_dedup=on. Unfortunately, the PostgreSQL optimizer still outperforms ORCA by a factor of two. We've attached the EXPLAIN ANALYZE output for your reference. Best regards, Andreas
I am reopening this so there can be a discussion in response to aeschbacherEUF's comment.
`set gp_eager_distinct_dedup = on; explain (analyze, verbose, costs, timing, buffers) select count(distinct gsp_id) from eft_com_call_statistic;`
Hi @aeschbacherEUF, please try it with optimizer = off.
We found that ORCA already has the path. Maybe we need to adjust the cost.
My test case is:
And the ORCA memo is:
Then the lowest-cost path is:
Let us look at group 7 (only one path, which does the seq scan):
Then the path below can be added:
The parent path of path abc... So I guess we need to add a path in ORCA...
Good analysis, yes, I think so... |
Cloudberry Database version
main
What happened
The optimizer could produce a better plan for deduplication based on cost.
What you think should happen instead
No response
How to reproduce
For the planner, we need to create a better plan which uses HashAggregate to deduplicate columns.
For example,
This plan uses HashAggregate to perform gsp_id deduplication, while
It seems the cost estimation is not accurate.
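(The example plans referenced above did not survive in this copy of the issue. Reconstructed from the query quoted in the comments, the two functionally equivalent statements being compared look roughly like this:)
-- count(distinct) form
select count(distinct gsp_id) from eft_com_call_statistic;
-- equivalent dedup-then-count form, which gets a HashAggregate for deduplication per the description above
with unique_ids as (
    select distinct gsp_id from eft_com_call_statistic
)
select count(*) from unique_ids;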
Operating System
No specific
Anything else
No
Are you willing to submit PR?
Code of Conduct