[config](load) improve in-memory aggregation triggering for AGG KEY tables #52305

Open: wants to merge 3 commits into base: master

kaijchen
Contributor

@kaijchen kaijchen commented Jun 25, 2025

What problem does this PR solve?

This commit makes two related changes so that in-memory aggregation during
data loading triggers earlier and more efficiently:

  1. Reduce the default value of write_buffer_size_for_agg from 400MB to 100MB.
    The previous default exceeded write_buffer_size (200MB), so need_flush()
    always fired before need_agg() could, effectively skipping the intended
    aggregation step.

  2. Base need_agg() on memory growth since the last aggregation rather than
    on total memory usage. The growth is tracked with a new _last_agg_pos
    field, which is updated after each aggregation. This prevents repeated
    aggregation when memory usage has not grown since the previous one, and
    makes memory management more adaptive and efficient.
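
For readers who want to see the two conditions side by side, here is a minimal, self-contained sketch of the triggering logic described above. It is not the actual Doris code: `MemTableSketch`, `aggregate()`, `agg_fires_before_flush()`, and the `>=` comparisons are assumptions made for illustration, and only the config names and the idea of a last-aggregation position come from this PR description.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kMB = 1024 * 1024;

// Hypothetical stand-in for a memtable; field names mirror the PR text, but
// the bodies are illustrative assumptions rather than the real implementation.
struct MemTableSketch {
    std::size_t write_buffer_size;          // flush threshold, in bytes
    std::size_t write_buffer_size_for_agg;  // aggregation threshold, in bytes
    std::size_t mem_usage = 0;              // current in-memory bytes
    std::size_t last_agg_pos = 0;           // usage recorded right after the last aggregation

    MemTableSketch(std::size_t flush_limit, std::size_t agg_limit)
            : write_buffer_size(flush_limit), write_buffer_size_for_agg(agg_limit) {}

    void insert(std::size_t bytes) { mem_usage += bytes; }

    // Flush is decided on total memory usage.
    bool need_flush() const { return mem_usage >= write_buffer_size; }

    // Aggregation is decided on growth since the last aggregation.
    bool need_agg() const {
        assert(mem_usage >= last_agg_pos);
        return mem_usage - last_agg_pos >= write_buffer_size_for_agg;
    }

    // Simulates an aggregation that shrinks usage to `bytes_after` and then
    // records the new position so later checks measure growth from here.
    void aggregate(std::size_t bytes_after) {
        assert(bytes_after <= mem_usage);
        mem_usage = bytes_after;
        last_agg_pos = mem_usage;
    }
};

// Inserts fixed-size batches into a table and reports whether need_agg()
// becomes true before need_flush() does.
inline bool agg_fires_before_flush(MemTableSketch table, std::size_t batch_bytes) {
    assert(batch_bytes > 0);
    while (true) {
        table.insert(batch_bytes);
        if (table.need_flush()) return false;
        if (table.need_agg()) return true;
    }
}
```

In this sketch, a table built with a 200MB flush limit and a 400MB aggregation limit reaches the flush limit first, so `agg_fires_before_flush` returns false. With a 100MB aggregation limit it returns true. After `aggregate()` shrinks a table to 90MB, `need_agg()` stays false until usage has grown by another 100MB, whereas a check on total usage would fire again after only 10MB of growth.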

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaijchen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33930 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e0e751a3b92293299a11a48aaa40e598aa14d08f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
q1	17592	5246	5069	5069
q2	1951	305	189	189
q3	10279	1326	708	708
q4	10222	1037	531	531
q5	7549	2419	2328	2328
q6	187	162	129	129
q7	917	746	616	616
q8	9327	1294	1007	1007
q9	6943	5131	5070	5070
q10	6893	2395	1980	1980
q11	485	300	283	283
q12	333	348	229	229
q13	17800	3674	3072	3072
q14	227	229	214	214
q15	558	477	487	477
q16	422	433	365	365
q17	579	839	388	388
q18	7960	7226	7146	7146
q19	1224	949	557	557
q20	329	343	225	225
q21	3713	3187	2387	2387
q22	1067	1036	960	960
Total cold run time: 106557 ms
Total hot run time: 33930 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
q1	5276	5132	5044	5044
q2	267	315	222	222
q3	2189	2668	2285	2285
q4	1365	1825	1341	1341
q5	4262	4509	4476	4476
q6	213	172	125	125
q7	2069	1969	1810	1810
q8	2571	2530	2524	2524
q9	7184	7140	7229	7140
q10	3104	3252	2834	2834
q11	600	539	497	497
q12	695	777	655	655
q13	3525	3860	3269	3269
q14	302	296	280	280
q15	522	466	466	466
q16	441	495	450	450
q17	1158	1526	1386	1386
q18	7763	7603	7469	7469
q19	821	745	854	745
q20	1970	2045	1918	1918
q21	4863	4369	4272	4272
q22	1043	1072	1020	1020
Total cold run time: 52203 ms
Total hot run time: 50228 ms

@doris-robot

TPC-DS: Total hot run time: 186311 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e0e751a3b92293299a11a48aaa40e598aa14d08f, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
query1	996	399	412	399
query2	6532	1855	1867	1855
query3	6749	213	211	211
query4	26307	23518	23282	23282
query5	4423	648	509	509
query6	317	213	202	202
query7	4626	515	311	311
query8	294	225	219	219
query9	8651	2612	2644	2612
query10	474	335	270	270
query11	15864	15089	14910	14910
query12	168	110	110	110
query13	1668	561	410	410
query14	9509	6237	6294	6237
query15	210	200	182	182
query16	7315	655	438	438
query17	1196	713	583	583
query18	2041	413	317	317
query19	196	196	163	163
query20	122	117	119	117
query21	214	130	111	111
query22	4264	4114	4085	4085
query23	34110	33129	33185	33129
query24	8452	2403	2417	2403
query25	539	463	406	406
query26	1229	267	151	151
query27	2748	516	344	344
query28	4270	2170	2118	2118
query29	775	547	452	452
query30	283	220	203	203
query31	965	842	778	778
query32	72	64	66	64
query33	555	367	337	337
query34	805	833	528	528
query35	805	824	725	725
query36	969	989	873	873
query37	111	104	78	78
query38	4125	4192	4109	4109
query39	1497	1436	1404	1404
query40	215	121	106	106
query41	62	59	58	58
query42	136	110	106	106
query43	509	496	478	478
query44	1296	821	813	813
query45	175	172	162	162
query46	836	1037	647	647
query47	1780	1816	1696	1696
query48	403	425	306	306
query49	775	506	393	393
query50	632	666	410	410
query51	4166	4150	4127	4127
query52	114	116	99	99
query53	234	254	187	187
query54	576	589	500	500
query55	87	84	86	84
query56	298	296	290	290
query57	1179	1199	1122	1122
query58	262	257	258	257
query59	2644	2731	2642	2642
query60	325	319	338	319
query61	130	136	145	136
query62	797	720	701	701
query63	223	198	193	193
query64	4319	995	667	667
query65	4242	4177	4253	4177
query66	1116	460	347	347
query67	15925	15597	15300	15300
query68	8233	890	526	526
query69	486	307	275	275
query70	1202	1138	1085	1085
query71	510	327	300	300
query72	5581	4737	4692	4692
query73	719	594	364	364
query74	9169	9198	8942	8942
query75	3918	3197	2679	2679
query76	3724	1179	766	766
query77	791	373	303	303
query78	10016	10201	9312	9312
query79	2288	820	581	581
query80	616	514	433	433
query81	480	256	228	228
query82	422	126	98	98
query83	294	254	239	239
query84	296	108	86	86
query85	798	376	315	315
query86	338	304	302	302
query87	4365	4485	4307	4307
query88	3129	2275	2264	2264
query89	451	311	288	288
query90	1941	215	211	211
query91	144	138	112	112
query92	79	57	55	55
query93	1147	958	580	580
query94	679	377	310	310
query95	378	290	285	285
query96	486	570	286	286
query97	2728	2764	2674	2674
query98	229	211	202	202
query99	1460	1380	1253	1253
Total cold run time: 274999 ms
Total hot run time: 186311 ms

@doris-robot

ClickBench: Total hot run time: 29.13 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e0e751a3b92293299a11a48aaa40e598aa14d08f, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.06
query4	1.61	0.10	0.11
query5	0.44	0.43	0.42
query6	1.14	0.67	0.66
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.58	0.52	0.52
query10	0.56	0.57	0.58
query11	0.15	0.11	0.11
query12	0.15	0.11	0.12
query13	0.63	0.60	0.60
query14	0.83	0.81	0.81
query15	0.89	0.84	0.86
query16	0.38	0.38	0.40
query17	1.09	1.06	1.05
query18	0.22	0.22	0.21
query19	2.02	1.88	1.84
query20	0.02	0.02	0.02
query21	15.40	0.90	0.53
query22	0.75	1.19	0.67
query23	14.94	1.37	0.65
query24	7.04	2.31	0.38
query25	0.33	0.17	0.08
query26	0.57	0.16	0.13
query27	0.06	0.05	0.05
query28	9.39	0.86	0.45
query29	12.59	4.04	3.35
query30	0.25	0.09	0.06
query31	2.83	0.61	0.40
query32	3.23	0.55	0.47
query33	3.02	3.15	3.14
query34	16.12	5.41	4.75
query35	4.83	4.82	4.80
query36	0.66	0.49	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.14	0.14
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.61 s
Total hot run time: 29.13 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.05% (15359/26920)
Line Coverage 46.14% (139388/302098)
Region Coverage 45.48% (70625/155283)
Branch Coverage 40.25% (37319/92720)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 61.49% (16285/26482)
Line Coverage 51.20% (154592/301910)
Region Coverage 48.68% (88720/182241)
Branch Coverage 42.32% (44643/105492)

@kaijchen kaijchen closed this Jun 25, 2025
@kaijchen kaijchen changed the title from "[config](load) reduce default write_buffer_size_for_agg to 100MB" to "[config](load) improve in-memory aggregation triggering for AGG KEY tables" Jun 25, 2025
@kaijchen kaijchen reopened this Jun 25, 2025
@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.04% (15359/26927)
Line Coverage 46.13% (139398/302176)
Region Coverage 45.47% (70637/155365)
Branch Coverage 40.22% (37309/92762)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 62.60% (16582/26489)
Line Coverage 52.42% (158303/301988)
Region Coverage 49.97% (91109/182334)
Branch Coverage 43.54% (45952/105548)

@kaijchen
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.05% (15365/26932)
Line Coverage 46.14% (139416/302164)
Region Coverage 45.47% (70640/155364)
Branch Coverage 40.23% (37316/92760)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 62.60% (16582/26489)
Line Coverage 52.42% (158303/301988)
Region Coverage 49.97% (91109/182334)
Branch Coverage 43.54% (45952/105548)

@kaijchen
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.05% (15365/26932)
Line Coverage 46.14% (139418/302164)
Region Coverage 45.46% (70636/155364)
Branch Coverage 40.23% (37313/92760)

3 participants