[Improvement](fragment) Use partitioned hash map to manage contexts #46235

Merged
merged 2 commits into from
Jan 2, 2025

Conversation

Gabriel39
Contributor

@Gabriel39 Gabriel39 commented Dec 31, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Contexts in fragment_mgr are managed by a global map and accessed concurrently by multiple threads under a single global lock, which introduces obvious overhead. To solve this, this PR uses a partitioned hash table to reduce contention on the global lock.
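
For readers unfamiliar with the technique, here is a minimal sketch of a partitioned (sharded) hash map. It is not the code from this PR: the class name `PartitionedMap`, its methods, and the default of 16 partitions are all assumptions made for illustration. The idea is that each key hashes to one partition, and each partition has its own lock, so threads working on different partitions do not block each other.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>
#include <vector>

// Illustrative only: PartitionedMap and its methods are hypothetical names,
// not the classes added to fragment_mgr by this PR.
template <typename Key, typename Value, typename Hash = std::hash<Key>>
class PartitionedMap {
public:
    explicit PartitionedMap(std::size_t num_partitions = 16) : _partitions(num_partitions) {
        assert(num_partitions > 0);
    }

    // Returns true if the key was inserted, false if it already existed.
    bool insert(const Key& key, std::shared_ptr<Value> value) {
        Partition& p = _partition_for(key);
        std::lock_guard<std::mutex> guard(p.lock);
        return p.map.emplace(key, std::move(value)).second;
    }

    // Returns the stored value, or nullptr if the key is absent.
    std::shared_ptr<Value> find(const Key& key) {
        Partition& p = _partition_for(key);
        std::lock_guard<std::mutex> guard(p.lock);
        auto it = p.map.find(key);
        if (it == p.map.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Returns true if an entry was removed.
    bool erase(const Key& key) {
        Partition& p = _partition_for(key);
        std::lock_guard<std::mutex> guard(p.lock);
        return p.map.erase(key) > 0;
    }

    // Locks one partition at a time, so the result is not an atomic snapshot.
    std::size_t size() {
        std::size_t total = 0;
        for (Partition& p : _partitions) {
            std::lock_guard<std::mutex> guard(p.lock);
            total += p.map.size();
        }
        return total;
    }

private:
    struct Partition {
        std::mutex lock;
        std::unordered_map<Key, std::shared_ptr<Value>, Hash> map;
    };

    // Only the partition that owns the key is locked by callers.
    Partition& _partition_for(const Key& key) {
        return _partitions[Hash{}(key) % _partitions.size()];
    }

    std::vector<Partition> _partitions;
};
```

The sketch uses a plain `std::mutex` per partition to stay short. A reader-writer lock per partition would also allow concurrent lookups within one partition.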

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.91% (10131/26039)
Line Coverage: 29.91% (85645/286374)
Region Coverage: 29.02% (43735/150681)
Branch Coverage: 25.55% (22305/87298)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2f49d245699dbe558b3aab6215f6d06b32366dc0_2f49d245699dbe558b3aab6215f6d06b32366dc0/report/index.html

@Gabriel39
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.90% (10128/26039)
Line Coverage: 29.91% (85652/286374)
Region Coverage: 29.02% (43725/150681)
Branch Coverage: 25.54% (22300/87298)
Coverage Report: http://coverage.selectdb-in.cc/coverage/7a72f30aa3642a2db6811ee274df1caf11b894b7_7a72f30aa3642a2db6811ee274df1caf11b894b7/report/index.html

@Gabriel39
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32753 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5dc756c35eebb1fe4c3b9673e32d81fdd3e9bd07, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17563	6125	6032	6032
q2	2044	304	161	161
q3	10426	1244	766	766
q4	10219	867	448	448
q5	7564	2235	1981	1981
q6	217	184	146	146
q7	895	761	611	611
q8	9246	1367	1196	1196
q9	5346	4943	4988	4943
q10	6755	2329	1875	1875
q11	485	283	261	261
q12	342	355	228	228
q13	17763	3623	2947	2947
q14	225	236	220	220
q15	581	516	488	488
q16	632	628	594	594
q17	590	862	326	326
q18	7104	6483	6486	6483
q19	2321	971	578	578
q20	311	316	184	184
q21	2825	2254	1974	1974
q22	368	341	311	311
Total cold run time: 103822 ms
Total hot run time: 32753 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6374	6213	6207	6207
q2	241	336	245	245
q3	2266	2689	2313	2313
q4	1379	1807	1356	1356
q5	4382	4778	4815	4778
q6	190	181	148	148
q7	2092	2019	1822	1822
q8	2705	2830	2685	2685
q9	7350	7344	7298	7298
q10	3100	3368	2776	2776
q11	595	512	490	490
q12	654	756	630	630
q13	3325	3748	3087	3087
q14	288	292	294	292
q15	559	516	533	516
q16	680	682	633	633
q17	1249	1761	1275	1275
q18	7766	7535	7295	7295
q19	851	1183	1084	1084
q20	1954	2026	1933	1933
q21	5789	5399	4952	4952
q22	637	603	583	583
Total cold run time: 54426 ms
Total hot run time: 52398 ms

@doris-robot

TPC-DS: Total hot run time: 196667 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5dc756c35eebb1fe4c3b9673e32d81fdd3e9bd07, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1284	957	945	945
query2	6492	2343	2243	2243
query3	10986	4528	4577	4528
query4	32876	24274	23471	23471
query5	4220	630	502	502
query6	300	216	204	204
query7	3980	509	315	315
query8	300	248	237	237
query9	9212	2646	2648	2646
query10	436	312	248	248
query11	18330	15467	15179	15179
query12	157	108	101	101
query13	1579	542	413	413
query14	9948	6707	7990	6707
query15	304	222	198	198
query16	8478	585	426	426
query17	1521	798	609	609
query18	2130	418	318	318
query19	193	187	179	179
query20	130	117	122	117
query21	205	122	109	109
query22	4856	4617	4490	4490
query23	34407	33630	33780	33630
query24	6495	2304	2377	2304
query25	505	464	414	414
query26	860	288	153	153
query27	2189	472	345	345
query28	5767	2462	2453	2453
query29	667	562	438	438
query30	210	182	150	150
query31	988	952	860	860
query32	94	65	62	62
query33	497	385	314	314
query34	789	866	555	555
query35	843	850	768	768
query36	1023	1061	971	971
query37	120	106	83	83
query38	4238	4327	4211	4211
query39	1591	1449	1481	1449
query40	216	118	105	105
query41	48	50	43	43
query42	129	113	101	101
query43	526	539	504	504
query44	1411	864	838	838
query45	192	184	168	168
query46	905	1053	664	664
query47	2025	1978	1934	1934
query48	402	445	319	319
query49	728	525	407	407
query50	685	676	414	414
query51	7338	7377	7262	7262
query52	109	109	98	98
query53	241	273	191	191
query54	498	500	430	430
query55	85	76	82	76
query56	271	251	256	251
query57	1281	1236	1173	1173
query58	241	235	233	233
query59	3212	3411	3303	3303
query60	285	272	266	266
query61	114	106	108	106
query62	890	806	748	748
query63	245	206	205	205
query64	3608	1005	656	656
query65	3392	3294	3278	3278
query66	792	411	307	307
query67	16273	15640	15460	15460
query68	8589	847	552	552
query69	488	308	266	266
query70	1199	1171	1172	1171
query71	430	291	267	267
query72	6330	3796	3836	3796
query73	660	776	375	375
query74	10117	9034	8765	8765
query75	3990	3144	2667	2667
query76	3610	1196	799	799
query77	758	404	304	304
query78	10147	10135	9535	9535
query79	3831	932	594	594
query80	724	532	434	434
query81	519	266	229	229
query82	685	154	120	120
query83	199	165	144	144
query84	279	101	77	77
query85	785	387	303	303
query86	355	324	305	305
query87	4711	4601	4502	4502
query88	5082	2276	2237	2237
query89	451	336	298	298
query90	1915	191	189	189
query91	146	147	105	105
query92	67	57	54	54
query93	2964	892	529	529
query94	683	413	282	282
query95	337	274	252	252
query96	486	612	283	283
query97	2766	2843	2713	2713
query98	218	203	193	193
query99	1732	1521	1442	1442
Total cold run time: 300595 ms
Total hot run time: 196667 ms

@doris-robot

ClickBench: Total hot run time: 31.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 5dc756c35eebb1fe4c3b9673e32d81fdd3e9bd07, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.06
query4	1.61	0.10	0.10
query5	0.45	0.41	0.47
query6	1.16	0.64	0.64
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.57	0.50	0.51
query10	0.55	0.56	0.55
query11	0.15	0.11	0.10
query12	0.14	0.11	0.11
query13	0.60	0.59	0.60
query14	2.83	2.76	2.77
query15	0.89	0.82	0.82
query16	0.38	0.37	0.39
query17	1.00	1.06	1.05
query18	0.23	0.20	0.20
query19	1.85	1.78	1.97
query20	0.02	0.01	0.01
query21	15.37	0.91	0.57
query22	0.77	0.76	0.63
query23	15.30	1.44	0.62
query24	2.61	1.00	0.92
query25	0.22	0.13	0.18
query26	0.36	0.16	0.14
query27	0.07	0.05	0.05
query28	13.71	1.56	1.04
query29	12.59	3.88	3.27
query30	0.25	0.10	0.06
query31	2.82	0.62	0.37
query32	3.23	0.54	0.46
query33	3.19	3.07	3.09
query34	16.71	5.09	4.44
query35	4.49	4.56	4.48
query36	0.64	0.48	0.47
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.04	0.02	0.02
query40	0.16	0.13	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.66 s
Total hot run time: 31.19 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.92% (10133/26038)
Line Coverage: 29.92% (85678/286361)
Region Coverage: 29.03% (43727/150641)
Branch Coverage: 25.57% (22320/87304)
Coverage Report: http://coverage.selectdb-in.cc/coverage/5dc756c35eebb1fe4c3b9673e32d81fdd3e9bd07_5dc756c35eebb1fe4c3b9673e32d81fdd3e9bd07/report/index.html

Contributor

@HappenLee HappenLee left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 2, 2025
Contributor

github-actions bot commented Jan 2, 2025

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jan 2, 2025

PR approved by anyone and no changes requested.

@Gabriel39 Gabriel39 merged commit 518faa3 into apache:master Jan 2, 2025
25 of 27 checks passed
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request Jan 2, 2025
…pache#46235)

Contexts in `fragment_mgr` are managed by a global map and accessed
concurrently by multiple threads under a single global lock, which
introduces obvious overhead. To solve this, this PR uses a partitioned
hash table to reduce contention on the global lock.
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request Jan 3, 2025
…pache#46235)

Contexts in `fragment_mgr` are managed by a global map and accessed
concurrently by multiple threads under a single global lock, which
introduces obvious overhead. To solve this, this PR uses a partitioned
hash table to reduce contention on the global lock.
Gabriel39 added a commit that referenced this pull request Jan 3, 2025
#46282)

…#46235)

Contexts in `fragment_mgr` are managed by a global map and accessed
concurrently by multiple threads under a single global lock, which
introduces obvious overhead. To solve this, this PR uses a partitioned
hash table to reduce contention on the global lock.
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed

6 participants