[refactor](pipelineX) refine union dependency #27348

Merged: 1 commit into apache:master on Nov 23, 2023

@Mryange (Contributor) commented Nov 21, 2023

Proposed changes

Issue Number: close #xxx

@Mryange (Contributor Author) commented Nov 21, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Mryange (Contributor Author) commented Nov 21, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 44.4 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.5 seconds inserted 10000000 Rows, about 350K ops/s
storage size: 17097432746 Bytes
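As a cross-check on the stream-load figures above, the "about N MB/s" values can be reproduced from the byte counts and durations. This is a minimal C++ sketch, not the bot's actual code. It assumes "MB" means MiB (2^20 bytes) and that the bot truncates instead of rounding; both assumptions are inferred only from the printed numbers (the JSON load is 2358488459 bytes in 18 seconds, about 124.96 MiB/s, printed as 124). The helper names `mib_per_second` and `kilo_rows_per_second` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers, not taken from the bot's source. Assumptions inferred
// from the printed numbers: "MB" means MiB (2^20 bytes), and values are
// truncated toward zero rather than rounded.
constexpr std::int64_t kBytesPerMiB = 1024 * 1024;

// Stream-load throughput in MiB/s, truncated toward zero.
constexpr std::int64_t mib_per_second(std::int64_t bytes, double seconds) {
    return static_cast<std::int64_t>(static_cast<double>(bytes) / seconds /
                                     static_cast<double>(kBytesPerMiB));
}

// INSERT INTO SELECT throughput in thousands of rows per second, truncated.
constexpr std::int64_t kilo_rows_per_second(std::int64_t rows, double seconds) {
    return static_cast<std::int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}
```

For example, `kilo_rows_per_second(10000000, 28.5)` yields 350, matching "about 350K ops/s". Because the inputs are the displayed (already rounded) durations, a value sitting close to an integer boundary could come out one off from what the bot prints.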

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 49.93 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 67 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 32.4 seconds inserted 10000000 Rows, about 308K ops/s
storage size: 17099208099 Bytes

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.54% (8448/23123)
Line Coverage: 28.87% (68680/237932)
Region Coverage: 27.83% (35515/127629)
Branch Coverage: 24.59% (18115/73678)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1727928a2c8346608e52fcee917fb6e6d63789da_1727928a2c8346608e52fcee917fb6e6d63789da/report/index.html

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.54% (8448/23123)
Line Coverage: 28.86% (68675/237932)
Region Coverage: 27.82% (35510/127629)
Branch Coverage: 24.57% (18104/73678)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ae60cc8526e755fcc3b4798c8ff86d80f4b0ff34_ae60cc8526e755fcc3b4798c8ff86d80f4b0ff34/report/index.html

@Mryange (Contributor Author) commented Nov 21, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H SF100 test result on commit 3775a4d0c87c4c42418d082eec6b139501c0d043, data reload: false

run tpch-sf100 queries with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4893	4666	4680	4666
q2	362	163	158	158
q3	2028	1925	1870	1870
q4	1383	1312	1265	1265
q5	3989	3928	4018	3928
q6	245	127	133	127
q7	1442	887	896	887
q8	2754	2778	2766	2766
q9	9686	9600	9587	9587
q10	3495	3525	3537	3525
q11	383	242	238	238
q12	440	297	296	296
q13	4574	3807	3817	3807
q14	309	300	291	291
q15	577	549	534	534
q16	663	594	578	578
q17	1133	941	913	913
q18	7779	7403	7318	7318
q19	1652	1693	1670	1670
q20	536	319	300	300
q21	4343	3914	3973	3914
q22	480	371	376	371
Total cold run time: 53146 ms
Total hot run time: 49009 ms

run tpch-sf100 queries with default conf and session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4592	4604	4560	4560
q2	365	224	275	224
q3	4041	4014	4010	4010
q4	2694	2689	2690	2689
q5	9754	9698	9750	9698
q6	239	124	124	124
q7	3003	2476	2519	2476
q8	4467	4415	4437	4415
q9	13218	13092	13138	13092
q10	4106	4195	4182	4182
q11	813	661	645	645
q12	981	809	824	809
q13	4318	3598	3596	3596
q14	387	342	360	342
q15	573	519	521	519
q16	736	663	700	663
q17	3930	3959	3925	3925
q18	9549	8891	8928	8891
q19	1820	1775	1776	1775
q20	2384	2078	2069	2069
q21	8863	8516	8750	8516
q22	889	817	821	817
Total cold run time: 81722 ms
Total hot run time: 78037 ms
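The columns of the TPC-H tables can be checked against the printed totals. Judging from the numbers, each row lists a cold run followed by two hot runs, and the last column is the faster of the two hot runs; "Total cold run time" sums the cold column and "Total hot run time" sums that last column. The hypothetical C++ sketch below tests that reading against the default-session-variable table for commit 3775a4d0c87c4c42418d082eec6b139501c0d043. `QueryTiming`, `best_hot_ms`, `total_cold_ms`, and `total_hot_ms` are illustrative names, not Doris code.

```cpp
#include <cstddef>

// One row of the bot's TPC-H table (hypothetical type). Column meanings are
// an assumption inferred from the numbers: cold run, hot run 1, hot run 2.
struct QueryTiming {
    int cold_ms;
    int hot1_ms;
    int hot2_ms;
};

// Best hot run for one query: the faster of the two hot runs only.
constexpr int best_hot_ms(const QueryTiming& q) {
    return q.hot1_ms < q.hot2_ms ? q.hot1_ms : q.hot2_ms;
}

// "Total cold run time": sum of the first column.
// Requires C++14 or later (loops inside constexpr functions).
template <std::size_t N>
constexpr int total_cold_ms(const QueryTiming (&rows)[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) total += rows[i].cold_ms;
    return total;
}

// "Total hot run time": sum of the best hot run of every query.
template <std::size_t N>
constexpr int total_hot_ms(const QueryTiming (&rows)[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) total += best_hot_ms(rows[i]);
    return total;
}

// q1..q22, default conf and session variables, commit 3775a4d0.
constexpr QueryTiming kDefaultRun[] = {
    {4893, 4666, 4680}, {362, 163, 158},    {2028, 1925, 1870},
    {1383, 1312, 1265}, {3989, 3928, 4018}, {245, 127, 133},
    {1442, 887, 896},   {2754, 2778, 2766}, {9686, 9600, 9587},
    {3495, 3525, 3537}, {383, 242, 238},    {440, 297, 296},
    {4574, 3807, 3817}, {309, 300, 291},    {577, 549, 534},
    {663, 594, 578},    {1133, 941, 913},   {7779, 7403, 7318},
    {1652, 1693, 1670}, {536, 319, 300},    {4343, 3914, 3973},
    {480, 371, 376},
};
```

Under this reading both totals match the bot's values (53146 ms cold, 49009 ms hot). The best hot run ignores the cold run even when the cold run is faster, as in q8 (2754 ms cold vs. 2766 ms best hot).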

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 46.02 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.7 seconds inserted 10000000 Rows, about 336K ops/s
storage size: 17100698723 Bytes

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.54% (8448/23122)
Line Coverage: 28.86% (68671/237946)
Region Coverage: 27.82% (35507/127641)
Branch Coverage: 24.57% (18107/73690)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3775a4d0c87c4c42418d082eec6b139501c0d043_3775a4d0c87c4c42418d082eec6b139501c0d043/report/index.html

@Mryange (Contributor Author) commented Nov 21, 2023

run p0

@Mryange (Contributor Author) commented Nov 21, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.54% (8448/23123)
Line Coverage: 28.86% (68688/237992)
Region Coverage: 27.83% (35526/127665)
Branch Coverage: 24.58% (18117/73708)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e93ce01f7138626ddbf417a1ea0cb098872a3fdd_e93ce01f7138626ddbf417a1ea0cb098872a3fdd/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H SF100 test result on commit e93ce01f7138626ddbf417a1ea0cb098872a3fdd, data reload: false

run tpch-sf100 queries with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4879	4668	4651	4651
q2	360	153	160	153
q3	2019	1925	1930	1925
q4	1400	1308	1260	1260
q5	3967	3962	4055	3962
q6	247	133	131	131
q7	1415	894	900	894
q8	2723	2766	2755	2755
q9	9974	9623	9569	9569
q10	3475	3531	3525	3525
q11	373	252	251	251
q12	445	289	296	289
q13	4598	3831	3831	3831
q14	330	287	286	286
q15	589	531	516	516
q16	659	588	583	583
q17	1122	958	943	943
q18	7801	7363	7273	7273
q19	1669	1674	1683	1674
q20	608	325	317	317
q21	4409	3965	3963	3963
q22	479	377	365	365
Total cold run time: 53541 ms
Total hot run time: 49116 ms

run tpch-sf100 queries with default conf and session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4584	4576	4585	4576
q2	356	218	268	218
q3	4019	3993	3985	3985
q4	2751	2682	2695	2682
q5	9711	9736	9649	9649
q6	243	127	126	126
q7	3015	2483	2491	2483
q8	4393	4431	4462	4431
q9	13271	13112	13140	13112
q10	4116	4196	4206	4196
q11	787	646	681	646
q12	965	810	811	810
q13	4295	3588	3541	3541
q14	378	346	342	342
q15	580	520	518	518
q16	739	663	670	663
q17	3822	3909	3805	3805
q18	9566	9029	8910	8910
q19	1804	1770	1788	1770
q20	2379	2070	2039	2039
q21	8782	8603	8672	8603
q22	903	785	810	785
Total cold run time: 81459 ms
Total hot run time: 77890 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 46.02 seconds
stream load tsv: 582 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17100016472 Bytes

@Mryange (Contributor Author) commented Nov 22, 2023

run pipelinex_p0

@Mryange (Contributor Author) commented Nov 22, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Mryange Mryange changed the title [refactor](pipelineX) refine union read dependency [refactor](pipelineX) refine union dependency Nov 22, 2023
@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.56% (8449/23110)
Line Coverage: 28.86% (68711/238097)
Region Coverage: 27.83% (35537/127701)
Branch Coverage: 24.57% (18120/73748)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ad06accd50350571aa3d6d7a16b457b726b4c7eb_ad06accd50350571aa3d6d7a16b457b726b4c7eb/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H SF100 test result on commit ad06accd50350571aa3d6d7a16b457b726b4c7eb, data reload: false

run tpch-sf100 queries with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4896	4661	4636	4636
q2	369	148	157	148
q3	2043	1912	1889	1889
q4	1408	1274	1249	1249
q5	3972	3970	4015	3970
q6	246	129	134	129
q7	1418	894	895	894
q8	2729	2760	2751	2751
q9	9700	9611	9492	9492
q10	3497	3525	3548	3525
q11	376	252	240	240
q12	438	286	296	286
q13	4548	3814	3817	3814
q14	323	285	296	285
q15	581	528	517	517
q16	665	573	586	573
q17	1127	955	918	918
q18	7783	7458	7383	7383
q19	1653	1673	1675	1673
q20	538	294	294	294
q21	4311	3950	3917	3917
q22	484	370	365	365
Total cold run time: 53105 ms
Total hot run time: 48948 ms

run tpch-sf100 queries with default conf and session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4564	4578	4554	4554
q2	348	232	268	232
q3	4003	3970	3998	3970
q4	2690	2677	2678	2677
q5	9517	9526	9554	9526
q6	243	122	126	122
q7	3020	2482	2501	2482
q8	4432	4455	4439	4439
q9	13133	13051	13116	13051
q10	4146	4195	4217	4195
q11	769	663	650	650
q12	993	831	806	806
q13	4278	3579	3598	3579
q14	390	338	335	335
q15	578	518	522	518
q16	729	703	672	672
q17	3895	3837	3953	3837
q18	9573	9184	9047	9047
q19	1806	1769	1764	1764
q20	2400	2074	2044	2044
q21	8838	8526	8551	8526
q22	870	825	762	762
Total cold run time: 81215 ms
Total hot run time: 77788 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 45.18 seconds
stream load tsv: 567 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.1 seconds inserted 10000000 Rows, about 355K ops/s
storage size: 17100742381 Bytes

@Mryange (Contributor Author) commented Nov 22, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.55% (8448/23111)
Line Coverage: 28.85% (68697/238101)
Region Coverage: 27.82% (35522/127705)
Branch Coverage: 24.56% (18113/73750)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f9c42a07dc959eb63c28a9abe7f25592801492fe_f9c42a07dc959eb63c28a9abe7f25592801492fe/report/index.html

return nullptr;
}
return this;
return Dependency::is_blocked_by(task);
Contributor

Don't implement is_blocked_by.

@@ -62,7 +62,13 @@ class DataQueue {
bool data_exhausted() const { return _data_exhausted; }
void set_dependency(Dependency* source_dependency, Dependency* sink_dependency) {
Contributor

delete this function

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 45.02 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17098569897 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H SF100 test result on commit f9c42a07dc959eb63c28a9abe7f25592801492fe, data reload: false

run tpch-sf100 queries with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4883	4669	4689	4669
q2	376	179	159	159
q3	2017	1877	1916	1877
q4	1382	1275	1268	1268
q5	3968	3908	4000	3908
q6	246	125	129	125
q7	1433	877	911	877
q8	2733	2768	2745	2745
q9	9782	9574	9544	9544
q10	3457	3512	3518	3512
q11	392	245	246	245
q12	432	288	288	288
q13	4558	3790	3774	3774
q14	315	290	292	290
q15	573	533	521	521
q16	662	586	579	579
q17	1124	949	951	949
q18	7895	7433	7443	7433
q19	1646	1675	1668	1668
q20	542	296	314	296
q21	4371	3892	3967	3892
q22	468	371	370	370
Total cold run time: 53255 ms
Total hot run time: 48989 ms

run tpch-sf100 queries with default conf and session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4599	4565	4561	4561
q2	336	228	253	228
q3	4011	3999	3974	3974
q4	2690	2683	2678	2678
q5	9659	9625	9657	9625
q6	242	122	124	122
q7	2995	2474	2466	2466
q8	4470	4460	4485	4460
q9	13101	13032	13088	13032
q10	4107	4188	4205	4188
q11	762	682	642	642
q12	975	810	808	808
q13	4267	3567	3591	3567
q14	396	347	346	346
q15	568	518	517	517
q16	734	669	675	669
q17	3889	3887	3879	3879
q18	9498	9054	9033	9033
q19	1755	1755	1771	1755
q20	2425	2066	2038	2038
q21	8718	8570	8317	8317
q22	891	806	862	806
Total cold run time: 81088 ms
Total hot run time: 77711 ms

@Mryange (Contributor Author) commented Nov 22, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.57% (8450/23108)
Line Coverage: 28.86% (68719/238103)
Region Coverage: 27.83% (35538/127708)
Branch Coverage: 24.57% (18122/73756)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e56fee67e6d016230014aeadea876437f56583c1_e56fee67e6d016230014aeadea876437f56583c1/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H SF100 test result on commit e56fee67e6d016230014aeadea876437f56583c1, data reload: false

run tpch-sf100 queries with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4898	4654	4679	4654
q2	358	158	164	158
q3	2030	1885	1926	1885
q4	1378	1241	1219	1219
q5	3906	3936	3981	3936
q6	246	127	132	127
q7	1416	888	902	888
q8	2740	2750	2745	2745
q9	9828	9582	9583	9582
q10	3466	3512	3533	3512
q11	379	250	239	239
q12	437	293	300	293
q13	4548	3841	3817	3817
q14	321	291	287	287
q15	576	541	523	523
q16	665	572	579	572
q17	1124	944	903	903
q18	7927	7329	7393	7329
q19	1661	1656	1672	1656
q20	535	313	293	293
q21	4357	3968	3986	3968
q22	482	371	380	371
Total cold run time: 53278 ms
Total hot run time: 48957 ms

run tpch-sf100 queries with default conf and session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4593	4550	4565	4550
q2	345	213	251	213
q3	3989	3993	3986	3986
q4	2713	2681	2693	2681
q5	9747	9670	9654	9654
q6	239	122	125	122
q7	2997	2491	2506	2491
q8	4426	4473	4474	4473
q9	13219	13027	13090	13027
q10	4135	4179	4180	4179
q11	835	721	674	674
q12	979	808	813	808
q13	4285	3559	3590	3559
q14	374	342	350	342
q15	586	538	527	527
q16	731	670	707	670
q17	3893	3920	3863	3863
q18	9756	8940	8986	8940
q19	1783	1783	1767	1767
q20	2410	2055	2052	2052
q21	8881	8695	8449	8449
q22	918	810	868	810
Total cold run time: 81834 ms
Total hot run time: 77837 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 43.3 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17098257651 Bytes

@Mryange (Contributor Author) commented Nov 23, 2023

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity BE unit test coverage result:
Function Coverage: 36.57% (8449/23104)
Line Coverage: 28.86% (68675/237950)
Region Coverage: 27.84% (35527/127629)
Branch Coverage: 24.57% (18122/73766)
Coverage Report: http://coverage.selectdb-in.cc/coverage/39bb604c6dceac48f73aab42b51367edfdba2c58_39bb604c6dceac48f73aab42b51367edfdba2c58/report/index.html

@Mryange (Contributor Author) commented Nov 23, 2023

run pipelinex_p0

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
TPC-H SF100 test result on commit 39bb604c6dceac48f73aab42b51367edfdba2c58, data reload: false

run tpch-sf100 queries with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4905	4707	4660	4660
q2	356	158	157	157
q3	2026	1879	1885	1879
q4	1379	1250	1260	1250
q5	3987	3942	4026	3942
q6	251	131	132	131
q7	1446	890	881	881
q8	2805	2814	2785	2785
q9	9746	9830	9757	9757
q10	3452	3531	3542	3531
q11	384	252	246	246
q12	440	293	290	290
q13	4591	3820	3812	3812
q14	321	285	292	285
q15	583	529	521	521
q16	664	586	580	580
q17	1150	965	953	953
q18	7841	7355	7508	7355
q19	1686	1655	1688	1655
q20	582	316	284	284
q21	4404	3988	4028	3988
q22	475	372	374	372
Total cold run time: 53474 ms
Total hot run time: 49314 ms

run tpch-sf100 queries with default conf and session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4598	4578	4597	4578
q2	346	228	274	228
q3	4007	4006	4020	4006
q4	2696	2690	2691	2690
q5	9688	9595	9733	9595
q6	250	121	121	121
q7	3007	2487	2444	2444
q8	4459	4450	4465	4450
q9	12979	12902	12802	12802
q10	4108	4176	4169	4169
q11	744	679	660	660
q12	981	812	817	812
q13	4270	3556	3581	3556
q14	374	339	351	339
q15	574	522	524	522
q16	732	661	673	661
q17	3888	3861	3918	3861
q18	9739	9117	9082	9082
q19	1878	1766	1788	1766
q20	2412	2085	2071	2071
q21	8898	8772	8465	8465
q22	896	768	765	765
Total cold run time: 81524 ms
Total hot run time: 77643 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
the sum of best hot-run times: 43.92 seconds
stream load tsv: 569 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17099099185 Bytes

@github-actions bot added the approved label Nov 23, 2023
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@Gabriel39 Gabriel39 merged commit ca7dbc3 into apache:master Nov 23, 2023
26 of 28 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
Labels: approved, reviewed
4 participants