[YUNIKORN-2037] Testing the throughput of Yunikorn #367

wusamzong · 2023-11-14T15:00:48Z

What is this PR for?

This PR captures the results of the KWOK-based throughput tests. Details about the tools and examples used will be provided in separate PRs. Once the tools PR is accepted, a link to the tool will be added to this document, along with additional tutorials on building the KWOK environment.
PR links: lightweight monitor tool, example & KWOK setup tool

Todos

[*] - Task

What is the Jira issue?

Open an issue on Jira https://issues.apache.org/jira/browse/YUNIKORN-2037/

How should this be tested?

Screenshots (if appropriate)

Questions:

- The license files need an update.
- There are breaking changes for older versions.
- It needs documentation.

pbacsko

@wusamzong could you rename the JIRA and the PR to something like "Testing the performance of Yunikorn" or "Testing the throughput of Yunikorn"? The phrase "performance of throughput" sounds weird and does not make sense.

wusamzong · 2023-11-20T06:05:52Z

> @wusamzong could you rename the JIRA and the PR to something like "Testing the performance of Yunikorn" or "Testing the throughput of Yunikorn"? The phrase "performance of throughput" sounds weird and does not make sense.

Sure! Thank you for the correction.

imliuda · 2023-12-07T11:32:33Z

I wonder what the performance is with 100 applications, 500 applications, and so on.

pbacsko · 2023-12-07T12:00:05Z

> I wonder what the performance is with 100 applications, 500 applications, and so on.

Throughput definitely depends on the apps/pods ratio, the queue hierarchy, etc. Things like failed headroom checks and pending pods slow things down.

zhuqi-lucas

@wusamzong
One question about plugin mode: how does plugin mode with preEnqueue compare to plugin mode without preEnqueue? Note that preEnqueue only applies to newer k8s versions.

wusamzong · 2023-12-29T11:06:45Z

@zhuqi-lucas Sorry for my delayed response. In my experiment, I observed no significant difference between using the plugin mode with preEnqueue and without preEnqueue. The experiment was conducted on Kubernetes version v1.27.8 & YuniKorn 1.4.0, deploying 5 applications, each with 1000 pods. The throughput of the YuniKorn plugin with PreEnqueue was 50 pods/sec, while the throughput without PreEnqueue was marginally higher at 50.01 pods/sec. Attached is a flame graph that compares the performance of YuniKorn in standard mode, plugin mode with PreEnqueue, and plugin mode without PreEnqueue.
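To put the two figures above in perspective, here is a minimal sketch of the arithmetic. It assumes throughput means scheduled pods divided by elapsed wall-clock seconds; the helper names `pods_per_second` and `relative_difference_pct` are hypothetical and are not part of YuniKorn or the test tooling.

```python
def pods_per_second(pod_count: int, elapsed_seconds: float) -> float:
    """Throughput as scheduled pods divided by elapsed seconds (assumed definition)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return pod_count / elapsed_seconds


def relative_difference_pct(a: float, b: float) -> float:
    """Symmetric relative difference between two measurements, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100


# Workload reported in the comment: 5 applications with 1000 pods each.
total_pods = 5 * 1000

# Throughput figures reported in the comment (pods/sec).
with_pre_enqueue = 50.0
without_pre_enqueue = 50.01

# Time needed to schedule the whole workload at the reported rate.
seconds_needed = total_pods / with_pre_enqueue

# Gap between the two plugin-mode runs, relative to their mean.
gap_pct = relative_difference_pct(with_pre_enqueue, without_pre_enqueue)

print(f"{total_pods} pods at {with_pre_enqueue} pods/sec take {seconds_needed:.0f} s")
print(f"relative difference between the two runs: {gap_pct:.3f}%")
```

With a gap this small between the two runs, the difference is plausibly within run-to-run noise, which is consistent with the "no significant difference" reading above.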

zhuqi-lucas · 2023-12-31T09:20:17Z

Thanks @wusamzong for the info. It makes sense to me: we use preEnqueue to improve the autoscaling case, where a pod that we mark unschedulable does not enter the scheduling cycle thanks to the preEnqueue plugin. Our performance test case does not reach the autoscaling part, so this is enough for this case.

pbacsko · 2023-12-31T15:33:36Z

@wusamzong could you upload the binary files so I can examine them? Thanks.

wusamzong · 2024-01-01T12:57:35Z

@pbacsko Sure! Here are the pprof results:
standard-mode.samples.cpu.001.pb.gz
standard-mode.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz
plugin-mode.samples.cpu.001.pb.gz
plugin-mode.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz

wusamzong · 2024-01-01T13:09:11Z

> Thanks @wusamzong for the info. It makes sense to me: we use preEnqueue to improve the autoscaling case, where a pod that we mark unschedulable does not enter the scheduling cycle thanks to the preEnqueue plugin. Our performance test case does not reach the autoscaling part, so this is enough for this case.

Okay, thank you for clarifying the outcomes of this experiment. I now have a much better understanding of the plugin aspect.

pbacsko · 2024-01-02T00:07:24Z

> @pbacsko Sure! Here are the pprof results: standard-mode.samples.cpu.001.pb.gz standard-mode.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz plugin-mode.samples.cpu.001.pb.gz plugin-mode.alloc_objects.alloc_space.inuse_objects.inuse_space.001.pb.gz

Thanks. Standard mode looks pretty solid. I haven't seen anything that stands out and needs immediate addressing.

wilfred-s · 2024-01-25T05:56:27Z

Is there anything left to do on this, or should we add this as is to show the current state?

wusamzong · 2024-01-26T06:11:51Z

I'll make an effort to upload the remaining test cases as soon as possible!

pbacsko · 2024-01-26T10:42:21Z

@wusamzong is there a docs update which explains how to run these tests? It's OK to see the results, but what if I want to run this locally on my machine?

wusamzong · 2024-01-26T11:36:32Z

Certainly! I plan to submit a PR to k8shim that adds the scripts used for the performance test under yunikorn-k8shim/deployments/kwok-perf-test. I'll also include a README with instructions on how to use these scripts.

pbacsko · 2024-01-26T11:43:27Z

Ok, thanks, this PR looks good.

pbacsko

+1

[YUNIKORN-2037] Document performance using kwok (apache#367)

[YUNIKORN-2037] Testing the performance of Throughput

96c6c37

wusamzong self-assigned this Nov 14, 2023

pbacsko self-requested a review November 19, 2023 22:14

pbacsko requested changes Nov 19, 2023

wusamzong changed the title ~~[YUNIKORN-2037] Testing the performance of Throughput~~ [YUNIKORN-2037] Testing the throughput of Yunikorn Nov 20, 2023

pbacsko requested review from FrankYang0529, wilfred-s and zhuqi-lucas December 7, 2023 11:58

zhuqi-lucas reviewed Dec 7, 2023

wilfred-s requested review from pbacsko and zhuqi-lucas January 25, 2024 05:55

Update other test results

137d855

pbacsko approved these changes Jan 26, 2024

craigcondit closed this in fc0e860 Jan 26, 2024

github-actions bot pushed a commit that referenced this pull request Jan 26, 2024

Auto refresh: [YUNIKORN-2037] Document performance using kwok (#367)

ecb4ac8

github-actions bot pushed a commit to chia7712/yunikorn-site that referenced this pull request Jan 27, 2024

Auto refresh: [YUNIKORN-2037] Document performance using kwok (apache…

a7d8b0d

…#367)

chenyulin0719 mentioned this pull request Jan 29, 2024

[YUNIKORN-2037] Document performance using kwok (#367) chenyulin0719/yunikorn-site#2

Merged

10 tasks

chenyulin0719 added a commit to chenyulin0719/yunikorn-site that referenced this pull request Jan 29, 2024

Merge pull request #2 from chenyulin0719/test-latest

cae0402

[YUNIKORN-2037] Document performance using kwok (apache#367)
