[YUNIKORN-2037] Testing the throughput of Yunikorn #367
@wusamzong could you rename the JIRA and the PR to something like "Testing the performance of Yunikorn" or "Testing the throughput of Yunikorn"? The phrase "performance of throughput" sounds odd and does not make sense.
Sure! Thank you for the correction.
I wonder what the performance looks like with 100 applications, 500 applications, and so on.
Throughput definitely depends on the apps/pods ratio, the queue hierarchy, etc. Things like failed headroom checks and pending pods slow things down.
@wusamzong
One question about plugin mode: have we compared plugin mode with preEnqueue against plugin mode without preEnqueue? preEnqueue is applied on newer k8s versions.
@zhuqi-lucas Sorry for my delayed response. In my experiment, I observed no significant difference between using the plugin mode with preEnqueue and without preEnqueue. The experiment was conducted on Kubernetes version v1.27.8 & YuniKorn 1.4.0, deploying 5 applications, each with 1000 pods. The throughput of the YuniKorn plugin with PreEnqueue was 50 pods/sec, while the throughput without PreEnqueue was marginally higher at 50.01 pods/sec. Attached is a flame graph that compares the performance of YuniKorn in standard mode, plugin mode with PreEnqueue, and plugin mode without PreEnqueue.
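To put the two figures above in perspective, here is a minimal Python sketch of the arithmetic. It is not the script used in the experiment. The function names are hypothetical, and the elapsed time is only the value implied by 5 applications of 1000 pods at the reported 50 pods/sec, not a measured number.

```python
def throughput(pods: int, seconds: float) -> float:
    """Average scheduling throughput in pods per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return pods / seconds


def relative_diff_percent(a: float, b: float) -> float:
    """Absolute difference between a and b, as a percentage of the smaller value."""
    return abs(a - b) / min(a, b) * 100.0


# Figures reported in the comment above.
TOTAL_PODS = 5 * 1000          # 5 applications, 1000 pods each
WITH_PRE_ENQUEUE = 50.0        # pods/sec, plugin mode with PreEnqueue
WITHOUT_PRE_ENQUEUE = 50.01    # pods/sec, plugin mode without PreEnqueue

# Assumption: elapsed time is derived from the reported throughput, not measured.
implied_seconds = TOTAL_PODS / WITH_PRE_ENQUEUE
diff = relative_diff_percent(WITH_PRE_ENQUEUE, WITHOUT_PRE_ENQUEUE)

print(f"implied run time with PreEnqueue: {implied_seconds:.0f} s")
print(f"relative difference: {diff:.2f}%")
```

A gap of roughly 0.02% is well within what run-to-run variation could plausibly produce, which is consistent with the "no significant difference" reading.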
Thanks @wusamzong for the info. It makes sense to me: we use preEnqueue to improve the autoscaling case, where a pod we mark unschedulable is kept out of the scheduling cycle by the preEnqueue plugin. Our performance test case does not reach the autoscaling part, so this is enough for this case.
@wusamzong could you upload the binary files so I can examine them? Thanks.
Okay, thank you for clarifying the outcomes of this experiment. I now have a much better understanding of the plugin aspect.
Thanks. Standard mode looks pretty solid. I haven't seen anything that stands out and needs immediate addressing.
Is there anything that is left to do on this or should we add this as is to show the current state?
I'll make an effort to upload the remaining test cases as soon as possible!
@wusamzong is there a docs update which explains how to run these tests? It's OK to see the results, but what if I want to run this locally on my machine?
Certainly! I plan to submit a PR to k8shim, adding the scripts used for the performance test to the yunikorn-k8shim/deployments/kwok-perf-test. Along with this, I'll include a readme that provides instructions on how to utilize these scripts.
Ok, thanks, this PR looks good.
+1
[YUNIKORN-2037] Document performance using kwok (apache#367)
What is this PR for?
Capture the outcomes of KWOK-based throughput tests, while details about the tools and examples employed will be found in separate PRs. Upon acceptance of the tools PR, a link to the tool will be integrated into this document, accompanied by additional tutorials on KWOK environment building.
PR links: lightweight monitor tool, example & KWOK setup tool
Todos
What is the Jira issue?
How should this be tested?
Screenshots (if appropriate)
Questions: