[Enhancement]Optimize volcano end-to-end scheduling large-scale pod performance #3852
Comments
/good-first-issue |
@JesseStutler: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed In response to this:
Some previous performance issues: #3502 |
Hi, I'm new to the volcano community and would like to make my first contribution here. I'm looking at the first one, which proposes changing the logic in the |
I didn't find the logic of
    predicateCacheStatus := make([]*api.Status, 0)
    if predicate.cacheEnable {
        fit, err = pCache.PredicateWithCache(node.Name, task.Pod)
        if err != nil {
            predicateCacheStatus, fit, _ = predicateByStablefilter(nodeInfo)
            pCache.UpdateCache(node.Name, task.Pod, fit)
        } else {
            if !fit {
                err = fmt.Errorf("plugin equivalence cache predicates failed")
                predicateCacheStatus = append(predicateCacheStatus, &api.Status{
                    Code: api.Error, Reason: err.Error(), Plugin: CachePredicate,
                })
            }
        }
    } else {
        predicateCacheStatus, fit, _ = predicateByStablefilter(nodeInfo)
    }
    predicateStatus = append(predicateStatus, predicateCacheStatus...)
    if !fit {
        return api.NewFitErrWithStatus(task, node, predicateStatus...)
    }
Based on the code, could you please explain how I can use PredicateCache when the podTemplate is the same, instead of checking whether predicateByStablefilter fails? Or help me understand it correctly. |
Yes, you can try this task! You can open a new issue and refer here when you finish your work. |
Sorry, that was my mistake, please ignore it. But the predicate cache only works for vc-job pods now; you can extend it to work for deployments as well. |
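One possible way to picture the extension to deployment pods is sketched below. It is only a sketch under stated assumptions, not how volcano's predicate cache is implemented: the `Pod` struct and the `equivalenceKey` helper are hypothetical stand-ins, and falling back to the `pod-template-hash` label (which the Deployment controller sets on the pods of each ReplicaSet) is an assumption about what could serve as a template identity. The hash is short, so a real change would probably also want the owner's UID to rule out collisions.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for the few pod fields this sketch needs.
type Pod struct {
	Namespace   string
	Labels      map[string]string
	Annotations map[string]string
}

const (
	// Annotation named in this issue; currently only patched on vc-job pods.
	templateUIDAnnotation = "volcano.sh/template-uid"
	// Label set by the Deployment controller on pods of a ReplicaSet.
	podTemplateHashLabel = "pod-template-hash"
)

// equivalenceKey returns a key shared by pods that were created from the
// same template, and false when no such key can be derived. Pods without a
// key would have to go through the normal predicates.
func equivalenceKey(p Pod) (string, bool) {
	if uid := p.Annotations[templateUIDAnnotation]; uid != "" {
		return "template-uid/" + uid, true
	}
	// Assumption: namespace plus template hash is a good enough identity.
	if hash := p.Labels[podTemplateHashLabel]; hash != "" {
		return "template-hash/" + p.Namespace + "/" + hash, true
	}
	return "", false
}

func main() {
	pods := []Pod{
		{Namespace: "default", Annotations: map[string]string{templateUIDAnnotation: "job-1-worker"}},
		{Namespace: "default", Labels: map[string]string{podTemplateHashLabel: "5d4f8c7b9"}},
		{Namespace: "default"}, // bare pod: no template identity available
	}
	for _, p := range pods {
		key, ok := equivalenceKey(p)
		fmt.Printf("key=%q cacheable=%v\n", key, ok)
	}
}
```

With a key like this, the cache lookup would no longer depend on the vc-job annotation alone; whether that is safe for every predicate is a separate question.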
Is there any related or similar code that could help me understand what to do and how to do it, such as a statefulset implementation? |
No, you can check how the vc-job pods |
In the scenario of scheduling large-scale jobs, I also encountered a problem. When a job fails to be scheduled, every pod under that job has its PodCondition updated. Since each update has to communicate with the apiserver, this takes a long time. Could we consider using multiple goroutines to handle this part of the logic?
volcano/pkg/scheduler/cache/cache.go, Line 1488 in 0966fd5
I've made a simple implementation and adjusted the QPS (queries per second) of the apiserver. This has saved me a lot of time. The code implementation is as follows. Related PR: #3921 |
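For readers who want a picture of the multi-goroutine idea, here is a minimal sketch. It is not the code from the PR mentioned above: `updateConditions`, `updateFunc`, and `errSimulated` are hypothetical names, and the update function only stands in for the real call to the apiserver. A fixed number of workers keeps the number of in-flight requests bounded, which matters because the client's QPS limit still applies.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errSimulated is a placeholder error used by the demo in main.
var errSimulated = errors.New("simulated API error")

// updateFunc stands in for the call that writes one pod's condition.
type updateFunc func(pod string) error

// updateConditions calls update for every pod using at most `workers`
// goroutines and returns the failures keyed by pod name.
func updateConditions(pods []string, workers int, update updateFunc) map[string]error {
	if workers < 1 {
		workers = 1
	}
	var (
		mu   sync.Mutex
		wg   sync.WaitGroup
		errs = make(map[string]error)
		work = make(chan string)
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for pod := range work {
				if err := update(pod); err != nil {
					mu.Lock() // the map is shared between workers
					errs[pod] = err
					mu.Unlock()
				}
			}
		}()
	}
	for _, p := range pods {
		work <- p
	}
	close(work) // lets the workers exit once the queue is drained
	wg.Wait()
	return errs
}

func main() {
	pods := []string{"job-a-0", "job-a-1", "job-a-2", "job-a-3"}
	errs := updateConditions(pods, 2, func(pod string) error {
		if pod == "job-a-2" {
			return errSimulated
		}
		return nil
	})
	fmt.Println(len(errs), "update(s) failed")
}
```

Collecting errors per pod instead of stopping at the first failure means one bad update does not block the rest of the job's pods.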
I think we can use taskGroupID("%s/%s", task.Job, task.TaskRole) to unify those cases. |
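A rough sketch of what grouping by such an ID could look like, assuming the ID is simply the job and task role joined with a slash as the format string above suggests. The `Task` struct and `groupTasks` helper here are hypothetical stand-ins, not volcano's types.

```go
package main

import "fmt"

// Task is a hypothetical stand-in with only the fields the ID needs.
type Task struct {
	Name     string
	Job      string
	TaskRole string
}

// taskGroupID builds the "<job>/<role>" identifier from the comment above.
func taskGroupID(t Task) string {
	return fmt.Sprintf("%s/%s", t.Job, t.TaskRole)
}

// groupTasks buckets task names by group ID, so that work which depends
// only on the template could be done once per group instead of per task.
func groupTasks(tasks []Task) map[string][]string {
	groups := make(map[string][]string)
	for _, t := range tasks {
		id := taskGroupID(t)
		groups[id] = append(groups[id], t.Name)
	}
	return groups
}

func main() {
	tasks := []Task{
		{Name: "w-0", Job: "default/train", TaskRole: "worker"},
		{Name: "w-1", Job: "default/train", TaskRole: "worker"},
		{Name: "ps-0", Job: "default/train", TaskRole: "ps"},
	}
	for id, names := range groupTasks(tasks) {
		fmt.Println(id, names)
	}
}
```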
/area performance |
What is the problem you're trying to solve
Recently, I used the scripts in benchmark/sh to compare the performance of volcano and kube-scheduler under a benchmark. I found several problems that need to be solved.
Describe the solution you'd like
- apiserver_current_inflight_requests of kube-apiserver is significantly higher when volcano schedules large-scale pods than when kube-scheduler does, which may put pressure on kube-apiserver and cause network contention, affecting scheduling performance. After reviewing the code, I found two places causing the problem:
  - Running/Failed/Succeed states in podgroup status. If these podgroup fields are only used by kubectl to show the number of pods in each state, we could rewrite the logic so that these counts are not persisted in podgroups, as in this PR: feature: Add podgroups statistics #3751
  - isPodGroupStatusUpdated: because the number of pods in the Running state keeps changing during scheduling, !equality.Semantic.DeepEqual(newStatus, oldStatus) is always true, so the condition is always refreshed: volcano/pkg/scheduler/framework/job_updater.go, Lines 81 to 88 in 7170cca
- volcano.sh/template-uid is only patched on vc-job pods; predicateCache first checks whether the annotation exists and only then uses the cache.
- In addition, we may need to restructure the logic of volcano's predicate and prioritize steps. Assume there are m nodes in the cluster and n pods to be scheduled. Currently, all m nodes are filtered for every pod, so the time complexity is O(mn). Could we avoid filtering all the nodes? Or, once the predicates have passed on some nodes, skip filtering the rest before scoring?
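To make the last point concrete, the sketch below stops filtering once enough feasible nodes have been found and remembers where to resume for the next pod. kube-scheduler limits its search in a similar spirit through its percentageOfNodesToScore setting. Everything here is a hypothetical illustration: `feasibleNodes`, its parameters, and the `fits` callback are not volcano APIs, and choosing a good value for `want` is left open.

```go
package main

import "fmt"

// feasibleNodes checks nodes starting at index start, wrapping around, and
// stops as soon as `want` nodes pass fits or every node has been checked.
// It returns the feasible nodes and the index where the next pod's search
// should begin, so that successive pods do not always hit the same nodes.
func feasibleNodes(nodes []string, start, want int, fits func(string) bool) ([]string, int) {
	n := len(nodes)
	if n == 0 {
		return nil, 0
	}
	if want <= 0 || want > n {
		want = n // fall back to checking every node
	}
	start = ((start % n) + n) % n // normalize into [0, n)
	var feasible []string
	checked := 0
	for ; checked < n && len(feasible) < want; checked++ {
		node := nodes[(start+checked)%n]
		if fits(node) {
			feasible = append(feasible, node)
		}
	}
	return feasible, (start + checked) % n
}

func main() {
	nodes := []string{"n0", "n1", "n2", "n3", "n4", "n5"}
	notN1 := func(node string) bool { return node != "n1" }

	next := 0
	for pod := 0; pod < 3; pod++ {
		var candidates []string
		candidates, next = feasibleNodes(nodes, next, 2, notN1)
		fmt.Printf("pod %d: score only %v, resume at index %d\n", pod, candidates, next)
	}
}
```

With `want` fixed, the filtering cost per pod no longer grows with m in the common case where most nodes fit; the trade-off is that the best-scoring node in the whole cluster may not be among the candidates.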
Additional context
No response