Slow watchers impact PUT latency #18109
Comments
Please link the scalability test failure you were debugging; I'm on the SIG-scalability oncall rotation and I haven't seen any failures. Please also note that until this is proven to be a regression it cannot be treated as a bug. See https://testgrid.k8s.io/sig-scalability-gce#gce-master-scale-performance for the Kubernetes 5k-node scalability test results. Performance improvements for slow watchers are discussed in #16839
Thanks for looking into it. The cluster setup is unique in that it does not segregate events from the main etcd cluster, so the existing upstream GCE and AWS 5k-node scale tests are passing consistently now. Even so, I think it's worth diving into the root cause of the performance degradation, as it would still be helpful whenever etcd write qps / throughput needs to be raised in the future. For example, increasing the default work queue concurrent sync counts in kube-controller-manager would create more mutating requests to etcd, IIUC.
Agreed, +1. This is not a bug, but rather a report of the circumstances under which etcd mutating requests can take more than 1s. I will come up with a benchmark test to simulate the traffic using only the etcd test framework.
@serathius Yes, these tests are run internally, not upstream, in a different setup mode (^^^), which is why you wouldn't be seeing them on testgrid. The meta point here is to improve performance/throughput so we can stretch a single cluster a bit further than what it can do today.
I expect this is an issue of events not using the watch cache; however, I don't see a reason to invest in an area where K8s has an official mitigation.
Just to clarify: we are trying to figure out the root cause of how etcd's handling of mutating requests becomes slower than 1s under high write qps / throughput. It just happened to show up on events, and it could also happen on other resources / key prefixes.
Right, the reason is simple: the event resource has the watch cache disabled, meaning it is still vulnerable to kubernetes/kubernetes#123448. If you are not sharding K8s events out, they pollute your other resources.
Enabling the watch cache only protects against N direct etcd watches polluting other resources. The problem identified in this investigation is that there is only 1 direct etcd watch on events, and with enough event write throughput it still pollutes other resources. Hence the following statement was raised.
I think the debate could be settled with a reproduction, because it would make it easier for us to understand each other's arguments.
@serathius could you please take a look at the reproduction in #18121, since the work is based on your watch latency perf benchmark tool? Thanks!! (Edit: this is not a proper reproduction of the issue we have seen in the clusterloader2 test.)
I don't think there is anything surprising about slow watchers impacting PUT; etcd notifies synced watchers as part of committing the transaction in the apply loop. I don't expect we will change that anytime soon, as it's an assumption deeply ingrained in the code.
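For readers unfamiliar with that code path, the sketch below (in Go, with illustrative names such as `store`, `applyPut`, and `notify`; it is not etcd's actual code) shows the ordering being described: synced watchers are notified inside the commit/apply path, before the PUT response is returned, so any time spent notifying, or waiting for the store lock, is added directly to the observed PUT latency.

```go
// Illustrative outline of the ordering described above (names like
// store, applyPut, and notify are made up; this is NOT etcd's code).
package main

import "fmt"

type event struct{ key, value string }

type store struct{ synced []chan event }

// notify delivers an event to every synced watcher. In etcd the
// equivalent step runs while the watchable store lock is held.
func (s *store) notify(ev event) {
	for _, w := range s.synced {
		w <- ev
	}
}

// applyPut is the apply-loop step for a PUT: write, notify, respond.
// Any time spent in notify, or waiting to enter it, is added directly
// to the PUT latency the client observes.
func (s *store) applyPut(key, value string) string {
	// 1. write the key/value as part of the transaction (elided here)
	// 2. on transaction end, notify synced watchers
	s.notify(event{key, value})
	// 3. only now is the response returned to the client
	return "OK"
}

func main() {
	w := make(chan event, 1)
	s := &store{synced: []chan event{w}}
	fmt.Println(s.applyPut("/registry/events/x", "payload"))
	fmt.Println(<-w)
}
```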
Bug report criteria
What happened?
While debugging k8s scale test failures, I found that applying mutating requests (write transactions) could be delayed by up to 1-2 seconds, which breaches the upstream SLO in the clusterloader2 test's SLO measurement.
cc @hakuna-matatah
The slow `end transaction` step was caused by the `watchableStore.mutex` lock being held by the `syncWatchers` process.
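The sketch below is a minimal, self-contained illustration of that contention (assumed names; `storeMu` stands in for `watchableStore.mutex`, and the sleeping goroutine stands in for a `syncWatchers` pass over a large backlog; it is not etcd code): the write path's end-of-transaction step has to take the same mutex before it can notify watchers and return, so a long lock hold shows up directly as PUT latency.

```go
// Minimal sketch of the contention described above. storeMu stands in
// for watchableStore.mutex; the goroutine stands in for a syncWatchers
// pass over a large backlog. Not etcd's actual code.
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var storeMu sync.Mutex

	// Simulated syncWatchers: holds the store lock while catching up
	// unsynced (slow) watchers on a large event backlog.
	go func() {
		storeMu.Lock()
		time.Sleep(1500 * time.Millisecond) // big backlog ~= 1-2s of work
		storeMu.Unlock()
	}()

	time.Sleep(10 * time.Millisecond) // let syncWatchers grab the lock first

	// Simulated write path: ending the transaction must take the same
	// lock to notify synced watchers before the PUT response returns.
	start := time.Now()
	storeMu.Lock()
	storeMu.Unlock()
	fmt.Printf("PUT delayed by %v waiting for the store lock\n", time.Since(start))
}
```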
What did you expect to happen?
I would like `syncWatchers` to complete within `100ms` and not hold the lock for too long.
How can we reproduce it (as minimally and precisely as possible)?
I can work on a new benchmark command to simulate this. As long as we push enough writes (qps and throughput) to etcd and have a watch established on that key prefix, the issue can be reproduced.
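As a starting point, here is a rough sketch using `go.etcd.io/etcd/client/v3` against an assumed local endpoint (`127.0.0.1:2379`): it establishes a single watch on a prefix, mirroring the lone events watch, and then issues writes on the same prefix while recording per-PUT latency. The prefix, value size, and iteration count are assumptions, and a real reproduction would need many concurrent writers (as the etcd benchmark tool uses) to reach the qps / throughput seen in the scale test; this only shows the shape of the measurement.

```go
// Rough sketch only: one watch on a prefix plus a sequential write
// loop that records per-PUT latency. Endpoint, prefix, value size, and
// iteration count are assumptions.
package main

import (
	"context"
	"fmt"
	"strings"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{Endpoints: []string{"127.0.0.1:2379"}})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// One direct etcd watch on the prefix, mirroring the single events watch.
	go func() {
		for resp := range cli.Watch(context.Background(), "/registry/events/", clientv3.WithPrefix()) {
			_ = resp // make this consumer slow (e.g. add a sleep) to let the watcher fall behind
		}
	}()

	value := strings.Repeat("x", 10*1024) // ~10 KiB values to push throughput up
	var worst time.Duration
	for i := 0; i < 10000; i++ {
		start := time.Now()
		if _, err := cli.Put(context.Background(), fmt.Sprintf("/registry/events/%d", i), value); err != nil {
			panic(err)
		}
		if d := time.Since(start); d > worst {
			worst = d
		}
	}
	fmt.Println("worst observed PUT latency:", worst)
}
```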
Anything else we need to know?
Exploring options
Options 1 and 2 are helpful in cutting the mutating request latency down to 0.2s across repeated runs of the same k8s scale test; option 3 is not.
Etcd version (please run commands below)
All supported etcd versions
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response