Avoid infinite and unnecessary loops when CN RPC processors are killed/interrupted by OS #12584
We found that in some scenarios, when `stop-confignode.sh` is executed, the ConfigNode does not exit immediately but hangs for about 90 seconds, printing more than 3 million "Unexpected interruption during waiting for configNode leader ready." warning logs. The cause lies in the code modified by this PR: the ConfigNode RPC thread sleeps repeatedly inside a while loop. When the ConfigNode process is killed, the OS continuously tries to interrupt the sleeping thread, so the thread is interrupted again and again and prints the warning on every iteration of the loop.
We therefore checked every ConfigNode thread that sleeps inside a while loop. If such a thread can only be interrupted during shutdown over its whole life cycle, it now exits the while loop as soon as it is interrupted, which avoids the endless log output and the delay in `stop-confignode.sh`.
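The pattern described above can be sketched roughly as follows. This is an illustration only, not the code changed by this PR: the class `LeaderWaitSketch`, the method `waitForLeaderReady`, and the `leaderReady` supplier are hypothetical names, and the poll interval is arbitrary. It relies on one known Java behavior: if the interrupt flag is set when `sleep` is called, `sleep` throws `InterruptedException` immediately, so a loop that restores the flag and keeps iterating would spin and log without pausing.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class LeaderWaitSketch {

  /**
   * Polls until leaderReady reports true (hypothetical stand-in for the
   * "wait for configNode leader ready" loop). Returns false if the thread
   * is interrupted while waiting.
   */
  public static boolean waitForLeaderReady(BooleanSupplier leaderReady, long pollMillis) {
    while (!leaderReady.getAsBoolean()) {
      try {
        TimeUnit.MILLISECONDS.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag so callers can still observe it.
        Thread.currentThread().interrupt();
        System.err.println(
            "Unexpected interruption during waiting for configNode leader ready.");
        // Assumption: this thread is only ever interrupted during shutdown,
        // so leave the loop instead of sleeping (and logging) again.
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // The leader never becomes ready here, so only the interrupt ends the wait.
    Thread worker = new Thread(() -> waitForLeaderReady(() -> false, 50));
    worker.start();
    worker.interrupt();
    worker.join(2000);
    System.out.println("worker alive: " + worker.isAlive());
  }
}
```

With the early `return`, the warning is logged once per interrupted thread and the thread ends promptly. Whether to return a flag, break, or rethrow depends on what the caller needs.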
For details, see https://jira.infra.timecho.com:8443/browse/TIMECHODB-750