fix: race when bumping items while loading a snapshot #4564
+26
−7
The original issue was reported in #4497 and we supplied a fix in #4507. However, that fix ignored the fact that `RdbLoader::Load()` runs per flow/shard thread, so the "poison pill" of updating the loading state at the end of `RdbLoader::Load()` introduced a race condition:

Any flow `F` that finishes loading its own snapshot first (relative to the rest of the flows) will call `SetLoadInProgress(false)` on ALL shard threads. As a consequence, the other flows, which are not yet done (their respective `RdbLoader::Load()` is still processing), will start bumping up items the next time they use the db slice API, because load in progress is now false.

The fix is to update the state after all shard flows are done and, similarly, to update all shard flows before we start the `Load()`, which should provide a consistent state/view among all shard threads.

Should resolve #4554
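
The ordering described in the fix can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `ShardState`, `LoadFlow`, and `LoadAllFlows` are hypothetical stand-ins, and plain `std::thread`s are assumed in place of the real shard threads. The flag is set on every shard before any flow starts and cleared only after every flow has been joined, so no flow can observe "load finished" while it is still loading.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-shard db slice state (not the real type).
struct ShardState {
  std::atomic<bool> load_in_progress{false};
  std::atomic<int> bumps{0};  // items that would have been bumped mid-load
};

// Hypothetical stand-in for one flow's RdbLoader::Load(). It only READS the
// flag; it never clears it, which is the point of the fix.
void LoadFlow(ShardState& shard, int items) {
  for (int i = 0; i < items; ++i) {
    // The db slice would bump an item only when no load is in progress.
    if (!shard.load_in_progress.load()) shard.bumps.fetch_add(1);
  }
}

// Coordinator: set the flag on ALL shards before any flow starts and clear it
// only after ALL flows are done. Returns the number of mid-load bumps.
int LoadAllFlows(std::vector<ShardState>& shards, int items_per_flow) {
  assert(!shards.empty());
  for (auto& s : shards) s.load_in_progress.store(true);

  std::vector<std::thread> flows;
  for (auto& s : shards) {
    flows.emplace_back(LoadFlow, std::ref(s), items_per_flow);
  }
  for (auto& t : flows) t.join();  // wait for every flow, not just the first

  int total_bumps = 0;
  for (auto& s : shards) {
    s.load_in_progress.store(false);
    total_bumps += s.bumps.load();
  }
  return total_bumps;
}
```

In the racy arrangement, the equivalent of `load_in_progress.store(false)` ran at the end of each flow and touched every shard, so the first flow to finish switched bumping back on for flows that were still loading. With the coordinator owning both transitions, the bump count in this sketch stays at zero regardless of which flow finishes first.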
P.S. We might be able to simplify the new db slice state via the global loading state. That is something I will need to follow up on, but I won't do it as part of this PR.