
fix: race when bumping items while loading a snapshot #4564

Open
wants to merge 1 commit into base: main
Conversation

kostasrim
Contributor

The original issue was submitted in #4497 and we supplied a fix in #4507. However, the fix ignored that RdbLoader::Load() runs per flow/shard thread, and the "poison pill" of updating the loading state at the end of RdbLoader::Load() introduced a race condition:

shard_set->Add(i, [bc]() mutable {
  namespaces->GetDefaultNamespace().GetCurrentDbSlice().SetLoadInProgress(false);
  bc->Dec();
});

Any flow F that finishes loading its own snapshot first (relative to the rest of the flows) will call SetLoadInProgress(false) on ALL shard threads. The consequence is that while the other flows are not yet done (their respective RdbLoader::Load() is still processing), the next time they use the db slice API they will start bumping items, because load-in-progress is now false.

The fix is to update the state only after all shard flows are done and, similarly, to update all shard flows before we start Load(), which provides a consistent state/view among all shard threads.

Should resolve #4554

P.S. we might be able to simplify the new db slice state via the global loading state. That's something I will need to follow up on, but I won't do it as part of this PR.

@kostasrim kostasrim self-assigned this Feb 5, 2025
Collaborator

@romange romange left a comment


Maybe I misunderstand something, but why do we need to orchestrate SetLoadInProgress on all the shards? Can we do it locally on each shard, i.e. independently for each shard?

@kostasrim
Contributor Author

Maybe I misunderstand something, but why do we need to orchestrate SetLoadInProgress on all the shards? Can we do it locally on each shard, i.e. independently for each shard?

Good question! Because master and replica might have different numbers of proactors. So imagine the following case:

Flow 1 -> Sets the shard's 1 loading state to true.
Flow 2 -> Still hasn't set its local loading state to true (the flow fiber did not even start because the proactor was busy)

Flow 1 had only one item, so its Load finished instantly; it then calls FinishLoad, which calls FlushShardAsync, which unfortunately calls LoadItemsBuffer on shard number 2. Boom, the loading state there is false. In other words, each flow can, at the end, dispatch to multiple shards which might not yet have had their state updated. And we can't really rely on the order of submissions to the task/shard queue (since we don't have a guarantee of sequential task execution, from what you have said in the past -- and I could be wrong here).

However, saying this, I think there is a better solution! When we call FlushShardAsync we can first check whether the loading state is updated; if it's not, we can set it to true. That way we "save" the first scatter part of the operation (dispatching to all shard threads to update the state). We only keep the "gather" step (updating the state to false after all the loaders have completed).

@romange
Copy link
Collaborator

romange commented Feb 5, 2025

I did not analyse the code, but maybe using an unsigned counter instead of a boolean in db_slice would simplify things?
I.e. every time we start we increment, and when we stop we decrement.

Development

Successfully merging this pull request may close these issues.

SIGSEGV with 1.26.2 and cache_mode=true