Situation:
- yield_leadership (if the request landed on 1) or request_leadership (if we're on 2 or 3).
- request_leadership: the request gets to 1, and 1 pauses writes.

Quite a synthetic example, but that's what we encountered in test_reconfig_replace_leader_in_one_command (ClickHouse/ClickHouse#52901). ClickHouse/ClickHouse#53481, "reconfig: add timeout before yielding/taking leadership", is an attempt to fix that.

So, one fix option is to wait for the new config to get committed and to execute new commands only after that, but I wonder whether there's an option to solve this at the library level.

I tried changing NuRaft/src/raft_server.cxx (line 1244 in 188947b) so that a toggled option would make the leader commit all appended entries before pausing writes. No luck: there seem to be way too many invariants that get broken.

Neither yield_leadership nor request_leadership can enforce adding or removing a member, because there is no guarantee that a membership change will eventually succeed and be committed. For example, suppose 1 is the leader and receives the add-server request, but fails to replicate the message due to a network partition.

Also, membership changes should be done one at a time: the next membership change should start only after making sure that the previous one is committed. There is a known problem that multiple membership changes at once may result in an incorrect quorum and data inconsistency. The original paper tried to resolve this with "joint consensus", but NuRaft does not implement it and instead enforces one member change at a time.
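The quorum problem with several membership changes at once can be checked by brute force. The following Python sketch is a toy model, not NuRaft code, and the function names are illustrative only. It enumerates every majority of an old and a new configuration and reports whether some pair of them shares no server. That is exactly the case in which each configuration could reach a decision (elect a leader, commit an entry) without hearing from the other.

```python
from itertools import combinations


def majorities(cfg):
    """All subsets of cfg that contain a strict majority of its members."""
    n = len(cfg)
    return [set(c) for k in range(n // 2 + 1, n + 1) for c in combinations(cfg, k)]


def has_disjoint_majorities(old, new):
    """True if some majority of `old` and some majority of `new` share no server."""
    return any(not (a & b) for a in majorities(old) for b in majorities(new))


old = [1, 2, 3]
print(has_disjoint_majorities(old, [1, 2, 3, 4]))     # one server added: False
print(has_disjoint_majorities(old, [1, 2, 3, 4, 5]))  # two added at once: True
```

With two servers added at once, {1, 2} is a majority of the old configuration and {3, 4, 5} is a majority of the new one, so the two quorums can diverge. A single-server change never produces such a pair, which is why enforcing one change at a time is a workable alternative to joint consensus.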
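The workaround discussed in the issue (wait until the new config is committed, and only then yield or request leadership or start the next membership change) can be modelled as a small gate. This is a hypothetical client-side sketch, not a NuRaft API. ConfigGate, on_config_appended, on_commit and wait_config_committed are illustrative names. In a real integration, the two callbacks would have to be driven by whatever commit and config-change notifications the application actually receives.

```python
import threading


class ConfigGate:
    """Hypothetical gate: allow leadership transfer or a new membership change
    only after the newest config entry has been committed."""

    def __init__(self):
        self._cv = threading.Condition()
        self._config_idx = 0     # log index of the newest appended config entry
        self._committed_idx = 0  # highest committed log index seen so far

    def on_config_appended(self, log_idx):
        """Record a new config entry; reject it while the previous one is pending."""
        with self._cv:
            if self._config_idx > self._committed_idx:
                raise RuntimeError("previous membership change not committed yet")
            self._config_idx = log_idx

    def on_commit(self, log_idx):
        """Advance the commit index and wake up any waiters."""
        with self._cv:
            self._committed_idx = max(self._committed_idx, log_idx)
            self._cv.notify_all()

    def wait_config_committed(self, timeout):
        """Block until the newest config entry is committed; False on timeout."""
        with self._cv:
            return self._cv.wait_for(
                lambda: self._committed_idx >= self._config_idx, timeout)


gate = ConfigGate()
gate.on_config_appended(10)             # e.g. "add server 4" appended at index 10
print(gate.wait_config_committed(0.1))  # False: not committed, do not yield yet
threading.Timer(0.05, gate.on_commit, args=(10,)).start()
print(gate.wait_config_committed(1.0))  # True once index 10 commits
```

The timeout mirrors the approach taken in ClickHouse/ClickHouse#53481: if the config does not commit in time, the caller can give up instead of pausing writes on a leader whose membership change may never complete.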