-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: introduce judge_split_prevote #420
base: master
Are you sure you want to change the base?
Changes from 8 commits
3c75e06
6a895ab
158f7f1
0e535b5
76b7915
7cb2fc8
b55ad24
e5c3aac
e47868b
d41a459
d5c4829
212f621
39b12e8
7a6bdba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,7 +14,7 @@ | |
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
use std::cmp; | ||
use std::cmp::{self, Ordering}; | ||
use std::ops::{Deref, DerefMut}; | ||
|
||
use crate::eraftpb::{ | ||
|
@@ -238,6 +238,7 @@ pub struct RaftCore<T: Storage> { | |
|
||
skip_bcast_commit: bool, | ||
batch_append: bool, | ||
judge_split_prevote: bool, | ||
|
||
heartbeat_timeout: usize, | ||
election_timeout: usize, | ||
|
@@ -342,6 +343,7 @@ impl<T: Storage> Raft<T> { | |
promotable: false, | ||
check_quorum: c.check_quorum, | ||
pre_vote: c.pre_vote, | ||
judge_split_prevote: c.judge_split_prevote, | ||
read_only: ReadOnly::new(c.read_only_option), | ||
heartbeat_timeout: c.heartbeat_tick, | ||
election_timeout: c.election_tick, | ||
|
@@ -1312,6 +1314,18 @@ impl<T: Storage> Raft<T> { | |
"msg type" => ?m.get_msg_type(), | ||
); | ||
|
||
// When judge_split_prevote, reject explicitly to make candidate exit PreCandiate early | ||
// so it will vote for other peer later. | ||
if self.judge_split_prevote | ||
&& m.get_msg_type() == MessageType::MsgRequestPreVote | ||
{ | ||
let mut to_send = | ||
new_message(m.from, MessageType::MsgRequestPreVoteResponse, None); | ||
to_send.term = m.term; | ||
to_send.reject = true; | ||
self.r.send(to_send, &mut self.msgs); | ||
} | ||
|
||
return Ok(()); | ||
} | ||
} | ||
|
@@ -1423,10 +1437,7 @@ impl<T: Storage> Raft<T> { | |
// ...or this is a PreVote for a future term... | ||
(m.get_msg_type() == MessageType::MsgRequestPreVote && m.term > self.term); | ||
// ...and we believe the candidate is up to date. | ||
if can_vote | ||
&& self.raft_log.is_up_to_date(m.index, m.log_term) | ||
&& (m.index > self.raft_log.last_index() || self.priority <= m.priority) | ||
{ | ||
if can_vote && self.may_vote(&m) { | ||
// When responding to Msg{Pre,}Vote messages we include the term | ||
// from the message, not the local term. To see why consider the | ||
// case where a single node was previously partitioned away and | ||
|
@@ -1447,6 +1458,10 @@ impl<T: Storage> Raft<T> { | |
self.election_elapsed = 0; | ||
self.vote = m.from; | ||
} | ||
// This means it's in split vote, give up election. | ||
if self.judge_split_prevote && self.state == StateRole::PreCandidate { | ||
gengliqi marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self.become_follower(self.term, INVALID_ID); | ||
} | ||
} else { | ||
self.log_vote_reject(&m); | ||
let mut to_send = | ||
|
@@ -1469,6 +1484,40 @@ impl<T: Storage> Raft<T> { | |
Ok(()) | ||
} | ||
|
||
/// Checks if this node may vote for the request. It may vote when from node | ||
/// contains the same logs or more logs. When they contains same logs, priority | ||
/// is considered. If priorities are still the same, and this node is also | ||
/// starting campaign, then split vote happens. In this case, if `judge_split_prevote` | ||
/// and `pre_vote` are enabled, it will vote only when from node has greater ID. | ||
fn may_vote(&self, m: &Message) -> bool { | ||
match self.raft_log.is_up_to_date(m.index, m.log_term) { | ||
Ordering::Greater => true, | ||
Ordering::Equal => { | ||
match self.priority.cmp(&m.priority) { | ||
Ordering::Greater => false, | ||
Ordering::Equal => { | ||
// judge split vote can break symmetry of campaign, but as | ||
// it only happens during split vote, the impact should not | ||
// be significant. | ||
Comment on lines
+1531
to
+1533
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How can we tell if a campaign will end up split vote? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This could cause some raft nodes with lower ID impossible to become the leader even we want to transfer leadership to it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the configuration size is an odd number and leader is down, split vote can probably happen If two nodes are in PreCandidate state. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Besides transfer leader, a node needs to pass pre-campaign before start the actual campaign, so There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Indeed, it depends on whether nodes are working slowly. But in my tests, when one node is down, and after elections are finished, the leader count on each nodes don't have much differences (I remove balance leader scheduler before shutdown a node). And even it leads to more leaders on some node, it should not be a problem with the help of PD to reach a eventually balance. |
||
// Transfering leader skips prevote, so they won't have impact | ||
// on the other. | ||
!self.judge_split_prevote | ||
|| self.state != StateRole::PreCandidate | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add some comments for transfer leader? |
||
|| m.get_msg_type() != MessageType::MsgRequestPreVote | ||
|| { | ||
let from_id = m.from; | ||
let (my_h, from_h) = | ||
(fxhash::hash64(&self.id), fxhash::hash64(&from_id)); | ||
my_h < from_h || (my_h == from_h && self.id < from_id) | ||
} | ||
} | ||
Ordering::Less => true, | ||
} | ||
} | ||
Ordering::Less => false, | ||
} | ||
} | ||
|
||
fn hup(&mut self, transfer_leader: bool) { | ||
if self.state == StateRole::Leader { | ||
debug!( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems
Follower
andPreCandidate
are not different, both of them can (pre)vote to other peers.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follower
will ignore allPreVoteResponse
. For example, A and B split pre-votes, and C can vote for both A and B. If B is chosen, and A doesn't step down to follower, it will still start campaign when C's prevote is received.