Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test: Fix flaky leader index in waitLeader function #188

Merged
merged 1 commit into from
Mar 23, 2024

Conversation

MrDXY
Copy link
Contributor

@MrDXY MrDXY commented Mar 21, 2024

Summary

The waitLeader function currently has an issue where it returns the wrong leader index in certain scenarios. When iterating through nodes to find the leader, if the leader changes during the iteration, the function incorrectly returns the default value instead of the actual leader index.

Example

In the function, waitLeader:

raft/rafttest/node_test.go

Lines 130 to 151 in e1bfcf7

func waitLeader(ns []*node) int {
var l map[uint64]struct{}
var lindex int
for {
l = make(map[uint64]struct{})
for i, n := range ns {
lead := n.Status().SoftState.Lead
if lead != 0 {
l[lead] = struct{}{}
if n.id == lead {
lindex = i
}
}
}
if len(l) == 1 {
return lindex
}
}
}

Let's say,

  1. i is 3 now, and we still don't have a leader elected.
  2. Then node[1] is elected as a leader,
  3. i is 4 now, and the leader of node[4] has changed from 0 to 1 in step2. So l[lead] = struct{}{} will be executed. But n.id == lead is false, lindex still equals 0.
  4. The iteration ends, and len(l) now equals 1. This function will return lindex, which is 0. But 0 is not the real leader index, but the default value.

Proposed Changes

To address this issue, the proposed solution involves initializing the lindex variable to -1 instead of 0. This change ensures that lindex holds a distinct value indicating 'not found' rather than potentially conflating it with the index of the first element.

Changes Made

  • Initialized lindex variable to -1 instead of 0.

Impact

This change ensures that the waitLeader function correctly returns the leader index, avoiding confusion between 'not found' and the index of the first element.

Fixes #181

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Thanks for the fix

@ahrtr ahrtr merged commit ebcf5af into etcd-io:main Mar 23, 2024
9 checks passed
@MrDXY MrDXY deleted the fix-flaky-result-in-waitLeader branch March 23, 2024 22:56
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Sep 4, 2024
Fixes cockroachdb#127413.

This commit bypasses the larger rebase in cockroachdb#122133 to pick up the test
flake fix in etcd-io/raft#188. There was some
discussion in etcd-io/raft#181 about alternatives
for fixing this test. For now, we stick with a direct cherry-pick.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Sep 6, 2024
Fixes cockroachdb#127413.

This commit bypasses the larger rebase in cockroachdb#122133 to pick up the test
flake fix in etcd-io/raft#188. There was some
discussion in etcd-io/raft#181 about alternatives
for fixing this test. For now, we stick with a direct cherry-pick.

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this pull request Sep 6, 2024
130084: raft: fix flaky leader index in waitLeader function r=pav-kv a=nvanbenschoten

Fixes #127413.

This commit bypasses the larger rebase in #122133 to pick up the test flake fix in etcd-io/raft#188. There was some discussion in etcd-io/raft#181 about alternatives for fixing this test. For now, we stick with a direct cherry-pick.

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Sep 6, 2024
Fixes cockroachdb#127413.

This commit bypasses the larger rebase in cockroachdb#122133 to pick up the test
flake fix in etcd-io/raft#188. There was some
discussion in etcd-io/raft#181 about alternatives
for fixing this test. For now, we stick with a direct cherry-pick.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Sep 6, 2024
Fixes cockroachdb#127413.

This commit bypasses the larger rebase in cockroachdb#122133 to pick up the test
flake fix in etcd-io/raft#188. There was some
discussion in etcd-io/raft#181 about alternatives
for fixing this test. For now, we stick with a direct cherry-pick.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rafttest: TestRestart is flaky
3 participants