clientv3test: add comments for clientv3test #16920

Merged: 1 commit on Nov 23, 2023
2 changes: 1 addition & 1 deletion client/v3/ordering/util.go
@@ -29,7 +29,7 @@ func NewOrderViolationSwitchEndpointClosure(c *clientv3.Client) OrderViolationFu
 	violationCount := int32(0)
 	return func(_ clientv3.Op, _ clientv3.OpResponse, _ int64) error {
 		// Each request is assigned by round-robin load-balancer's picker to a different
-		// endpoints. If we cycled them 5 times (even with some level of concurrency),
+		// endpoint. If we cycled them 5 times (even with some level of concurrency),
 		// with high probability no endpoint points on a member with fresh data.
 		// TODO: Ideally we should track members (resp.opp.Header) that returned
 		// stale result and explicitly temporarily disable them in 'picker'.
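For context, the comment above describes how the ordering client recovers from stale reads: each retried request is routed by the round-robin picker to a different endpoint, and after roughly 5 cycles through the endpoint list it is unlikely any endpoint has fresh data. Below is a minimal sketch of such a violation callback; it is an illustration only, not the code in util.go, and the maxCycles threshold is an assumption.

package ordersketch

import (
	"sync/atomic"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/ordering"
)

// newSwitchEndpointClosureSketch is a hypothetical illustration: it counts
// order violations and gives up once the round-robin picker has likely
// cycled through all endpoints about 5 times.
func newSwitchEndpointClosureSketch(c *clientv3.Client) ordering.OrderViolationFunc {
	var violationCount int32
	return func(_ clientv3.Op, _ clientv3.OpResponse, _ int64) error {
		maxCycles := int32(5 * len(c.Endpoints())) // assumption: ~5 full passes over the endpoints
		if atomic.AddInt32(&violationCount, 1) > maxCycles {
			return ordering.ErrNoGreaterRev // no endpoint is likely to have fresh data
		}
		return nil // let the ordering KV retry; the picker moves on to the next endpoint
	}
}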
11 changes: 11 additions & 0 deletions tests/integration/clientv3/ordering_util_test.go
@@ -24,6 +24,9 @@ import (
	integration2 "go.etcd.io/etcd/tests/v3/framework/integration"
)

// TestEndpointSwitchResolvesViolation ensures
// - ErrNoGreaterRev error is returned from partitioned member when it has stale revision
// - no more error after partition recovers
func TestEndpointSwitchResolvesViolation(t *testing.T) {
	integration2.BeforeTest(t)
	clus := integration2.NewCluster(t, &integration2.ClusterConfig{Size: 3})
@@ -78,8 +81,16 @@ func TestEndpointSwitchResolvesViolation(t *testing.T) {
	if err != ordering.ErrNoGreaterRev {
		t.Fatal("While speaking to partitioned leader, we should get ErrNoGreaterRev error")
	}

	clus.Members[2].RecoverPartition(t, clus.Members[:2]...)
	time.Sleep(1 * time.Second) // give enough time for the operation
Member commented:
NOTE: orderingKv.Get doesn't retry with backoff. That might make the test case flaky, since the CI VM is unstable and 1 second might not be enough for member[2] to get the latest data.

Sorry, I don't have a good idea for ensuring the data is synced from the leader. Maybe we can remove the clientv3.WithSerializable() option to force a linearizable read.
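For reference, a short sketch of the difference between the two read modes, using the same cli, ctx, and "foo" key as the test (the calls below are standard clientv3 API, shown only for contrast):

	// Serializable read: answered locally by whichever member serves the request,
	// so it can return stale data right after the partition heals.
	_, err := cli.Get(ctx, "foo", clientv3.WithSerializable())

	// Linearizable read (the default): confirmed through the raft quorum before
	// answering, so it reflects the latest committed revision.
	_, err = cli.Get(ctx, "foo")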

@shaoqin2 (Contributor Author) commented on Nov 14, 2023:
I think when creating this orderingKv object, we passed the retry function NewOrderViolationSwitchEndpointClosure:

	orderingKv := ordering.NewKV(cli.KV, ordering.NewOrderViolationSwitchEndpointClosure(cli))

When the Get fails, we enter the retry based on:

	err = kv.orderViolationFunc(op, r, prevRev)
	if err != nil {
		return nil, err
	}

Am I understanding this correctly? The only thing missing might be an actual backoff function; as it is, I think it retries immediately, not sure if that's what you meant.
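For illustration, a minimal sketch of what such a backoff wrapper around the ordered Get could look like. getWithBackoff is a hypothetical helper, not part of this PR or of etcd, and the initial delay and doubling are assumptions:

package backoffsketch

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/ordering"
)

// getWithBackoff keeps retrying the ordered Get with exponential backoff
// while the reachable members still return stale revisions.
func getWithBackoff(ctx context.Context, kv clientv3.KV, key string) (*clientv3.GetResponse, error) {
	backoff := 100 * time.Millisecond
	for {
		resp, err := kv.Get(ctx, key)
		if err != ordering.ErrNoGreaterRev {
			return resp, err // success, or an error that retrying can't fix
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err()
		case <-time.After(backoff):
		}
		backoff *= 2 // wait longer each round instead of retrying immediately
	}
}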

Member replied:
Sorry for the unclear comment.

The orderingKv does retry immediately.

However, the GitHub Actions CI VM is unstable. Recovering member[2] from the network partition and sleeping 1s doesn't guarantee that member[2] can sync with the leader during ordering's retries.

The Get on line 87, _, err = orderingKv.Get(ctx, "foo", clientv3.WithSerializable()), might still return an error.

@shaoqin2 (Contributor Author) replied:
Ah, that makes sense. The idea is that with the default linearizable read, the client is forced to talk to the other members, so the most recent data is returned when Get is called, right?

Removed the WithSerializable option from the call. Let me know if it looks good.

	_, err = orderingKv.Get(ctx, "foo")
	if err != nil {
		t.Fatal("After partition recovered, third member should recover and return no error")
	}
}

// TestUnresolvableOrderViolation ensures ErrNoGreaterRev error is returned when available members only have stale revisions
func TestUnresolvableOrderViolation(t *testing.T) {
	integration2.BeforeTest(t)
	clus := integration2.NewCluster(t, &integration2.ClusterConfig{Size: 5, UseBridge: true})