tests/e2e: add test cases related to HashKV #18369
Conversation
tests/e2e/hashkv_test.go
Outdated
}
}

func TestVerifyHashKVAfterTwoCompactions_MixVersions(t *testing.T) {
Side note: if #18274 is accepted, the main branch will always keep the compacted revision. However, existing releases delete that compacted revision if it's a tombstone. #18274 should skip a previous compacted revision that is a tombstone, so that the HashKV result can be the same across mixed versions; in particular, it won't fire a data-corruption alert while a cluster is upgrading to a new release.
tests/e2e/hashkv_test.go
Outdated
	hashKVOnRev int64
}{
	{
		compactedOnRev: 33, // tombstone
Please don't use comments to imply a relation between the number here and the consts in newTestKeySetInCluster. Passing an explicit variable is always better.
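For illustration only, a minimal sketch of the suggested direction; populateData and its return values are hypothetical names, not necessarily the PR's final identifiers:

// Rather than a magic number whose meaning lives in another file:
//	compactedOnRev: 33, // tombstone
// derive the revision from the data-population step and pass it explicitly:
tombstoneRevs, _ := populateData(t, clus, cfg.Client, keys) // hypothetical helper returning tombstone revisions
compactedOnRev := tombstoneRevs[0]                          // explicit value, no hidden coupling to constants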
Rolled back to the original one. Please take a look. If it's accepted, I will squash the commits. Thanks.
ping @ahrtr
/retest
tests/e2e/hashkv_test.go
Outdated
scenarios := []struct {
	ClusterVersion fe2e.ClusterVersion
	OnlyOneKey     bool
Can you provide more context on what OnlyOneKey means?
When there is only one key and the new compaction revision is on a tombstone, all MVCC keys will be involved in the hash value, because the available-revision result keep is empty.
etcd/server/storage/mvcc/hash.go
Lines 60 to 76 in e7f5729
func (h *kvHasher) WriteKeyValue(k, v []byte) {
	kr := BytesToRev(k)
	upper := Revision{Main: h.revision + 1}
	if !upper.GreaterThan(kr) {
		return
	}
	lower := Revision{Main: h.compactRevision + 1}
	// skip revisions that are scheduled for deletion
	// due to compacting; don't skip if there isn't one.
	if lower.GreaterThan(kr) && len(h.keep) > 0 {
		if _, ok := h.keep[kr]; !ok {
			return
		}
	}
	h.hash.Write(k)
	h.hash.Write(v)
}
I want to keep the previous compaction revision in #18274, no matter what type of revision it is, a normal revision or a tombstone. Both the v3.5.x and v3.4.x releases delete the compaction revision if it's a tombstone. So, if the available-revision result keep is empty, the #18274 patch could possibly involve the tombstone in the hash value while the existing releases don't. This scenario can be used to verify that #18274 won't break compatibility.
For example, in one cluster we have two v3.5.x etcd members and one main-branch (with #18274 patch) etcd member. In this cluster, we only have one key, foo.
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
// {{2, 0}, {3, 0}, {4, 0}(t)[3]}
The first compaction is on revision {Main: 4}.
In the v3.5.x member:
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
In the main-branch (with #18274 patch) member:
// key: "foo"
// modified: 9
// generations:
// {{9, 0}[1]}
// {{5, 0}, {6, 0}, {7, 0}, {8, 0}(t)[4]}
// {{4, 0}(t)[3]}
If HashKV computes the hash value on Revision{Main: 8}, then based on v3.5.x the hash result will involve the following revisions:
- Revision{Main: 5}
- Revision{Main: 6}
- Revision{Main: 7}
- Revision{Main: 8}
But the main branch (with the #18274 patch) could also include Revision{Main: 4} in the HashKV result. That's unexpected, and it could cause a data-corruption alert.
So, I use OnlyOneKey to represent this case. Sorry, it's confusing. However, I don't want to put too much detail in here, because it seems reasonable to test with different key sets anyway.
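For illustration only, a rough sketch of how such a scenario table could look; apart from OnlyOneKey, the field values and the fe2e.QuorumLastVersion constant are assumptions about the e2e framework, not the exact code in this PR:

scenarios := []struct {
	ClusterVersion fe2e.ClusterVersion
	OnlyOneKey     bool // single key "foo" so that the keep result can be empty, as described above
}{
	{ClusterVersion: fe2e.CurrentVersion, OnlyOneKey: false},
	{ClusterVersion: fe2e.QuorumLastVersion, OnlyOneKey: true}, // mixed-version cluster, one key (assumed constant)
}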
tests/e2e/hashkv_test.go
Outdated
compactedOnRev := dataset.tombstones[0]
if !tt.compactedOnTombstoneRev {
	compactedOnRev += 3
What's the significance of the number 3 here? Can we create a local const, like numberOfRevisions or emptyWrites?
3 is an arbitrary number I used to pick a non-tombstone revision for compaction. As you suggested, I now use an afterWrites variable for that. Please take a look.
tests/e2e/hashkv_test.go
Outdated
if clusVersion != fe2e.CurrentVersion {
	if !fileutil.Exist(fe2e.BinPath.EtcdLastRelease) {
		t.Skipf("%q does not exist", fe2e.BinPath.EtcdLastRelease)
	}
}

ctx := context.Background()

fe2e.BeforeTest(t)
These should be in the test case directly instead of a helper function.
I don't think we need newClusterForHashKV at all.
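A minimal sketch of inlining that setup into the test body, assuming the framework's NewEtcdProcessCluster and its WithClusterSize/WithVersion options; illustrative only, not the PR's final code:

fe2e.BeforeTest(t)
ctx := context.Background()

if clusVersion != fe2e.CurrentVersion && !fileutil.Exist(fe2e.BinPath.EtcdLastRelease) {
	t.Skipf("%q does not exist", fe2e.BinPath.EtcdLastRelease)
}

clus, err := fe2e.NewEtcdProcessCluster(ctx, t,
	fe2e.WithClusterSize(3),
	fe2e.WithVersion(clusVersion), // assumed option for selecting the member version mix
)
require.NoError(t, err)
defer clus.Close()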
OK. I removed it.
tests/e2e/hashkv_test.go
Outdated
}

type datasetInfo struct {
	keys map[string]struct{}
You just populate the data for keys, but it isn't used at all. We can remove datasetInfo and just return tombstones and latestRev from the data-populating functions.
Done.
tests/e2e/hashkv_test.go
Outdated
t.Logf("HashKV on rev=%d", hashKVOnRev)
resp, err := cli.HashKV(ctx, hashKVOnRev)
require.NoError(t, err)

require.Len(t, resp, 3)
require.True(t, resp[0].Hash != 0)
t.Logf("One Hash value is %d", resp[0].Hash)

require.Equal(t, resp[0].Hash, resp[1].Hash)
require.Equal(t, resp[1].Hash, resp[2].Hash)
Minor comment: we can add a helper function, something like verifyConsistentHashAcrossAllMembers.
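A minimal sketch of such a helper, assuming resp is the per-member slice returned by the framework client's HashKV call; illustrative only:

func verifyConsistentHashAcrossAllMembers(t *testing.T, resp []*clientv3.HashKVResponse) {
	t.Helper()
	require.Greater(t, len(resp), 1)
	require.NotZero(t, resp[0].Hash)
	t.Logf("One Hash value is %d", resp[0].Hash)
	// Every member must report exactly the same hash for the same revision.
	for i := 1; i < len(resp); i++ {
		require.Equal(t, resp[0].Hash, resp[i].Hash)
	}
}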
Done.
tests/e2e/hashkv_test.go
Outdated
"go.etcd.io/etcd/client/pkg/v3/fileutil"
clientv3 "go.etcd.io/etcd/client/v3"
"go.etcd.io/etcd/tests/v3/framework/config"
fe2e "go.etcd.io/etcd/tests/v3/framework/e2e"
Why not just e2e? All other tests import this package as e2e.
Updated.
tests/e2e/hashkv_test.go
Outdated
	lastestRevision int64
}

// key: "foo"
Thanks for crafting such a meticulous testing scenario; it seems very thorough. However, for a reviewer or any future contributor it might be hard to understand why we picked this exact data shape and not another.
My question would be: how hard would it be to generalize? We know one exact case that can break HashKV, but how sure are we that we aren't missing another? Hardcoding an exact scenario protects us against this exact mistake, but can we be sure there will not be more such cases in the future? Can we be sure that HashKV is resilient enough?
Have you heard about property testing? https://www.mayhem.security/blog/what-is-property-based-testing. To me, etcd should always hold the property that no matter the type of write, no matter the compaction revision, no matter the etcd version, no matter the revision, HashKV should always be the same on all cluster members.
I think it would be OK to merge these exact scenarios, but I also think we should consider how to generalize them. For example, instead of checking HashKV on one revision, test all of them in a for loop.
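To make the for-loop idea concrete, a rough sketch that checks HashKV consistency on every still-hashable revision instead of a single hand-picked one; compactedOnRev, latestRev, and cli are assumed to come from the population and compaction steps:

// Any revision above the compaction point should hash identically on all members.
for rev := compactedOnRev + 1; rev <= latestRev; rev++ {
	resp, err := cli.HashKV(ctx, rev)
	require.NoError(t, err)
	require.Greater(t, len(resp), 1)
	for i := 1; i < len(resp); i++ {
		require.Equal(t, resp[0].Hash, resp[i].Hash, "hash mismatch on rev %d", rev)
	}
}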
instead of checking hashKV on one revision, test all of them in for loop.
Good point. Will simplify the data generator.
I followed @ahrtr's suggestion and used populateData. Please take a look.
tests/e2e/hashkv_test.go
Outdated
// generations:
//
//	{34, 1}
func newTestDatasetInCluster(t *testing.T, clus *fe2e.EtcdProcessCluster, cliCfg fe2e.ClientConfig, onlyOneKey bool) *datasetInfo {
It's unnecessarily complicated. Please consider a function something like the one below:
// populateData populates some sample data, and return a slice of tombstone revisions and the latest revision
func populateData(t *testing.T, clus *fe2e.EtcdProcessCluster, clientCfg fe2e.ClientConfig, keys []string) ([]int64, int64) {
	c := newClient(t, clus.EndpointsGRPC(), clientCfg)
	ctx := context.Background()
	totalOperations := 40
	var (
		tombStoneRevs []int64
		latestRev     int64
	)
	deleteStep := 10 // submit a delete operation on every 10 operations
	for i := 1; i <= totalOperations; i++ {
		if i%deleteStep == 0 {
			t.Logf("Deleting key=%s", keys[0]) // Only delete the first key for simplicity
			resp, derr := c.Delete(ctx, keys[0])
			require.NoError(t, derr)
			latestRev = resp.Header.Revision
			tombStoneRevs = append(tombStoneRevs, resp.Header.Revision)
			continue
		}
		value := fmt.Sprintf("%d", i)
		var ops []clientv3.Op
		for _, key := range keys {
			ops = append(ops, clientv3.OpPut(key, value))
		}
		t.Logf("Writing keys: %v, value: %s", keys, value)
		resp, terr := c.Txn(ctx).Then(ops...).Commit()
		require.NoError(t, terr)
		require.True(t, resp.Succeeded)
		require.Len(t, resp.Responses, len(ops))
		latestRev = resp.Header.Revision
	}
	return tombStoneRevs, latestRev
}
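For reference, a rough sketch of how a test could consume that helper; the Compact and HashKV calls assume the e2e framework's client interface (config.CompactOption with Physical/Timeout fields), so treat this as an assumption rather than the PR's exact code:

keys := []string{"key0", "key1"}
tombstoneRevs, latestRev := populateData(t, clus, cfg.Client, keys)

// Compact on the first tombstone, then verify every member reports the same hash.
compactedOnRev := tombstoneRevs[0]
_, err := cli.Compact(ctx, compactedOnRev, config.CompactOption{Physical: true, Timeout: 10 * time.Second})
require.NoError(t, err)

resp, err := cli.HashKV(ctx, latestRev)
require.NoError(t, err)
for i := 1; i < len(resp); i++ {
	require.Equal(t, resp[0].Hash, resp[i].Hash)
}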
Thanks. I used your code in the latest commit. Please take a look.
tests/e2e/hashkv_test.go
Outdated
require.Len(t, resp, 3)
require.True(t, resp[0].Hash != 0)
t.Logf("One Hash value is %d", resp[0].Hash)

require.Equal(t, resp[0].Hash, resp[1].Hash)
require.Equal(t, resp[1].Hash, resp[2].Hash)
Suggested change:
-require.Len(t, resp, 3)
-require.True(t, resp[0].Hash != 0)
-t.Logf("One Hash value is %d", resp[0].Hash)
-require.Equal(t, resp[0].Hash, resp[1].Hash)
-require.Equal(t, resp[1].Hash, resp[2].Hash)
+require.Greater(t, len(resp), 1)
+require.True(t, resp[0].Hash != 0)
+t.Logf("One Hash value is %d", resp[0].Hash)
+for i := 1; i < len(resp); i++ {
+	require.Equal(t, resp[0].Hash, resp[i].Hash)
+}
Thanks for the suggestion. Updated.
tests/e2e/hashkv_test.go
Outdated
// If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
// And ensure it's not the following tombstone.
const afterWriters = int64(3)
if !compactedOnTombstoneRev {
	compactedOnRev += afterWriters
	require.True(t, tombstoneRevs[1] > compactedOnRev)
}
The 3 may cause unnecessary confusion. We don't care about the number at all; we just need to ensure the compaction revision isn't a tombstone.
Suggested change:
-// If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
-// And ensure it's not the following tombstone.
-const afterWriters = int64(3)
-if !compactedOnTombstoneRev {
-	compactedOnRev += afterWriters
-	require.True(t, tombstoneRevs[1] > compactedOnRev)
-}
+// If compaction revision isn't a tombstone, select a revision in the middle of two tombstones.
+if !compactedOnTombstoneRev {
+	compactedOnRev = (tombstoneRevs[0] + tombstoneRevs[1]) / 2
+	require.True(t, tombstoneRevs[0] < compactedOnRev && compactedOnRev < tombstoneRevs[1])
+}
Updated.
tombstoneRevs, lastestRev := populateDataForHashKV(t, clus, cfg.Client, scenario.keys)

compactedOnRev := tombstoneRevs[0]
Can we also add a scenario which compacts on the last tombstone, tombstoneRevs[len(tombstoneRevs) - 1]?
Added TestVerifyHashKVAfterCompactionOnLastTombstone_MixVersions (34f81e7 to c118987).
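Roughly, that new case only changes which tombstone the compaction targets; a sketch under the assumption that tombstoneRevs comes from populateDataForHashKV as above, and that the Compact call uses the framework's config.CompactOption:

// Compact on the last tombstone rather than the first one.
compactedOnRev := tombstoneRevs[len(tombstoneRevs)-1]
t.Logf("COMPACT on last tombstone rev=%d", compactedOnRev)
_, err := cli.Compact(ctx, compactedOnRev, config.CompactOption{Physical: true, Timeout: 10 * time.Second})
require.NoError(t, err)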
tests/e2e/hashkv_test.go
Outdated
// If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
// And ensure it's not the following tombstone.
Minor comment, suggested change:
-// If compaction revision is not tombstone, select revision after 3 writes from first tombstone.
-// And ensure it's not the following tombstone.
+// If compaction revision isn't a tombstone, select a revision in the middle of two tombstones.
My mistake. Updated.
Signed-off-by: Wei Fu <[email protected]>
Ideally, we should consolidate the two cases.
LGTM
Nice work, thanks @fuweid
Looks good; this gives us confidence in fixing the tombstone compaction bug. Thanks @fuweid, awesome work.
I think we should also consider generalizing it by adding HashKV checking to the robustness tests.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahrtr, fuweid, serathius. The full list of commands accepted by this bot can be found here. The pull request process is described here.
// If compaction revision isn't a tombstone, select a revision in the middle of two tombstones.
if !compactedOnTombstoneRev {
	compactedOnRev = (tombstoneRevs[0] + tombstoneRevs[1]) / 2
	require.Greater(t, tombstoneRevs[1], compactedOnRev)
nit: It seems we partially missed the comment #18369 (comment)
require.True(t, tombstoneRevs[0] < compactedOnRev && compactedOnRev < tombstoneRevs[1])
Follow-up: etcd-io#18369 (comment) Signed-off-by: Wei Fu <[email protected]>
Thanks @ahrtr @serathius for the review. I filed PR #18387 to address the comment #18369 (comment).
Related to #18274
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.