Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix/TestEvacuateShard test #2868

Merged
merged 2 commits into from
Jun 13, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions pkg/local_object_storage/engine/evacuate.go
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,8 @@ mainLoop:
}
continue loop
}

e.log.Debug("could not put to shard, trying another", zap.String("shard", shards[j].ID().String()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if last shard failed there wont be more trying

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a problem to me, classic iteration: "try another shard; shard list is over; i finished cycle"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly would you expect here?

}

if prm.handler == nil {
Expand Down
2 changes: 1 addition & 1 deletion pkg/local_object_storage/engine/evacuate_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@ func newEngineEvacuate(t *testing.T, shardNum int, objPerShard int) (*StorageEng

e := New(
WithLogger(zaptest.NewLogger(t)),
WithShardPoolSize(1))
WithShardPoolSize(uint32(objPerShard)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general, doubtful fix to me possibly hiding buggy implementation or test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doubtful fix to me

why? this is literally the problem, i proofed it locally, required 100k+ test runs. ants.Pool has background workers and some logic that blocks execution (calculates num of wip workers), if it does not free it before another iteration is being done, current shard logic just tries another shard and duplicates objects

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in other words, 1-sized pool cannot guarantee that an object will be put if you want to put 2+ objects. on a few-core machine, it is more critical, on my notebook it is like ~ 0.003%. but an evacuate test should not suffer because of the put problem IMO, it should test evacuation logic and it cannot, because objects are duplicated and go to unexpected shards

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're testing evacuation here, so shard behavior details are not really relevant (subject to another bug).


ids := make([]*shard.ID, shardNum)

Expand Down
10 changes: 10 additions & 0 deletions pkg/local_object_storage/engine/put.go
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,11 @@

exists, err := sh.Exists(existPrm)
if err != nil {
e.log.Warn("object put: check object existence",
zap.Stringer("addr", addr),
zap.Stringer("shard", sh.ID()),
zap.Error(err))

Check warning on line 120 in pkg/local_object_storage/engine/put.go

View check run for this annotation

Codecov / codecov/patch

pkg/local_object_storage/engine/put.go#L117-L120

Added lines #L117 - L120 were not covered by tests

if shard.IsErrObjectExpired(err) {
// object is already found but
// expired => do nothing with it
Expand All @@ -138,6 +143,10 @@
}
}

e.log.Debug("object put: object already exists",
zap.Stringer("shard", sh.ID()),
zap.Stringer("addr", addr))

return
}

Expand All @@ -163,6 +172,7 @@

putSuccess = true
}); err != nil {
e.log.Warn("object put: pool task submitting", zap.Error(err))

Check warning on line 175 in pkg/local_object_storage/engine/put.go

View check run for this annotation

Codecov / codecov/patch

pkg/local_object_storage/engine/put.go#L175

Added line #L175 was not covered by tests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this affects the system and/or what's the admin's reaction to this message?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load to this shard currently is bigger than it can handle. also, we have a system that says where to put an object but now (this line) it is not gonna work: an object goes to another shard (we had (have?) bugs that relate such cases). in fact, it always bothered me that object put was always kinda random, you never know from logs if this placement is ok or a "bad" shard was taken as a best-effort

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR (and the issue) is a real example BTW. if i had this log, i wouldn't have had to run this test so many times trying to understand what was happening, it would have taken 1 min looking at logs

close(exitCh)
}

Expand Down
Loading