Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lock resync #608

Merged
merged 6 commits into from
Oct 2, 2023
Merged

Lock resync #608

merged 6 commits into from
Oct 2, 2023

Conversation

vvarg229
Copy link
Collaborator

No description provided.

Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have other cases from #550 covered?

"""
Lock objects should fill metabase on resync_metabase
"""
restart_storage_nodes(self.cluster.storage_nodes)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enough for the nspcc-dev/neofs-node#1502 case, metabase must be resynchronized.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not enough for the nspcc-dev/neofs-node#1502 case, metabase must be resynchronized.

@roman-khimov @carpawell can you please recommend what is the way to make the metabase resynchronized?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A configuration change is needed before the node restart, resync_metabase should be set to true, see https://github.com/nspcc-dev/neofs-node/blob/master/docs/storage-node-configuration.md#shard-subsection

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly recommend dropping metabase too.

Copy link
Member

@roman-khimov roman-khimov Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which means physically deleting metabase files (take path from the config). But it should be done on stopped node.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@roman-khimov @carpawell If I physically delete the metabase files, should I set resync_metabase to true?
Or should I change the configuration file in the dev-env https://github.com/nspcc-dev/neofs-dev-env/blob/master/services/storage/cfg/config.yml#L39 ?
Why is resync_metabase false?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I physically delete the metabase files, should I set resync_metabase to true?

Yes.

Why is resync_metabase false?

It's never true in production, this doesn't make much sense, it's only needed when there are some problems with it (storage failure or meta-incompatible node upgrade). So dev-env should have it set to false as usual, it's only some specific test that wants to do something. BTW, maybe it'd be easier for you to use environment variable to override this setting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After deleting the metabase files from all node object deletion fails with an error:
gRPC dial: context deadline exceeded

https://http.t5.fs.neo.org/86C4P6uJC7gb5n3KkwEGpXRfdczubXyRNW5N9KeJRW73/334-1694095194/index.html#suites/b34feeefa7bd0ee424f598738938f10a/afc0e4a3ea957e70/

@roman-khimov @carpawell please tell me, is this expected behavior?

Copy link
Member

@carpawell carpawell Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a too-wide error, usually, it means that a node is not ready to handle connections (not started successfully?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@vvarg229
Copy link
Collaborator Author

Do we have other cases from #550 covered?

No, the other tests will be added in the next PR.

@vvarg229 vvarg229 marked this pull request as draft August 23, 2023 12:04
@roman-khimov
Copy link
Member

BTW, the test you have here is for #556, not #550.

@vvarg229 vvarg229 force-pushed the lock-resync branch 2 times, most recently from 132d7fa to 50a4c29 Compare September 4, 2023 08:49
@vvarg229 vvarg229 force-pushed the lock-resync branch 2 times, most recently from 5c18773 to a7e75dd Compare September 7, 2023 13:59
@roman-khimov
Copy link
Member

Needs to be rebased

@vvarg229 vvarg229 marked this pull request as ready for review September 7, 2023 16:39
@vvarg229
Copy link
Collaborator Author

vvarg229 commented Sep 7, 2023

Needs to be rebased

Fixed

@vvarg229 vvarg229 marked this pull request as draft September 7, 2023 17:07
@vvarg229 vvarg229 marked this pull request as ready for review September 7, 2023 17:07
robot/variables/common.py Outdated Show resolved Hide resolved
pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved
pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved
pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved
pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved

modified_lines = []

for line in lines:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, since that is a tricky hack, isn't appending the file with the required envs an easier approach? or renaming the original file, reading it, creating a new file (and appending the needed lines to it), and then renaming it back?

but that is just a reading simplification. mb you think differently

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, since that is a tricky hack, isn't appending the file with the required envs an easier approach? or renaming the original file, reading it, creating a new file (and appending the needed lines to it), and then renaming it back?

but that is just a reading simplification. mb you think differently

Yeah, I've been thinking about it.
However, it turns out that I have to copy the docker-compose.yml file from dev-env and change the path to the new file with new environment variables.
It seems to me that this will complicate the code too much and make it incomprehensible.

pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved
pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved

@allure.title("Locked object should be protected from deletion after the storage nodes are restarted")
@pytest.mark.parametrize(
# Only complex objects are required for this test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

Is it important to check the simple object here as well? If you think it's important, I'll add simple_object_size too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic behind the complex objects are complicated enought to treat them an absolutelly different way of object storage, so i would expect testing both always. i have fixed both the bugs that relate the simple objects only and the bugs that affects chains of the objects

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the logic behind the complex objects are complicated enought to treat them an absolutelly different way of object storage, so i would expect testing both always. i have fixed both the bugs that relate the simple objects only and the bugs that affects chains of the objects

Fixed

pytest_tests/testsuites/object/test_object_lock.py Outdated Show resolved Hide resolved
@vvarg229 vvarg229 marked this pull request as draft September 8, 2023 09:08
@vvarg229 vvarg229 force-pushed the lock-resync branch 3 times, most recently from 1bed9c2 to 19caf62 Compare September 14, 2023 18:46
vvarg229 added a commit to vvarg229/neofs-node that referenced this pull request Sep 25, 2023
This commit rolls back e1aed14
It was a mistake to make that change.
Now it is not necessary, nspcc-dev/neofs-testcases#608
allows you to do without a separate flag.

Signed-off-by: Oleg Kulachenko <[email protected]>
vvarg229 added a commit to vvarg229/neofs-dev-env that referenced this pull request Sep 25, 2023
This commit rolls back 39f2bcf
It was a mistake to make that change.
Now it is not necessary, nspcc-dev/neofs-testcases#608
allows you to do without a separate flag.

Signed-off-by: Oleg Kulachenko <[email protected]>
vvarg229 added a commit to vvarg229/neofs-dev-env that referenced this pull request Sep 25, 2023
This commit rolls back 39f2bcf
It was a mistake to make that change.
Now it is not necessary, nspcc-dev/neofs-testcases#608
allows to do without a separate flag.

Signed-off-by: Oleg Kulachenko <[email protected]>
vvarg229 added a commit to vvarg229/neofs-node that referenced this pull request Sep 25, 2023
This commit rolls back e1aed14
It was a mistake to make that change.
Now it is not necessary, nspcc-dev/neofs-testcases#608
allows to do without a separate flag.

Signed-off-by: Oleg Kulachenko <[email protected]>
@vvarg229
Copy link
Collaborator Author

Colleagues, thank you very much for noticing and participating in the discussion.
All errors have been fixed, here are the test results:
https://http.t5.fs.neo.org/86C4P6uJC7gb5n3KkwEGpXRfdczubXyRNW5N9KeJRW73/351-1695662017/index.html

In order for the tests to work correctly, we need to merge these 2 PRs first:
nspcc-dev/neofs-node#2592
nspcc-dev/neofs-dev-env#291

@carpawell apologies to you - we made changes to the .env files before your review, if we had waited we wouldn't have had to revert those changes now.

@vvarg229 vvarg229 marked this pull request as ready for review September 25, 2023 17:48
Copy link
Member

@roman-khimov roman-khimov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But OK otherwise. @carpawell, @evgeniiz321?

Add DOCKER_COMPOSE_ENV_FILE, DOCKER_COMPOSE_CONFIG_FILE
and METABASE_RESYNC_TIMEOUT

Signed-off-by: Oleg Kulachenko <[email protected]>
Added a functions that stop and restarts the specified storage nodes.

Signed-off-by: Oleg Kulachenko <[email protected]>
Added a function to delete metadata from host for specified node.

Signed-off-by: Oleg Kulachenko <[email protected]>
Added a function to enable metabase resync on start.

Signed-off-by: Oleg Kulachenko <[email protected]>
The test_the_object_lock_should_be_kept_after_metabase_deletion
test added.
This test verifies the issue
nspcc-dev/neofs-node#1502

Signed-off-by: Oleg Kulachenko <[email protected]>
The test_enable_resync_metabase_delete_metadata and the test_delete_metadata
tests added.

Signed-off-by: Oleg Kulachenko <[email protected]>
@roman-khimov roman-khimov merged commit e71fc06 into nspcc-dev:master Oct 2, 2023
1 check passed
carpawell pushed a commit to carpawell/neofs-node that referenced this pull request Oct 23, 2023
This commit rolls back e1aed14
It was a mistake to make that change.
Now it is not necessary, nspcc-dev/neofs-testcases#608
allows to do without a separate flag.

Signed-off-by: Oleg Kulachenko <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants