Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test for Graceful node shutdown #8233

Merged

Conversation

avd-sagare
Copy link
Contributor

Test to verify that data is accessible and uncorrupted as well as
operational cluster and osd are up after graceful nodes shutdown

@avd-sagare avd-sagare requested a review from a team as a code owner August 14, 2023 08:44
@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 4ab31cf to ae3545e Compare August 14, 2023 13:10
Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-414aws
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

@avd-sagare avd-sagare requested a review from Shrivaibavi August 17, 2023 13:28
Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-414aws
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

@avd-sagare
Copy link
Contributor Author

Blocked Validation due to https://bugzilla.redhat.com/show_bug.cgi?id=2227835

@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from ae3545e to 9bed8b5 Compare August 23, 2023 05:40
Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-odf-hugep
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job FAILED (installation failed, tests not executed).

Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-odf-hugepages
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 9bed8b5 to 9b61886 Compare August 23, 2023 12:49
Copy link

@ocs-ci ocs-ci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR validation on existing cluster

Cluster Name: asagare-odf-hugepages
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master

Job UNSTABLE (some or all tests failed).

@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 9b61886 to 85bd0fd Compare September 21, 2023 06:55
# S3 bucket
self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)

def setup_amq_kafka_notification(self, bucket_factory):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see if you can use existing amq common function instead of writing new one here.

Comment on lines 671 to 686
# (
# "OC",
# {
# "interface": "OC",
# "namespace_policy_dict": {
# "type": "Cache",
# "ttl": 3600,
# "namespacestore_dict": {"aws": [(1, None)]},
# },
# "placement_policy": {
# "tiers": [
# {"backingStores": [constants.DEFAULT_NOOBAA_BACKINGSTORE]}
# ]
# },
# },
# ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do not comment the core functions, instead handle it in the test script.

from ocs_ci.helpers.helpers import default_storage_class
from ocs_ci.ocs.bucket_utils import s3_put_object, retrieve_verification_mode

# from ocs_ci.ocs.resources.fips import check_fips_enabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the comment code

from ocs_ci.ocs.utils import get_pod_name_by_pattern
from ocs_ci.helpers.helpers import default_storage_class
from ocs_ci.ocs.bucket_utils import s3_put_object, retrieve_verification_mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can delete this empty line

# S3 bucket
self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)

def setup_amq_kafka_notification(self, bucket_factory):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are not calling it in setup

tests/e2e/system_test/test_graceful_nodes_shutdown.py Outdated Show resolved Hide resolved
multi_obc_lifecycle_factory(num_of_obcs=20, bulk=True, measure=False)

# OCP Workloads
start_ocp_workload(workloads_list=["registry", "monitoring"], run_in_bg=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will have to handle the teardown of ocp workloads too

tests/e2e/system_test/test_graceful_nodes_shutdown.py Outdated Show resolved Hide resolved
@PrasadDesala PrasadDesala added the team/e2e E2E team related issues/PRs label Sep 21, 2023
tree_output = ct_pod.exec_ceph_cmd(ceph_cmd="ceph osd tree")
logger.info("ceph osd tree output:")
logger.info(tree_output)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 85bd0fd to 1177594 Compare November 9, 2023 08:37
@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines size/XXL and removed size/XL size/L PR that changes 100-499 lines labels Nov 9, 2023
@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from c45c408 to f4d7a20 Compare November 17, 2023 14:16
@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines and removed size/XXL labels Nov 17, 2023
@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from f4d7a20 to 5b706fe Compare November 20, 2023 05:43
tests/e2e/system_test/test_graceful_nodes_shutdown.py Outdated Show resolved Hide resolved
Comment on lines 54 to 60
"""
nodes = get_nodes()
for node in nodes:
assert (
node.get()["status"]["allocatable"]["hugepages-2Mi"] == "64Mi"
), f"Huge pages is not applied on {node.name}"
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this code is not needed, please remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed. On testing cluster it was not enabled so for testing purpose its commented.


def teardown():
logger.info("cleanup the environment")
""" cleanup logging workload """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a single line comment. pls consider adding with hash

logger.info("Logging is configured")
uninstall_cluster_logging()

""" cleanup monitoring workload """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above


request.addfinalizer(teardown)

logger.info("Starting the test setup")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the setup related func to a setup func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we are not achieving optimisation or resource saving thne its ok to be as it is.

Comment on lines 158 to 162
##################################### non encrypted pvc (ne_pvc)
create + clone
non-encrypted block/fs pvc
Returns:
pvc object
pod object
file_name
origin md5sum
snapshot
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please fix doc string format issues

Comment on lines 333 to 335
if not check_if_monitoring_stack_exists():
assert "monitoring workload not started after reboot"
if not check_if_registry_stack_exists():
assert "registry workload not started after reboot"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the workload exists does not mean that the workload is running fine without any issues after reboot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. what additional steps you suggest to check?

Sanity().health_check(tries=60)

self.validate_data_integrity()
self.validate_snapshot_restore(snapshot_restore_factory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we checking the below step-
PVC expansion is possible on the restored snapshot- validate_pvc_expansion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling validate_pvc_expansion inside validate_snapshot_restore


self.validate_data_integrity()
self.validate_snapshot_restore(snapshot_restore_factory)
validate_mcg_bg_features(skip_any_features=["expiration", "rgw kafka", "nsfs"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are checking for the below steps-

  • get operation on the bucket gets the data that was stored before shutdown
  • delete operation on the bucket can delete the data that was stored before shutdown
  • put and list operations work correctly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

S3 operations on NSFS bucket on files is not implemented yet.

@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 5b706fe to fed1568 Compare November 20, 2023 12:54
Avdhoot and others added 16 commits July 22, 2024 16:01
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
Signed-off-by: Avdhoot <[email protected]>
@avd-sagare avd-sagare force-pushed the System_test-Shutdown_Restart branch from 39a0e2b to 4fa1e16 Compare July 22, 2024 10:32
Signed-off-by: Avdhoot <[email protected]>
@pull-request-size pull-request-size bot added size/L PR that changes 100-499 lines and removed size/XL labels Jul 22, 2024
@avd-sagare
Copy link
Contributor Author

@avd-sagare
Copy link
Contributor Author

Copy link

openshift-ci bot commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: avd-sagare, mashetty330, PrasadDesala

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PrasadDesala PrasadDesala merged commit ba411e7 into red-hat-storage:master Jul 24, 2024
5 of 6 checks passed
amr1ta pushed a commit to amr1ta/ocs-ci that referenced this pull request Jul 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm size/L PR that changes 100-499 lines team/e2e E2E team related issues/PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants