Test for Graceful node shutdown #8233
Conversation
Force-pushed from 4ab31cf to ae3545e
PR validation on existing cluster
Cluster Name: asagare-414aws
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: asagare-414aws
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master
Job UNSTABLE (some or all tests failed).
Blocked Validation due to https://bugzilla.redhat.com/show_bug.cgi?id=2227835
Force-pushed from ae3545e to 9bed8b5
PR validation on existing cluster
Cluster Name: asagare-odf-hugep
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master
Job FAILED (installation failed, tests not executed).
PR validation on existing cluster
Cluster Name: asagare-odf-hugepages
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master
Job UNSTABLE (some or all tests failed).
Force-pushed from 9bed8b5 to 9b61886
PR validation on existing cluster
Cluster Name: asagare-odf-hugepages
Cluster Configuration:
PR Test Suite: system_test
PR Test Path: tests/e2e/system_test/test_graceful_nodes_shutdown.py
Additional Test Params:
OCP VERSION: 4.14
OCS VERSION: 4.14
tested against branch: master
Job UNSTABLE (some or all tests failed).
Force-pushed from 9b61886 to 85bd0fd
# S3 bucket
self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)

def setup_amq_kafka_notification(self, bucket_factory):
See if you can use an existing AMQ common function instead of writing a new one here.
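For illustration, a minimal sketch of reusing the common AMQ helper rather than a local function; the AMQ class lives in ocs_ci/ocs/amq.py, but the exact setup arguments below are assumptions and may need adjusting.

```python
# Sketch only: reuse the common AMQ helper instead of a new local function.
from ocs_ci.ocs import constants
from ocs_ci.ocs.amq import AMQ
from ocs_ci.helpers.helpers import default_storage_class


def setup_kafka_via_common_helper():
    amq = AMQ()
    sc = default_storage_class(interface_type=constants.CEPHBLOCKPOOL)
    # Deploys the Strimzi/Kafka stack the same way other ocs-ci tests do
    amq.setup_amq_cluster(sc_name=sc.name)
    return amq
```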
ocs_ci/helpers/longevity_helpers.py (outdated diff)
# (
#     "OC",
#     {
#         "interface": "OC",
#         "namespace_policy_dict": {
#             "type": "Cache",
#             "ttl": 3600,
#             "namespacestore_dict": {"aws": [(1, None)]},
#         },
#         "placement_policy": {
#             "tiers": [
#                 {"backingStores": [constants.DEFAULT_NOOBAA_BACKINGSTORE]}
#             ]
#         },
#     },
# ),
Please do not comment out the core functions; instead, handle it in the test script.
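One way to handle that in the test script, sketched with plain pytest parametrization; the parameter ids and config labels are illustrative, not from this PR.

```python
# Sketch: leave the Cache-policy bucket configuration in the helper untouched
# and exclude it from this test via parametrization instead of commenting it out.
import pytest

bucket_configs = [
    pytest.param("OC-default", id="oc-default"),
    pytest.param(
        "OC-cache",
        id="oc-cache",
        marks=pytest.mark.skip(reason="cache policy not exercised in this test"),
    ),
]


@pytest.mark.parametrize("bucket_config", bucket_configs)
def test_graceful_shutdown_bucket(bucket_config):
    ...
```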
from ocs_ci.helpers.helpers import default_storage_class
from ocs_ci.ocs.bucket_utils import s3_put_object, retrieve_verification_mode

# from ocs_ci.ocs.resources.fips import check_fips_enabled
You can remove the commented-out code.
from ocs_ci.ocs.utils import get_pod_name_by_pattern
from ocs_ci.helpers.helpers import default_storage_class
from ocs_ci.ocs.bucket_utils import s3_put_object, retrieve_verification_mode

you can delete this empty line
# S3 bucket
self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)

def setup_amq_kafka_notification(self, bucket_factory):
you are not calling it in setup
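For clarity, a sketch of wiring the helper into the setup path; the method names come from the hunk above, while the enclosing setup signature is assumed.

```python
# Sketch: call the Kafka-notification helper from the setup path so it actually runs.
def setup(self, mcg_obj, bucket_factory):
    # S3 bucket
    self.setup_s3_bucket(mcg_obj=mcg_obj, bucket_factory=bucket_factory)
    # AMQ/Kafka bucket notification (defined above but never invoked)
    self.setup_amq_kafka_notification(bucket_factory=bucket_factory)
```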
multi_obc_lifecycle_factory(num_of_obcs=20, bulk=True, measure=False)

# OCP Workloads
start_ocp_workload(workloads_list=["registry", "monitoring"], run_in_bg=True)
you will have to handle the teardown of ocp workloads too
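A possible shape for that teardown, sketched with a pytest finalizer; the cleanup helpers named below are hypothetical placeholders for whatever ocs-ci teardown calls the registry and monitoring workloads need.

```python
# Sketch: register cleanup of the background OCP workloads with pytest so
# they are torn down even if the test body fails.
def ocp_workload_finalizer():
    cleanup_registry_workload()    # hypothetical helper
    cleanup_monitoring_workload()  # hypothetical helper


request.addfinalizer(ocp_workload_finalizer)
```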
tree_output = ct_pod.exec_ceph_cmd(ceph_cmd="ceph osd tree")
logger.info("ceph osd tree output:")
logger.info(tree_output)
Please cover the other post-verification steps from https://docs.google.com/document/d/129IreytPAPPQW1sov1JyQ6_SHXfgpVF4EtD9ZP9m10s/edit#bookmark=id.w6l0liw6o9rr
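As an illustration of what such extra post-shutdown checks could look like, a sketch reusing the exec_ceph_cmd pattern from the hunk above; the exact steps the linked document requires are not reproduced here.

```python
# Sketch: additional Ceph post-shutdown checks, following the same
# exec_ceph_cmd pattern as the "ceph osd tree" call above.
ceph_health = ct_pod.exec_ceph_cmd(ceph_cmd="ceph health")
logger.info(f"ceph health output: {ceph_health}")

osd_stat = ct_pod.exec_ceph_cmd(ceph_cmd="ceph osd stat")
logger.info(f"ceph osd stat output: {osd_stat}")
```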
Force-pushed from 85bd0fd to 1177594
Force-pushed from c45c408 to f4d7a20
Force-pushed from f4d7a20 to 5b706fe
""" | ||
nodes = get_nodes() | ||
for node in nodes: | ||
assert ( | ||
node.get()["status"]["allocatable"]["hugepages-2Mi"] == "64Mi" | ||
), f"Huge pages is not applied on {node.name}" | ||
""" |
If this code is not needed, please remove it
It is needed. On the testing cluster it was not enabled, so it is commented out for testing purposes.
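As an aside, one way to keep the check without commenting it out would be to gate it on cluster configuration; a minimal sketch, assuming a huge_pages flag under config.ENV_DATA.

```python
# Sketch: run the hugepages assertion only when the cluster is expected to
# have hugepages enabled, instead of commenting the block out.
from ocs_ci.framework import config
from ocs_ci.ocs.node import get_nodes

if config.ENV_DATA.get("huge_pages"):  # "huge_pages" key is an assumption
    for node in get_nodes():
        assert (
            node.get()["status"]["allocatable"]["hugepages-2Mi"] == "64Mi"
        ), f"Huge pages is not applied on {node.name}"
```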
def teardown():
    logger.info("cleanup the environment")
    """ cleanup logging workload """
This is a single-line comment; please consider using a hash (#) comment instead of a docstring.
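i.e. something like:

```python
def teardown():
    logger.info("cleanup the environment")
    # cleanup logging workload
```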
logger.info("Logging is configured") | ||
uninstall_cluster_logging() | ||
|
||
""" cleanup monitoring workload """ |
same as above
request.addfinalizer(teardown)

logger.info("Starting the test setup")
Should we move the setup-related functions into a dedicated setup function?
I think if we are not achieving any optimisation or resource savings, then it's OK to leave it as it is.
##################################### non encrypted pvc (ne_pvc)
create + clone
non-encrypted block/fs pvc
Returns:
    pvc object
    pod object
    file_name
    origin md5sum
    snapshot
"""
Please fix the docstring format issues.
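For example, a conventionally formatted version of that docstring could look like this; the function name is illustrative, the content is taken from the hunk above.

```python
def non_encrypted_pvc_ops(self):
    """
    Non-encrypted PVC (ne_pvc): create + clone a non-encrypted block/fs PVC.

    Returns:
        tuple: pvc object, pod object, file_name, origin md5sum, snapshot

    """
```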
if not check_if_monitoring_stack_exists():
    assert False, "monitoring workload not started after reboot"
if not check_if_registry_stack_exists():
    assert False, "registry workload not started after reboot"
The workload existing does not mean that the workload is running fine without any issues after the reboot.
OK. What additional checks do you suggest?
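As an illustration of what such a check could look like, a sketch using ocs-ci's OCP wrapper to wait for the workload pods to actually reach Running rather than only checking that the stack exists; the namespace, selector, and pod count here are assumptions.

```python
# Sketch: verify the monitoring pods are actually Running after the reboot,
# not only that the stack exists.
from ocs_ci.ocs import constants
from ocs_ci.ocs.ocp import OCP

monitoring_pods = OCP(kind=constants.POD, namespace="openshift-monitoring")
assert monitoring_pods.wait_for_resource(
    condition=constants.STATUS_RUNNING,
    selector="app.kubernetes.io/name=prometheus",
    resource_count=2,
    timeout=600,
), "Prometheus pods not running after reboot"
```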
Sanity().health_check(tries=60)

self.validate_data_integrity()
self.validate_snapshot_restore(snapshot_restore_factory)
Where are we checking the step below?
PVC expansion is possible on the restored snapshot (validate_pvc_expansion)
validate_pvc_expansion is called inside validate_snapshot_restore.
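For readers, a rough sketch of what that expansion check typically does to the restored PVC; the resize_pvc call and sizes are illustrative rather than taken from this PR.

```python
# Sketch: expand the PVC restored from the snapshot and verify the new size.
new_size = restored_pvc_obj.size + 2
restored_pvc_obj.resize_pvc(new_size=new_size, verify=True)
logger.info(f"PVC {restored_pvc_obj.name} expanded to {new_size}Gi")
```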
self.validate_data_integrity()
self.validate_snapshot_restore(snapshot_restore_factory)
validate_mcg_bg_features(skip_any_features=["expiration", "rgw kafka", "nsfs"])
Where are we checking for the steps below?
- get operation on the bucket gets the data that was stored before shutdown
- delete operation on the bucket can delete the data that was stored before shutdown
- put and list operations work correctly
S3 operations on files in an NSFS bucket are not implemented yet.
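For the non-NSFS buckets, such checks could be sketched with the bucket_utils helpers already imported in this PR (s3_put_object) plus their get/delete/list counterparts; the bucket_name variable and the availability of those extra helpers are assumptions.

```python
# Sketch: exercise get/put/list/delete against data written before the shutdown.
from ocs_ci.ocs.bucket_utils import (
    s3_put_object,
    s3_get_object,
    s3_delete_object,
    s3_list_objects_v2,
)

# get: object stored before the shutdown is still readable
s3_get_object(mcg_obj, bucket_name, object_key="pre-shutdown-obj")
# put + list: new writes work after the shutdown
s3_put_object(mcg_obj, bucket_name, object_key="post-shutdown-obj", data="sample data")
assert s3_list_objects_v2(mcg_obj, bucket_name)
# delete: pre-shutdown data can be removed
s3_delete_object(mcg_obj, bucket_name, object_key="pre-shutdown-obj")
```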
Force-pushed from 5b706fe to fed1568
Signed-off-by: Avdhoot <[email protected]>
Force-pushed from 39a0e2b to 4fa1e16
Signed-off-by: Avdhoot <[email protected]>
Tested PR after restoring BS size to 17GB.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: avd-sagare, mashetty330, PrasadDesala
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Signed-off-by: Avdhoot <[email protected]>
Test to verify that data is accessible and uncorrupted, and that the cluster is operational and the OSDs are up, after a graceful node shutdown.