
Improvements for MG collections #11161

Conversation

petr-balogh
Member

@petr-balogh petr-balogh commented Jan 17, 2025

Fixes: #10526
Fixes: #11159

Several improvements to MG log collection, such as preventing MG from
running over and over when it keeps failing or timing out.

Collect OCP logs for Ecosystem tests (e.g. upgrade) decorated with the
purple squad marker.

Do not collect logs again at the end of a successful execution when they
were already collected at least once during the run by a failed test.
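The last point can be sketched as a minimal guard. This is a hypothetical simplification: the hook names, the flag, and `collect_must_gather` are illustrative stand-ins, only the skip-at-teardown behaviour comes from the description above.

```python
# Hypothetical sketch of "don't collect again at teardown if a failed
# test already collected logs during the run".
collected_runs = []  # stands in for the real MG collection side effect
logs_collected_during_run = False

def collect_must_gather():
    # Placeholder for the real (expensive) must-gather collection.
    collected_runs.append("mg")

def on_test_failure():
    # Failure hook: collect logs and remember that we did.
    global logs_collected_during_run
    collect_must_gather()
    logs_collected_during_run = True

def on_session_end(success):
    # At teardown, skip collection when the run succeeded overall but a
    # failed test already gathered logs at least once.
    if success and logs_collected_during_run:
        return
    collect_must_gather()
```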

@petr-balogh petr-balogh requested a review from a team as a code owner January 17, 2025 16:57
@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines label Jan 17, 2025
@petr-balogh petr-balogh force-pushed the improvmentes_for_mg_collections branch 3 times, most recently from c368dd8 to eabf343 on January 17, 2025 17:39
@petr-balogh
Member Author

Trying to verify here:
https://url.corp.redhat.com/1f1ea06

@petr-balogh petr-balogh force-pushed the improvmentes_for_mg_collections branch from eabf343 to 752cfd4 on January 17, 2025 20:18
@petr-balogh
Member Author

New verification triggered here:
https://url.corp.redhat.com/108c27f

@petr-balogh petr-balogh force-pushed the improvmentes_for_mg_collections branch 2 times, most recently from 0befd70 to bfe7cf7 on January 20, 2025 16:56
@petr-balogh
Member Author

Verification job:
https://url.corp.redhat.com/95eab59

@petr-balogh petr-balogh force-pushed the improvmentes_for_mg_collections branch 2 times, most recently from 2b07848 to a272f62 on January 21, 2025 14:35
dahorak
dahorak previously approved these changes Jan 21, 2025
Contributor

@dahorak dahorak left a comment


LGTM

@petr-balogh
Member Author

Verification job:
https://url.corp.redhat.com/d37eb8b

"mcg",
"purple_squad",
}
# For every failure in MG we are trying to extend next attempt by 20 minutes
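The 20-minute extension per failure can be sketched as a small helper. This is a hypothetical simplification: only the "+20 minutes per prior failure" rule comes from the change; the base value and the helper name are assumptions.

```python
# Sketch of the escalating must-gather timeout (hypothetical helper).
BASE_TIMEOUT = 20 * 60  # first attempt: 20 minutes, in seconds (assumed base)
STEP = 20 * 60          # each failure extends the next attempt by 20 minutes

def mg_timeout(mg_fail_count):
    """Timeout for the next MG attempt, given the number of prior failures."""
    return BASE_TIMEOUT + mg_fail_count * STEP
```

With three attempts this yields 20, 40, and 60 minutes, i.e. up to 2 hours total.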
Contributor

This would be up to a 2-hour wait (20 min + 40 min + 60 min) when the default max_mg_fail_attempts is used.
Isn't that too much time?

Member Author

We sometimes see that even 60 minutes is really not enough, so I am giving it a chance with more time to collect logs, so we can analyze the MG output if we get anything. If it fails 3 times it will not do anything more, which is still better than before: in one example run it spent more than 24 hours collecting logs that always timed out, and we did not get any logs at all. I was thinking the increased time might help us get some logs from MG to identify why it is taking longer than it used to. We can reduce the failed-MG limit to 2 only, if needed.

Contributor

As long as we don't try to collect again in any later test case, except for once before teardown, this should be fine.

Member Author

If we reach our max attempts we do not collect again and skip the whole collection.

max_mg_fail_attempts = config.REPORTING.get("max_mg_fail_attempts")
if skip_after_max_fail:
    with mg_lock:
        if mg_fail_count > max_mg_fail_attempts:
Contributor

are we deleting the MG dir structure in case of timeout failure?

Member Author

No, it still contains some useful data produced by MG even if it times out.

Contributor

Should we maybe delete the directory structure of a failed MG collection when a later attempt succeeds?

Member Author

There is a directory for every failed test case. We do not retry a single failed collection; we just continue to the next test case, and the next time another failure occurs we try to collect MG for that test failure. So a new directory is created for MG, and the old one still holds some valuable information. Also, only one of the MG collections can fail (OCP or ODF), and we collect Noobaa logs to the same directory as well, so there is really nothing to delete: all collected data are valuable if we have any.

@petr-balogh
Member Author

New verification:
https://url.corp.redhat.com/0f2103c

@petr-balogh
Member Author

Verification job:
https://url.corp.redhat.com/0a180da

@petr-balogh petr-balogh force-pushed the improvmentes_for_mg_collections branch from ad1d736 to cd5d9ca on January 23, 2025 22:27
@petr-balogh petr-balogh requested a review from a team as a code owner January 23, 2025 22:27
@petr-balogh petr-balogh force-pushed the improvmentes_for_mg_collections branch 2 times, most recently from 20c0450 to c71a8a8 on January 31, 2025 16:52
@petr-balogh
Member Author

Verification job: https://url.corp.redhat.com/05642f2

ebenahar
ebenahar previously approved these changes Feb 3, 2025
@openshift-ci openshift-ci bot added the lgtm label Feb 3, 2025
Fixes: red-hat-storage#10526
Fixes: red-hat-storage#11159

Several improvements to MG log collection, such as preventing MG from
running over and over when it keeps failing or timing out.

Collect OCP logs for Ecosystem tests (e.g. upgrade) decorated with the
purple squad marker.

Do not collect logs again at the end of a successful execution when they
were already collected at least once during the run by a failed test.

Signed-off-by: Petr Balogh <[email protected]>

openshift-ci bot commented Feb 3, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ebenahar, OdedViner, petr-balogh

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@petr-balogh petr-balogh merged commit f02c6e4 into red-hat-storage:master Feb 3, 2025
6 of 7 checks passed
Labels
lgtm size/L PR that changes 100-499 lines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MG makes a lot of noise written to info log level
OCP must gather is not collected
4 participants