[PERFSCALE-3428] Cluster health-check #727

Open
rsevilla87 wants to merge 1 commit into master
Conversation

rsevilla87 (Member) commented Oct 9, 2024

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization
  • Documentation Update

Description

This PR performs a cluster-operator health check before running the actual workload. I considered adding this logic as a step in Prow, but I ended up adding it here for several reasons:

  • Too many jobs: a new chain would have to be added to every job.
  • More maintenance: maintaining steps in Prow is more difficult than maintaining them here.

Please leave your thoughts here.
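For illustration, a minimal sketch of the kind of pre-workload gate this adds (the error handling around the command is hypothetical; the PR itself adds only the `oc adm wait-for-stable-cluster` call shown in the diff below):

```bash
#!/bin/bash -e
# Sketch: abort the benchmark early if cluster operators are not stable.
# The stability windows here are illustrative, not this PR's final values.
if ! oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m; then
  echo "Cluster operators did not stabilize; aborting workload"
  oc get clusteroperators  # surface which operators are unhealthy
  exit 1
fi
```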

Related Tickets & Documents

  • Related Issue #
  • Closes #


openshift-ci bot commented Oct 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci bot commented Oct 9, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rsevilla87

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


```bash
source ./env.sh
source ../../utils/common.sh
```
rsevilla87 (Member, Author) commented:

Let's consolidate on `echo` rather than `log`.
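For context (a hypothetical example; the actual helper names in `utils/common.sh` may differ), the consolidation amounts to preferring plain `echo` over a custom logging helper:

```bash
# Before (assumed log helper sourced from utils/common.sh):
log "Waiting for cluster operators to stabilize"
# After: plain echo, no helper dependency:
echo "Waiting for cluster operators to stabilize"
```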

Signed-off-by: Raul Sevilla <[email protected]>

```diff
@@ -1,6 +1,7 @@
 #!/bin/bash -e
 
 set -e
+oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
```
Collaborator commented:

Just curious whether we need this here, since we already have a cluster health check enabled by default in the ocp-wrapper driver code.

rsevilla87 (Member, Author) replied:

Yeah, it's probably redundant; let me think of some use cases where the cluster health check is not performed...

```diff
@@ -1,6 +1,7 @@
 #!/bin/bash -e
 
 set -e
+oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
```
Member commented:

5m might be too short. For example, if we want to run kube-burner after an upgrade, I'm seeing some pods (kube-apiserver) take more than 5 minutes just to roll out.

rsevilla87 (Member, Author) replied:

The control-plane pods are supposed to be rolled out by the time the upgrade finishes; another potential issue is the MCP rollout, which runs asynchronously to the `oc adm upgrade` command (a wait for that case is sketched below).

Also, I don't think we run any workload right after an upgrade completes.
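If the MCP rollout ever needs to be covered explicitly, a hypothetical addition (not part of this PR; the timeout value is an assumption) could wait for all MachineConfigPools to finish updating before the stability check:

```bash
# Hypothetical: block until every MachineConfigPool reports Updated=True.
# 30m is an illustrative timeout, not a recommendation.
oc wait machineconfigpool --all --for=condition=Updated=True --timeout=30m
```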

Member replied:

I tried setting --minimum-stable-period to 1m and saw kube-burner errors because the cluster was still stabilizing, but keeping it at 5m has given me better results. Just my current experience...
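One way to reconcile these experiences (a hypothetical sketch, not part of this PR; the variable names and defaults are assumptions) is to make both windows overridable per job via environment variables:

```bash
# Hypothetical: let each job tune the stability windows; defaults are illustrative.
MINIMUM_STABLE_PERIOD=${MINIMUM_STABLE_PERIOD:-5m}
STABILITY_TIMEOUT=${STABILITY_TIMEOUT:-30m}
oc adm wait-for-stable-cluster \
  --minimum-stable-period="${MINIMUM_STABLE_PERIOD}" \
  --timeout="${STABILITY_TIMEOUT}"
```

An upgrade-heavy job could then export a longer timeout without touching the script.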
