[PERFSCALE-3428] Cluster health-check #727
base: master
Conversation
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: rsevilla87
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Force-pushed from 38ceaed to e661250 (Compare)
source ./env.sh
source ../../utils/common.sh
Let's consolidate into "echo" rather than "log"
Signed-off-by: Raul Sevilla <[email protected]>
Force-pushed from e661250 to fe9ae32 (Compare)
@@ -1,6 +1,7 @@
#!/bin/bash -e

set -e
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
Just curious whether we need this here, since we already have the cluster health check enabled by default in the ocp-wrapper driver code.
Yeah, it's probably redundant, let me think of some use cases where the cluster health check is not performed...
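If some invocations can rely on the driver's built-in health check, one option is to gate the script's own check behind an opt-out variable. A minimal sketch, assuming a hypothetical `SKIP_CLUSTER_HEALTH` environment variable (the variable name is illustrative, not part of this PR):

```shell
#!/bin/bash
# Sketch only: SKIP_CLUSTER_HEALTH is a hypothetical opt-out flag, not from the PR.
health_check() {
  if [[ "${SKIP_CLUSTER_HEALTH:-false}" == "true" ]]; then
    echo "skipping cluster health check"
    return 0
  fi
  echo "running cluster health check"
  # In a real script this branch would execute:
  #   oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
}
health_check
```

Callers that already go through the ocp-wrapper driver could then export `SKIP_CLUSTER_HEALTH=true` to avoid checking twice.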
@@ -1,6 +1,7 @@
#!/bin/bash -e

set -e
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
5m might be too short. For example, if we want to run kube-burner after an upgrade, I am seeing some of the pods (kubeapi) take more than 5 minutes just to roll out.
The control plane pods are supposed to be rolled out once the upgrade is finished. Another potential issue is the MCP rollout, which runs asynchronously with the `oc adm upgrade` command.
Also, I think we're not running any workload after an upgrade completes.
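If post-upgrade runs ever become a target, the asynchronous MCP rollout could be covered by chaining a `MachineConfigPool` wait after the ClusterOperator check. A dry-run sketch (the `run` helper just echoes instead of executing, and the timeouts shown are illustrative, not from this PR):

```shell
#!/bin/bash
# Sketch only: `run` is a dry-run stub that prints the command instead of
# executing it; drop the helper to run the commands for real.
run() { echo "+ $*"; }

# Wait for ClusterOperators to settle first...
run oc adm wait-for-stable-cluster --minimum-stable-period=5m --timeout=30m
# ...then for the MachineConfigPools, whose rollout is asynchronous
# with `oc adm upgrade`. Uses the generic kubectl-style `oc wait` verb.
run oc wait mcp --all --for=condition=Updated --timeout=30m
```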
I have tried setting `--minimum-stable-period` to 1m and saw kube-burner errors because the cluster was still stabilizing, but keeping it at 5m has given me better results; just my current experience...
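One way to reconcile the two observations (1m flakes, 5m may still be short after an upgrade) is to make the values tunable with a 5m default. A sketch that only builds the command string, assuming hypothetical `MIN_STABLE_PERIOD` and `HEALTH_TIMEOUT` variables (both names and the 30m default are illustrative, not from this PR):

```shell
#!/bin/bash
# Sketch only: MIN_STABLE_PERIOD and HEALTH_TIMEOUT are illustrative names.
build_wait_cmd() {
  echo "oc adm wait-for-stable-cluster" \
       "--minimum-stable-period=${MIN_STABLE_PERIOD:-5m}" \
       "--timeout=${HEALTH_TIMEOUT:-30m}"
}
# A caller would execute the result; here we just print it for inspection.
build_wait_cmd
```

CI environments that hit slow rollouts could then override `HEALTH_TIMEOUT` without editing the script.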
Type of change
Description
Performing a cluster operator health check before running the actual workload.
I considered adding this logic as a step in Prow, but I ended up adding it here for several reasons like:
Please leave your thoughts here.
Related Tickets & Documents