[PERFSCALE-3428] Cluster health-check #727
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
#!/bin/bash -e | ||
|
||
set -e | ||
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m | ||
5m might be too short. For example, if we want to run kube-burner after an upgrade, I am seeing some of the pods (kubeapi) take > 5 min just to roll out.
The control plane pods are supposed to be rolled out once the upgrade is finished. Another potential issue is the mcp rollout, which runs asynchronously of the
Also, I think we're not running any workload after an upgrade completion.
I have tried to set
||
source ./egressip.sh | ||
|
||
ES_SERVER=${ES_SERVER=https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com} | ||
|
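The timeout concern raised in the review thread on this file could be addressed by making both durations overridable from the environment. This is only a minimal sketch: `wait_for_stable_cluster`, `STABLE_PERIOD` and `STABLE_TIMEOUT` are hypothetical names that do not appear in this PR, and the defaults simply mirror the values hard-coded in the diff.

```shell
# Hypothetical helper: STABLE_PERIOD and STABLE_TIMEOUT are assumed names, not part of this PR.
wait_for_stable_cluster() {
  # Defaults mirror the values hard-coded in the diff (1m stable period, 5m timeout).
  local period="${STABLE_PERIOD:-1m}"
  local timeout="${STABLE_TIMEOUT:-5m}"
  oc adm wait-for-stable-cluster \
    --minimum-stable-period="${period}" \
    --timeout="${timeout}"
}
```

With something like this in place, a post-upgrade run could export `STABLE_TIMEOUT=20m` before invoking the script, while other runs keep the 5m default.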
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,8 +1,8 @@ | ||
#!/usr/bin/env bash | ||
set -e | ||
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m | ||
|
||
source ./env.sh | ||
source ../../utils/common.sh | ||
Let's consolidate into "echo" rather than "log".
||
|
||
# Download k8s-netperf function | ||
download_netperf() { | ||
|
@@ -22,10 +22,10 @@ else | |
download_netperf "Downloading k8s-netperf." | ||
fi | ||
|
||
-log "###############################################" | ||
-log "Workload: ${WORKLOAD}" | ||
-log "UUID: ${UUID}" | ||
-log "###############################################" | ||
+echo "###############################################" | ||
+echo "Workload: ${WORKLOAD}" | ||
+echo "UUID: ${UUID}" | ||
+echo "###############################################" | ||
|
||
# Capture exit code of k8s-netperf | ||
set +e | ||
|
@@ -69,7 +69,7 @@ oc get pods -n netperf -o wide | |
oc get nodes -o wide | ||
oc get machineset -A || true | ||
|
||
-log "Finished workload ${0} ${WORKLOAD}, exit code ($run)" | ||
+echo "Finished workload ${0} ${WORKLOAD}, exit code ($run)" | ||
|
||
cat *.csv | ||
if [ $run -eq 0 ]; then | ||
|
Just curious whether we need this here, given that the cluster health check is already enabled by default in the ocp-wrapper driver code.
Yeah, it's probably redundant. Let me think of some use cases where cluster-health is not performed...
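One way to handle the redundancy discussed here would be an opt-out guard, so that the script-level check only runs when nothing upstream has already performed it. This is a rough sketch under assumptions: `maybe_wait_for_stable_cluster` and `SKIP_CLUSTER_HEALTH_CHECK` are hypothetical names, not existing options of these scripts or of the ocp-wrapper.

```shell
# Hypothetical guard: SKIP_CLUSTER_HEALTH_CHECK is an assumed name, not an existing option.
maybe_wait_for_stable_cluster() {
  if [ "${SKIP_CLUSTER_HEALTH_CHECK:-false}" = "true" ]; then
    # A caller that has already verified cluster health can opt out of the second check.
    echo "Skipping cluster health check"
    return 0
  fi
  # Same invocation as in the diff.
  oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
}
```

The check stays on by default, so standalone runs of the workload scripts would still be covered.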