[PERFSCALE-3428] Cluster health-check #727

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions workloads/ingress-perf/run.sh
@@ -1,6 +1,7 @@
#!/bin/bash -e

set -e
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m

UUID=${UUID:-$(uuidgen)}
ES_INDEX=${ES_INDEX:-ingress-performance}
1 change: 1 addition & 0 deletions workloads/kube-burner-ocp-wrapper/run.sh
@@ -1,6 +1,7 @@
#!/bin/bash -e

set -e
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m
Collaborator

Just curious whether we need this here, given that the cluster health check is already enabled by default in the ocp-wrapper driver code.

Member Author

Yeah, it's probably redundant. Let me think of some use cases where the cluster health check is not performed...

Member

5m might be too short. For example, if we want to run kube-burner after an upgrade, I am seeing some of the pods (kubeapi) take more than 5 minutes just to roll out.

Member Author

The control plane pods are supposed to be rolled out by the time the upgrade finishes. Another potential issue is the MCP rollout, which runs asynchronously from the oc adm upgrade command.

Also, I don't think we run any workload right after an upgrade completes.
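
As an illustration of the MCP concern above, here is a minimal sketch of an explicit wait for the MachineConfigPool rollout. It is not part of this PR. The helper name wait_for_mcp_rollout, the MCP_TIMEOUT and DRY_RUN variables, and the 30m default are all hypothetical, and the sketch assumes the MCP rollout is not already covered by another health check.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): block until every
# MachineConfigPool reports the Updated condition.
# MCP_TIMEOUT and DRY_RUN are assumed knobs; 30m is an arbitrary default.
wait_for_mcp_rollout() {
  local timeout="${MCP_TIMEOUT:-30m}"
  local cmd=(oc wait machineconfigpool --all
             --for=condition=Updated=True "--timeout=${timeout}")

  # With DRY_RUN=true, print the command instead of running it.
  if [[ "${DRY_RUN:-false}" == "true" ]]; then
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}
```

A run.sh script could call wait_for_mcp_rollout before starting a workload. Whether that is worth the extra wait depends on whether workloads are ever launched right after an upgrade.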

Member

I tried setting --minimum-stable-period to 1m and saw kube-burner errors because the cluster was still stabilizing. Keeping it at 5m has given me better results. That's just my experience so far...
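
One way to reconcile the values discussed in this thread is to make both durations overridable per run. The sketch below is an assumption, not what the PR implements: the function name wait_for_stable_cluster and the STABLE_PERIOD, STABLE_TIMEOUT and DRY_RUN variables are hypothetical. The 5m stable-period default follows the experience reported above, and the 20m timeout is an arbitrary placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of this PR) around the health check
# added in the diff. STABLE_PERIOD, STABLE_TIMEOUT and DRY_RUN are
# assumed environment variables; the 20m timeout is a placeholder.
wait_for_stable_cluster() {
  local period="${STABLE_PERIOD:-5m}"
  local timeout="${STABLE_TIMEOUT:-20m}"
  local cmd=(oc adm wait-for-stable-cluster
             "--minimum-stable-period=${period}" "--timeout=${timeout}")

  # With DRY_RUN=true, print the command instead of running it.
  if [[ "${DRY_RUN:-false}" == "true" ]]; then
    echo "${cmd[*]}"
    return 0
  fi
  "${cmd[@]}"
}
```

For example, STABLE_PERIOD=1m STABLE_TIMEOUT=5m wait_for_stable_cluster would reproduce the values currently in the diff, while the defaults give a longer stabilization window.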

source ./egressip.sh

ES_SERVER=${ES_SERVER=https://search-perfscale-dev-chmf5l4sh66lvxbnadi4bznl3a.us-west-2.es.amazonaws.com}
12 changes: 6 additions & 6 deletions workloads/network-perf-v2/run.sh
@@ -1,8 +1,8 @@
#!/usr/bin/env bash
set -e
oc adm wait-for-stable-cluster --minimum-stable-period=1m --timeout=5m

source ./env.sh
source ../../utils/common.sh
Member Author

Let's consolidate on "echo" rather than "log".


# Download k8s-netperf function
download_netperf() {
@@ -22,10 +22,10 @@ else
download_netperf "Downloading k8s-netperf."
fi

-log "###############################################"
-log "Workload: ${WORKLOAD}"
-log "UUID: ${UUID}"
-log "###############################################"
+echo "###############################################"
+echo "Workload: ${WORKLOAD}"
+echo "UUID: ${UUID}"
+echo "###############################################"

# Capture exit code of k8s-netperf
set +e
@@ -69,7 +69,7 @@ oc get pods -n netperf -o wide
oc get nodes -o wide
oc get machineset -A || true

-log "Finished workload ${0} ${WORKLOAD}, exit code ($run)"
+echo "Finished workload ${0} ${WORKLOAD}, exit code ($run)"

cat *.csv
if [ $run -eq 0 ]; then