router-perf reschedule monitoring not fully completed after test exit #552

Open
qiliRedHat (Collaborator) opened this issue Mar 31, 2023 · 0 comments
Labels: bug (Something isn't working)
When running the router-perf-v2 test, the test reschedules the monitoring stack to infra nodes before it finishes:

reschedule_monitoring_stack(){
  # No-op if the cluster-monitoring-config ConfigMap does not exist
  [[ $(oc get cm -n openshift-monitoring cluster-monitoring-config --ignore-not-found --no-headers | wc -l) == 0 ]] && return 0
  log "Re-scheduling monitoring stack to ${1} nodes"
  # Rewrite every node-role selector key in the ConfigMap to the target role
  oc get cm -n openshift-monitoring cluster-monitoring-config -o yaml | sed "s#kubernetes.io/\w*#kubernetes.io/${1}#g" | oc apply -f -
  # cluster-monitoring-operator can take some time to reconcile the changes
  sleep 1m
  oc rollout status -n openshift-monitoring deploy/cluster-monitoring-operator
  oc rollout status -n openshift-monitoring sts/prometheus-k8s
}
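For reference, the `sed` expression above rewrites every `kubernetes.io/<role>` selector key in the ConfigMap body. A minimal illustration of that substitution on a sample `nodeSelector` line (sample input assumed; requires GNU sed, where `\w` matches `[[:alnum:]_]`):

```shell
# Illustration only: the same substitution the function applies, run against
# a made-up nodeSelector line rather than the real ConfigMap.
echo 'nodeSelector: {node-role.kubernetes.io/worker: ""}' \
  | sed "s#kubernetes.io/\w*#kubernetes.io/infra#g"
# -> nodeSelector: {node-role.kubernetes.io/infra: ""}
```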

Mar 16 08:04:12 UTC 2023 Re-scheduling monitoring stack to infra nodes
configmap/cluster-monitoring-config configured
deployment "cluster-monitoring-operator" successfully rolled out
statefulset rolling update complete 2 pods at revision prometheus-k8s-689d496649...

After the router-perf-v2 test script ingress-performance.sh exits, running Cerberus shows there are still some failed containers:

2023-03-16 08:06:34,743 [INFO] Iteration 1: Failed pods and components
2023-03-16 08:06:34,743 [INFO] openshift-monitoring: ['alertmanager-main-0', 'prometheus-adapter-65f8bdf5d5-t9tnv', 'prometheus-k8s-1', 'prometheus-adapter-65f8bdf5d5-6pb5r']
2023-03-16 08:06:34,743 [INFO] Failed containers in alertmanager-main-0: ['alertmanager', 'alertmanager-proxy', 'config-reloader', 'kube-rbac-proxy', 'kube-rbac-proxy-metric', 'prom-label-proxy']
2023-03-16 08:06:34,743 [INFO] Failed containers in prometheus-adapter-65f8bdf5d5-6pb5r: ['prometheus-adapter']
2023-03-16 08:06:34,743 [INFO] Failed containers in prometheus-adapter-65f8bdf5d5-t9tnv: ['prometheus-adapter']
2023-03-16 08:06:34,743 [INFO] Failed containers in prometheus-k8s-1: ['config-reloader', 'kube-rbac-proxy', 'kube-rbac-proxy-thanos', 'prometheus', 'prometheus-proxy', 'thanos-sidecar', 'init-config-reloader']

[container "alertmanager" in pod "alertmanager-main-0" is waiting to start: ContainerCreating, previous terminated container "alertmanager" in pod "alertmanager-main-0" not found, previous terminated container "config-reloader" in pod "alertmanager-main-0" not found, container "config-reloader" in pod "alertmanager-main-0" is waiting to start: ContainerCreating, container "alertmanager-proxy" in pod "alertmanager-main-0" is waiting to start: ContainerCreating, previous terminated container "alertmanager-proxy" in pod "alertmanager-main-0" not found, container "kube-rbac-proxy" in pod "alertmanager-main-0" is waiting to start: ContainerCreating, previous terminated container "kube-rbac-proxy" in pod "alertmanager-main-0" not found, container "kube-rbac-proxy-metric" in pod "alertmanager-main-0" is waiting to start: ContainerCreating, previous terminated container "kube-rbac-proxy-metric" in pod "alertmanager-main-0" not found, previous terminated container "prom-label-proxy" in pod "alertmanager-main-0" not found, container "prom-label-proxy" in pod "alertmanager-main-0" is waiting to start: ContainerCreating], one or more errors occurred while gathering container data for pod prometheus-adapter-78d8b6cd95-7rdwf:
pods "prometheus-adapter-78d8b6cd95-7rdwf" not found]

Those containers need some time to come up. I would like to add a check to ensure they are up and running before ingress-performance.sh exits.
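One possible shape for such a check (a sketch only, not code from this repo): replace the fixed `sleep 1m` with a generic poll-until-ready loop, which could then wrap something like `oc wait --for=condition=Ready`. The `retry_until` helper below is hypothetical:

```shell
# Hypothetical helper: run a command every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have elapsed; returns 1 on timeout.
retry_until(){
  local timeout=$1 interval=$2
  shift 2
  local elapsed=0
  until "$@"; do
    (( elapsed >= timeout )) && return 1
    sleep "$interval"
    elapsed=$(( elapsed + interval ))
  done
}

# In reschedule_monitoring_stack, the fixed "sleep 1m" could then become e.g.:
#   retry_until 300 10 oc wait --for=condition=Ready pods --all \
#     -n openshift-monitoring --timeout=0
```

This keeps the function from exiting while the rescheduled monitoring pods are still in ContainerCreating, while also bounding how long the script can block.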
