Commit b2e4117

Add alert for when ClusterImport CR is in error state (#5)

* Add alert for when ClusterImport CR is in error state
* 5 mins
* 10 mins
1 parent 6cb2d14 commit b2e4117

1 file changed: +17 -2 lines changed

charts/controlplane-operations/alerts/controlplane-remote.yaml

Lines changed: 17 additions & 2 deletions
@@ -6,12 +6,27 @@ groups:
       expr: >
         kube_customresource_status_state{customresource_kind="Update",customresource_group="argora.cloud.sap",state=~"Error"}
         == 1
-      for: {{ dig "ArgoraUpdateInError" "for" "1m" .Values.prometheusRules }}
+      for: {{ dig "ArgoraUpdateInError" "for" "10m" .Values.prometheusRules }}
       labels:
         severity: {{ dig "ArgoraUpdateInError" "severity" "warning" .Values.prometheusRules }}
         playbook: https://github.com/cobaltcore-dev/controlplane-operations/playbooks/ArgoraUpdateInError.md
         {{ include "controlplane-operations.additionalRuleLabels" . }}
       annotations:
-        description: "Argora Update CR status is in Error state for more than 1 minute."
+        description: "Argora Update CR status is in Error state for more than 10 minutes."
         summary: "Update CR in Error state."
     {{- end }}
+
+    {{- if not (.Values.prometheusRules.disabled.ArgoraClusterImportInError | default false) }}
+    - alert: ArgoraClusterImportInError
+      expr: >
+        kube_customresource_status_state{customresource_kind="ClusterImport",customresource_group="argora.cloud.sap",state=~"Error"}
+        == 1
+      for: {{ dig "ArgoraClusterImportInError" "for" "10m" .Values.prometheusRules }}
+      labels:
+        severity: {{ dig "ArgoraClusterImportInError" "severity" "warning" .Values.prometheusRules }}
+        playbook: https://github.com/cobaltcore-dev/controlplane-operations/playbooks/ArgoraClusterImportInError.md
+        {{ include "controlplane-operations.additionalRuleLabels" . }}
+      annotations:
+        description: "Argora ClusterImport CR status is in Error state for more than 10 minutes."
+        summary: "ClusterImport CR in Error state."
+    {{- end }}
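
The new rule reads its `for` and `severity` settings through `dig` lookups on `.Values.prometheusRules` and is wrapped in a `disabled` guard, so it should be tunable from chart values. The fragment below is a minimal sketch of such an override, assuming the values layout implied by the template expressions above. The specific values (`15m`, `critical`) are illustrative and not taken from the chart. Note that the `description` annotation is hard-coded to "10 minutes" and would not follow a changed `for` value.

```yaml
# values.yaml override (sketch; keys are inferred from the template's
# dig/.Values lookups, the values themselves are hypothetical)
prometheusRules:
  ArgoraClusterImportInError:
    for: 15m            # template falls back to "10m" when this key is absent
    severity: critical  # template falls back to "warning" when this key is absent
  disabled:
    # true would skip rendering the ArgoraClusterImportInError rule entirely
    ArgoraClusterImportInError: false
```

Because `dig` returns the supplied default when a key is missing, omitting the `ArgoraClusterImportInError` block altogether should render the rule with `for: 10m` and `severity: warning`.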
