feat(helm)!: Update chart kube-prometheus-stack to 67.11.0 - autoclosed #1144

Closed
wants to merge 1 commit

renovate[bot] commented Dec 16, 2024

This PR contains the following updates:

Package: kube-prometheus-stack (source)
Update: major
Change: 62.7.0 -> 67.11.0

Warning: Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

prometheus-community/helm-charts (kube-prometheus-stack)

v67.11.0

kube-prometheus-stack collects Kubernetes manifests, Grafana dashboards, and Prometheus rules, combined with documentation and scripts, to provide easy-to-operate, end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.

What's Changed

Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-67.10.0...kube-prometheus-stack-67.11.0

v67.10.0

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@prometheus-snmp-exporter-6.0.0...kube-prometheus-stack-67.10.0

v67.9.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-stackdriver-exporter-4.7.1...kube-prometheus-stack-67.9.0

v67.8.0

v67.7.0

v67.6.0

v67.5.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prom-label-proxy-0.10.1...kube-prometheus-stack-67.5.0

v67.4.0

v67.3.1

v67.3.0

v67.2.1

v67.2.0

v67.1.0

v67.0.0

v66.7.1

v66.7.0

v66.6.0

v66.5.0

v66.4.0

v66.3.1

v66.3.0

v66.2.2

v66.2.1

What's Changed

Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-66.2.0...kube-prometheus-stack-66.2.1

v66.2.0

What's Changed

Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-66.1.1...kube-prometheus-stack-66.2.0

v66.1.1

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@prometheus-25.30.0...kube-prometheus-stack-66.1.1

v66.1.0

What's Changed

Full Changelog: prometheus-community/helm-charts@kube-state-metrics-5.27.0...kube-prometheus-stack-66.1.0

v66.0.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-operator-crds-16.0.0...kube-prometheus-stack-66.0.0

v65.8.1

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-65.8.0...kube-prometheus-stack-65.8.1

v65.8.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-mongodb-exporter-3.9.0...kube-prometheus-stack-65.8.0

v65.7.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-node-exporter-4.42.0...kube-prometheus-stack-65.7.0

v65.6.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-25.29.0...kube-prometheus-stack-65.6.0

v65.5.1

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@prometheus-ipmi-exporter-0.5.0...kube-prometheus-stack-65.5.1

v65.5.0

What's Changed

Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-65.4.0...kube-prometheus-stack-65.5.0

v65.4.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-node-exporter-4.40.0...kube-prometheus-stack-65.4.0

v65.3.2

What's Changed

Full Changelog: prometheus-community/helm-charts@alertmanager-snmp-notifier-0.4.0...kube-prometheus-stack-65.3.2

v65.3.1

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@prometheus-operator-admission-webhook-0.16.0...kube-prometheus-stack-65.3.1

v65.3.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-mongodb-exporter-3.7.2...kube-prometheus-stack-65.3.0

v65.2.0

What's Changed

Full Changelog: prometheus-community/helm-charts@kube-state-metrics-5.26.0...kube-prometheus-stack-65.2.0

v65.1.1

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@prometheus-stackdriver-exporter-4.6.1...kube-prometheus-stack-65.1.1

v65.1.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-sql-exporter-0.1.1...kube-prometheus-stack-65.1.0

v65.0.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-fastly-exporter-0.5.0...kube-prometheus-stack-65.0.0

v64.0.0

What's Changed

Full Changelog: prometheus-community/helm-charts@prometheus-snmp-exporter-5.5.1...kube-prometheus-stack-64.0.0

v63.1.0

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-63.0.0...kube-prometheus-stack-63.1.0

v63.0.0

What's Changed

New Contributors

Full Changelog: prometheus-community/helm-charts@prometheus-conntrack-stats-exporter-0.5.11...kube-prometheus-stack-63.0.0


Configuration

📅 Schedule: Branch creation - "after 9am,before 5pm" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


This PR was generated by Mend Renovate. View the repository job log.

github-actions bot commented Dec 16, 2024

--- HelmRelease: monitoring/kube-prometheus-stack DaemonSet: monitoring/kube-prometheus-stack-prometheus-node-exporter
+++ HelmRelease: monitoring/kube-prometheus-stack DaemonSet: monitoring/kube-prometheus-stack-prometheus-node-exporter
@@ -95,12 +95,25 @@
           mountPath: /host/root
           mountPropagation: HostToContainer
           readOnly: true
       hostNetwork: true
       hostPID: true
       hostIPC: false
+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+            - matchExpressions:
+              - key: eks.amazonaws.com/compute-type
+                operator: NotIn
+                values:
+                - fargate
+              - key: type
+                operator: NotIn
+                values:
+                - virtual-kubelet
       nodeSelector:
         kubernetes.io/os: linux
       tolerations:
       - effect: NoExecute
         key: CriticalAddonsOnly
         operator: Exists
--- HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-kube-state-metrics
+++ HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-kube-state-metrics
@@ -44,13 +44,13 @@
       - name: kube-state-metrics
         args:
         - --port=8080
         - --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
         - --metric-labels-allowlist=persistentvolumeclaims=[*]
         imagePullPolicy: IfNotPresent
-        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+        image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0
         ports:
         - containerPort: 8080
           name: http
         livenessProbe:
           failureThreshold: 3
           httpGet:
@@ -64,13 +64,13 @@
           timeoutSeconds: 5
         readinessProbe:
           failureThreshold: 3
           httpGet:
             httpHeaders: null
             path: /readyz
-            port: 8080
+            port: 8081
             scheme: HTTP
           initialDelaySeconds: 5
           periodSeconds: 10
           successThreshold: 1
           timeoutSeconds: 5
         resources: {}
--- HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-operator
+++ HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-operator
@@ -31,23 +31,25 @@
         app: kube-prometheus-stack-operator
         app.kubernetes.io/name: kube-prometheus-stack-prometheus-operator
         app.kubernetes.io/component: prometheus-operator
     spec:
       containers:
       - name: kube-prometheus-stack
-        image: quay.io/prometheus-operator/prometheus-operator:v0.76.1
+        image: quay.io/prometheus-operator/prometheus-operator:v0.79.2
         imagePullPolicy: IfNotPresent
         args:
         - --kubelet-service=kube-system/kube-prometheus-stack-kubelet
+        - --kubelet-endpoints=true
+        - --kubelet-endpointslice=false
         - --localhost=127.0.0.1
-        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.76.1
+        - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.79.2
         - --config-reloader-cpu-request=0
         - --config-reloader-cpu-limit=0
         - --config-reloader-memory-request=0
         - --config-reloader-memory-limit=0
-        - --thanos-default-base-image=quay.io/thanos/thanos:v0.36.1
+        - --thanos-default-base-image=quay.io/thanos/thanos:v0.37.2
         - --secret-field-selector=type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1
         - --web.enable-tls=true
         - --web.cert-file=/cert/cert
         - --web.key-file=/cert/key
         - --web.listen-address=:10250
         - --web.tls-min-version=VersionTLS13
@@ -99,7 +101,8 @@
         runAsNonRoot: true
         runAsUser: 65534
         seccompProfile:
           type: RuntimeDefault
       serviceAccountName: kube-prometheus-stack-operator
       automountServiceAccountToken: true
+      terminationGracePeriodSeconds: 30
 
--- HelmRelease: monitoring/kube-prometheus-stack Prometheus: monitoring/kube-prometheus-stack
+++ HelmRelease: monitoring/kube-prometheus-stack Prometheus: monitoring/kube-prometheus-stack
@@ -17,14 +17,14 @@
     alertmanagers:
     - namespace: monitoring
       name: kube-prometheus-stack-alertmanager
       port: http-web
       pathPrefix: /
       apiVersion: v2
-  image: quay.io/prometheus/prometheus:v2.54.1
-  version: v2.54.1
+  image: quay.io/prometheus/prometheus:v3.1.0
+  version: v3.1.0
   externalUrl: http://prometheus.deangalvin.dev/
   paused: false
   replicas: 2
   shards: 1
   logLevel: info
   logFormat: logfmt
@@ -66,9 +66,25 @@
     volumeClaimTemplate:
       spec:
         resources:
           requests:
             storage: 50Gi
         storageClassName: ceph-rbd
+  affinity:
+    podAntiAffinity:
+      preferredDuringSchedulingIgnoredDuringExecution:
+      - weight: 100
+        podAffinityTerm:
+          topologyKey: kubernetes.io/hostname
+          labelSelector:
+            matchExpressions:
+            - key: app.kubernetes.io/name
+              operator: In
+              values:
+              - prometheus
+            - key: prometheus
+              operator: In
+              values:
+              - kube-prometheus-stack
   portName: http-web
   hostNetwork: false
 
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-etcd
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-etcd
@@ -19,29 +19,29 @@
       annotations:
         description: 'etcd cluster "{{ $labels.job }}": members are down ({{ $value
           }}).'
         summary: etcd cluster members are down.
       expr: |-
         max without (endpoint) (
-          sum without (instance) (up{job=~".*etcd.*"} == bool 0)
+          sum without (instance, pod) (up{job=~".*etcd.*"} == bool 0)
         or
           count without (To) (
-            sum without (instance) (rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[120s])) > 0.01
+            sum without (instance, pod) (rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[120s])) > 0.01
           )
         )
         > 0
       for: 10m
       labels:
         severity: critical
     - alert: etcdInsufficientMembers
       annotations:
         description: 'etcd cluster "{{ $labels.job }}": insufficient members ({{ $value
           }}).'
         summary: etcd cluster has insufficient number of members.
-      expr: sum(up{job=~".*etcd.*"} == bool 1) without (instance) < ((count(up{job=~".*etcd.*"})
-        without (instance) + 1) / 2)
+      expr: sum(up{job=~".*etcd.*"} == bool 1) without (instance, pod) < ((count(up{job=~".*etcd.*"})
+        without (instance, pod) + 1) / 2)
       for: 3m
       labels:
         severity: critical
     - alert: etcdNoLeader
       annotations:
         description: 'etcd cluster "{{ $labels.job }}": member {{ $labels.instance
@@ -55,13 +55,13 @@
       annotations:
         description: 'etcd cluster "{{ $labels.job }}": {{ $value }} leader changes
           within the last 15 minutes. Frequent elections may be a sign of insufficient
           resources, high network latency, or disruptions by other components and
           should be investigated.'
         summary: etcd cluster has high number of leader changes.
-      expr: increase((max without (instance) (etcd_server_leader_changes_seen_total{job=~".*etcd.*"})
+      expr: increase((max without (instance, pod) (etcd_server_leader_changes_seen_total{job=~".*etcd.*"})
         or 0*absent(etcd_server_leader_changes_seen_total{job=~".*etcd.*"}))[15m:1m])
         >= 4
       for: 5m
       labels:
         severity: warning
     - alert: etcdHighNumberOfFailedGRPCRequests
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-availability.rules
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-availability.rules
@@ -24,44 +24,43 @@
         verb: read
       record: code:apiserver_request_total:increase30d
     - expr: sum by (cluster, code) (code_verb:apiserver_request_total:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
       labels:
         verb: write
       record: code:apiserver_request_total:increase30d
-    - expr: sum by (cluster, verb, scope) (increase(apiserver_request_sli_duration_seconds_count{job="apiserver"}[1h]))
-      record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase1h
-    - expr: sum by (cluster, verb, scope) (avg_over_time(cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase1h[30d])
-        * 24 * 30)
-      record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d
     - expr: sum by (cluster, verb, scope, le) (increase(apiserver_request_sli_duration_seconds_bucket[1h]))
       record: cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase1h
     - expr: sum by (cluster, verb, scope, le) (avg_over_time(cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase1h[30d])
         * 24 * 30)
       record: cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d
+    - expr: sum by (cluster, verb, scope) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase1h{le="+Inf"})
+      record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase1h
+    - expr: sum by (cluster, verb, scope) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{le="+Inf"})
+      record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d
     - expr: |-
         1 - (
           (
             # write too slow
             sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
             -
-            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"})
+            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le=~"1(\\.0)?"})
           ) +
           (
             # read too slow
             sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"LIST|GET"})
             -
             (
               (
-                sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"})
+                sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le=~"1(\\.0)?"})
                 or
                 vector(0)
               )
               +
-              sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"})
+              sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le=~"5(\\.0)?"})
               +
-              sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"})
+              sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
             )
           ) +
           # errors
           sum by (cluster) (code:apiserver_request_total:increase30d{code=~"5.."} or vector(0))
         )
         /
@@ -73,20 +72,20 @@
         1 - (
           sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"LIST|GET"})
           -
           (
             # too slow
             (
-              sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"})
+              sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le=~"1(\\.0)?"})
               or
               vector(0)
             )
             +
-            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"})
+            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le=~"5(\\.0)?"})
             +
-            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"})
+            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
           )
           +
           # errors
           sum by (cluster) (code:apiserver_request_total:increase30d{verb="read",code=~"5.."} or vector(0))
         )
         /
@@ -97,13 +96,13 @@
     - expr: |-
         1 - (
           (
             # too slow
             sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
             -
-            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"})
+            sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le=~"1(\\.0)?"})
           )
           +
           # errors
           sum by (cluster) (code:apiserver_request_total:increase30d{verb="write",code=~"5.."} or vector(0))
         )
         /
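The rewritten availability rules above replace exact bucket matches such as `le="1"` with regex matchers such as `le=~"1(\\.0)?"`. A likely reason, given that this upgrade moves Prometheus from v2.54.1 to v3.1.0, is that Prometheus 3 normalizes classic-histogram `le` values to a float-like form on ingestion, so a bucket exposed as `le="1"` may be stored as `le="1.0"`. The Python sketch below is only an approximation: it uses `re.fullmatch` to mimic PromQL's fully anchored `=~` matcher and shows which label values each new matcher accepts. The names `NEW_LE_MATCHERS` and `le_matches` are hypothetical helpers for illustration, not part of the chart.

```python
import re

# Hypothetical table of the new matchers, written as the regexes PromQL sees
# after unescaping the rule text: the literal "1(\\.0)?" becomes 1(\.0)?.
NEW_LE_MATCHERS = {
    "1": r"1(\.0)?",
    "5": r"5(\.0)?",
    "30": r"30(\.0)?",
}


def le_matches(pattern: str, value: str) -> bool:
    """Approximate PromQL's =~ matcher, which is anchored at both ends."""
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    for bound, pattern in NEW_LE_MATCHERS.items():
        # Integer form and float form should match; the two neighbouring
        # bucket values (e.g. "10" and "1.5" for bound "1") should not.
        for value in (bound, bound + ".0", bound + "0", bound + ".5"):
            print(f'le=~"{pattern}" on le="{value}": {le_matches(pattern, value)}')
```

Because the match is anchored, `1(\.0)?` accepts both `le="1"` and `le="1.0"` without also picking up the `le="10"` bucket, which an unanchored search such as `re.search` would.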
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-burnrate.rules
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-burnrate.rules
@@ -20,20 +20,20 @@
           (
             # too slow
             sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[1d]))
             -
             (
               (
-                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[1d]))
-                or
-                vector(0)
-              )
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[1d]))
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[1d]))
+                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[1d]))
+                or
+                vector(0)
+              )
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[1d]))
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[1d]))
             )
           )
           +
           # errors
           sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]))
         )
@@ -47,20 +47,20 @@
           (
             # too slow
             sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[1h]))
             -
             (
               (
-                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[1h]))
-                or
-                vector(0)
-              )
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[1h]))
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[1h]))
+                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[1h]))
+                or
+                vector(0)
+              )
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[1h]))
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[1h]))
             )
           )
           +
           # errors
           sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1h]))
         )
@@ -74,20 +74,20 @@
           (
             # too slow
             sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[2h]))
             -
             (
               (
-                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[2h]))
-                or
-                vector(0)
-              )
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[2h]))
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[2h]))
+                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[2h]))
+                or
+                vector(0)
+              )
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[2h]))
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[2h]))
             )
           )
           +
           # errors
           sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[2h]))
         )
@@ -101,20 +101,20 @@

           (
             # too slow
             sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[30m]))
             -
             (
               (
-                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[30m]))
-                or
-                vector(0)
-              )
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[30m]))
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[30m]))
+                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[30m]))
+                or
+                vector(0)
+              )
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[30m]))
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[30m]))
             )
           )
           +
           # errors
           sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[30m]))
         )
@@ -128,20 +128,20 @@

           (
             # too slow
             sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[3d]))
             -
             (
               (
-                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[3d]))
-                or
-                vector(0)
-              )
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[3d]))
-              +
-              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[3d]))
+                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[3d]))
+                or
+                vector(0)
+              )
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[3d]))
+              +
+              sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[3d]))
             )
           )
           +
           # errors
           sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[3d]))
         )
@@ -155,20 +155,20 @@

           (
             # too slow
             sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[5m]))
             -
             (
               (
-                sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[5m]))
-                or
-                vector(0)
[Diff truncated by flux-local]
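The recording-rule hunks above replace exact bucket matchers such as `le="1"`, `le="5"` and `le="30"` with regexes such as `le=~"1(\\.0)?"`. This appears intended to cope with Prometheus 3.x, which normalizes classic-histogram `le` label values to float notation, so a bucket scraped as `1` may be stored as `1.0`. The sketch below compares the old and new matchers in Python. It assumes `re.fullmatch` is a fair stand-in for PromQL's fully anchored RE2 matching, which holds for patterns this simple.

```python
import re

# PromQL regex label matchers are fully anchored (RE2 semantics), so
# re.fullmatch is used to mimic that behaviour. The rule source writes
# "1(\\.0)?"; after string-literal unescaping the regex engine sees 1(\.0)?.
LE_PATTERNS = {
    "resource": r"1(\.0)?",   # scope=~"resource|" -> 1s bucket
    "namespace": r"5(\.0)?",  # scope="namespace"  -> 5s bucket
    "cluster": r"30(\.0)?",   # scope="cluster"    -> 30s bucket
}


def old_matches(scope: str, le_value: str) -> bool:
    """Old exact matcher, e.g. le="1": only the integer spelling matches."""
    return le_value == {"resource": "1", "namespace": "5", "cluster": "30"}[scope]


def new_matches(scope: str, le_value: str) -> bool:
    """New regex matcher, e.g. le=~"1(\\.0)?": accepts "1" and "1.0"."""
    return re.fullmatch(LE_PATTERNS[scope], le_value) is not None


if __name__ == "__main__":
    for le in ["1", "1.0", "10", "1.5"]:
        print(le, old_matches("resource", le), new_matches("resource", le))
```

With the old matcher, a normalized `1.0` bucket would silently drop out of the "fast enough" sum and inflate the "too slow" count; the regex keeps both spellings while still rejecting neighbours like `10`.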
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-slos

+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-slos

@@ -18,57 +18,57 @@

     - alert: KubeAPIErrorBudgetBurn
       annotations:
         description: The API server is burning too much error budget.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
         summary: The API server is burning too much error budget.
       expr: |-
-        sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
-        and
-        sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
+        sum by (cluster) (apiserver_request:burnrate1h) > (14.40 * 0.01000)
+        and on (cluster)
+        sum by (cluster) (apiserver_request:burnrate5m) > (14.40 * 0.01000)
       for: 2m
       labels:
         long: 1h
         severity: critical
         short: 5m
     - alert: KubeAPIErrorBudgetBurn
       annotations:
         description: The API server is burning too much error budget.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
         summary: The API server is burning too much error budget.
       expr: |-
-        sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)
-        and
-        sum(apiserver_request:burnrate30m) > (6.00 * 0.01000)
+        sum by (cluster) (apiserver_request:burnrate6h) > (6.00 * 0.01000)
+        and on (cluster)
+        sum by (cluster) (apiserver_request:burnrate30m) > (6.00 * 0.01000)
       for: 15m
       labels:
         long: 6h
         severity: critical
         short: 30m
     - alert: KubeAPIErrorBudgetBurn
       annotations:
         description: The API server is burning too much error budget.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
         summary: The API server is burning too much error budget.
       expr: |-
-        sum(apiserver_request:burnrate1d) > (3.00 * 0.01000)
-        and
-        sum(apiserver_request:burnrate2h) > (3.00 * 0.01000)
+        sum by (cluster) (apiserver_request:burnrate1d) > (3.00 * 0.01000)
+        and on (cluster)
+        sum by (cluster) (apiserver_request:burnrate2h) > (3.00 * 0.01000)
       for: 1h
       labels:
         long: 1d
         severity: warning
         short: 2h
     - alert: KubeAPIErrorBudgetBurn
       annotations:
         description: The API server is burning too much error budget.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
         summary: The API server is burning too much error budget.
       expr: |-
-        sum(apiserver_request:burnrate3d) > (1.00 * 0.01000)
-        and
-        sum(apiserver_request:burnrate6h) > (1.00 * 0.01000)
+        sum by (cluster) (apiserver_request:burnrate3d) > (1.00 * 0.01000)
+        and on (cluster)
+        sum by (cluster) (apiserver_request:burnrate6h) > (1.00 * 0.01000)
       for: 3h
       labels:
         long: 3d
         severity: warning
         short: 6h
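The `KubeAPIErrorBudgetBurn` rules change from `sum(...)`, which collapses every cluster into one series, to `sum by (cluster) (...)` joined with `and on (cluster)`. An alert now fires per cluster, and only when that cluster's long-window and short-window burn rates both exceed `factor * 0.01`. A minimal Python sketch of that multi-window check follows; the per-cluster input values are hypothetical stand-ins for the `apiserver_request:burnrate*` recording rules, and the `for:` durations are not modelled.

```python
# Per-cluster burn rates, e.g. {"prod": 0.2}; hypothetical stand-ins for
# apiserver_request:burnrate1h / apiserver_request:burnrate5m and friends.
BurnRates = dict[str, float]

ERROR_BUDGET = 0.01  # the 0.01000 factor in the rules (i.e. a 99% SLO)


def burning_clusters(long_window: BurnRates, short_window: BurnRates,
                     factor: float) -> set[str]:
    """Clusters whose long AND short window burn rates exceed factor * budget.

    `and on (cluster)` keeps a left-hand series only when a right-hand series
    with the same `cluster` label exists, which amounts to intersecting the
    clusters that pass each comparison.
    """
    threshold = factor * ERROR_BUDGET
    long_hot = {c for c, rate in long_window.items() if rate > threshold}
    short_hot = {c for c, rate in short_window.items() if rate > threshold}
    return long_hot & short_hot


if __name__ == "__main__":
    long_1h = {"prod": 0.20, "staging": 0.20, "dev": 0.01}
    short_5m = {"prod": 0.15, "staging": 0.05, "dev": 0.30}
    # Critical rule: both windows must exceed 14.40 * 0.01 = 0.144.
    print(burning_clusters(long_1h, short_5m, 14.40))
```

Here only `prod` would page: `staging` fails the short window and `dev` fails the long window. Under the old unlabelled `sum(...)`, the three clusters' rates would have been added together before comparing.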
 
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-apps

+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-apps

@@ -126,13 +126,13 @@

         description: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }}
           update has not been rolled out.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubestatefulsetupdatenotrolledout
         summary: StatefulSet update has not been rolled out.
       expr: |-
         (
-          max by (namespace, statefulset) (
+          max by (namespace, statefulset, job, cluster) (
             kube_statefulset_status_current_revision{job="kube-state-metrics", namespace=~".*"}
               unless
             kube_statefulset_status_update_revision{job="kube-state-metrics", namespace=~".*"}
           )
             *
           (
@@ -148,13 +148,13 @@

       for: 15m
       labels:
         severity: warning
     - alert: KubeDaemonSetRolloutStuck
       annotations:
         description: DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has
-          not finished or progressed for at least 15 minutes.
+          not finished or progressed for at least 15m.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedaemonsetrolloutstuck
         summary: DaemonSet rollout is stuck.
       expr: |-
         (
           (
             kube_daemonset_status_current_number_scheduled{job="kube-state-metrics", namespace=~".*"}
@@ -180,19 +180,19 @@

         )
       for: 15m
       labels:
         severity: warning
     - alert: KubeContainerWaiting
       annotations:
-        description: pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on
+        description: 'pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on
           container {{ $labels.container}} has been in waiting state for longer than
-          1 hour.
+          1 hour. (reason: "{{ $labels.reason }}").'
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubecontainerwaiting
         summary: Pod container waiting longer than 1 hour
-      expr: sum by (namespace, pod, container, cluster) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",
-        namespace=~".*"}) > 0
+      expr: kube_pod_container_status_waiting_reason{reason!="CrashLoopBackOff", job="kube-state-metrics",
+        namespace=~".*"} > 0
       for: 1h
       labels:
         severity: warning
     - alert: KubeDaemonSetNotScheduled
       annotations:
         description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset
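`KubeContainerWaiting` no longer sums the waiting series per container. It selects `kube_pod_container_status_waiting_reason` directly, drops `reason="CrashLoopBackOff"`, and keeps the `reason` label so the new description can print it. The filter is sketched below in Python on hypothetical in-memory samples; the `for: 1h` hold is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaitingSample:
    """One kube_pod_container_status_waiting_reason series (hypothetical values)."""
    namespace: str
    pod: str
    container: str
    reason: str
    value: float


def waiting_candidates(samples: list[WaitingSample]) -> list[WaitingSample]:
    """Mirror the reason!="CrashLoopBackOff" matcher plus the `> 0` comparison.

    Because nothing is aggregated away, each result still carries its
    `reason`, which the annotation interpolates as {{ $labels.reason }}.
    """
    return [s for s in samples if s.reason != "CrashLoopBackOff" and s.value > 0]


def describe(s: WaitingSample) -> str:
    # Approximate rendering of the new description template for one sample.
    return (f"pod/{s.pod} in namespace {s.namespace} on container {s.container} "
            f'has been in waiting state for longer than 1 hour. (reason: "{s.reason}").')
```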
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-resources

+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-resources

@@ -117,14 +117,14 @@

         description: '{{ $value | humanizePercentage }} throttling of CPU in namespace
           {{ $labels.namespace }} for container {{ $labels.container }} in pod {{
           $labels.pod }}.'
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/cputhrottlinghigh
         summary: Processes experience elevated CPU throttling.
       expr: |-
-        sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (cluster, container, pod, namespace)
+        sum(increase(container_cpu_cfs_throttled_periods_total{container!="", job="kubelet", metrics_path="/metrics/cadvisor", }[5m])) without (id, metrics_path, name, image, endpoint, job, node)
           /
-        sum(increase(container_cpu_cfs_periods_total{}[5m])) by (cluster, container, pod, namespace)
+        sum(increase(container_cpu_cfs_periods_total{job="kubelet", metrics_path="/metrics/cadvisor", }[5m])) without (id, metrics_path, name, image, endpoint, job, node)
           > ( 25 / 100 )
       for: 15m
       labels:
         severity: info
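`CPUThrottlingHigh` now restricts both counters to the kubelet's cAdvisor endpoint and aggregates with `sum ... without (id, metrics_path, name, image, endpoint, job, node)` instead of `by (cluster, container, pod, namespace)`, so labels not in that list (such as `namespace`, `pod`, `container`, and `cluster` where present) are kept. The Python sketch below reproduces the `without` grouping and the `> ( 25 / 100 )` ratio check on hypothetical samples. It skips groups with a zero denominator, a simplification of PromQL's division semantics.

```python
from collections import defaultdict

# Labels removed by `sum ... without (...)` in the new rule.
DROPPED_LABELS = {"id", "metrics_path", "name", "image", "endpoint", "job", "node"}

Sample = tuple[dict[str, str], float]  # (label set, increase over 5m)


def sum_without(samples: list[Sample],
                dropped: set[str] = DROPPED_LABELS) -> dict[frozenset, float]:
    """Group by every label except `dropped` and sum the values in each group."""
    groups: dict[frozenset, float] = defaultdict(float)
    for labels, value in samples:
        key = frozenset((k, v) for k, v in labels.items() if k not in dropped)
        groups[key] += value
    return dict(groups)


def throttled_groups(throttled: list[Sample], periods: list[Sample],
                     threshold: float = 25 / 100) -> dict[frozenset, float]:
    """Groups whose throttled/total CFS period ratio exceeds the threshold.

    Numerator and denominator are matched one-to-one on identical label sets,
    as in `a / b` in PromQL; zero denominators are skipped.
    """
    num, den = sum_without(throttled), sum_without(periods)
    result = {}
    for key, value in num.items():
        if den.get(key, 0.0) > 0:
            ratio = value / den[key]
            if ratio > threshold:
                result[key] = ratio
    return result
```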
 
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-system-apiserver

+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-system-apiserver

@@ -15,42 +15,45 @@

   groups:
   - name: kubernetes-system-apiserver
     rules:
     - alert: KubeClientCertificateExpiration
       annotations:
         description: A client certificate used to authenticate to kubernetes apiserver
-          is expiring in less than 7.0 days.
+          is expiring in less than 7.0 days on cluster {{ $labels.cluster }}.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
         summary: Client certificate is about to expire.
-      expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
-        > 0 and on (job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
-        < 604800
+      expr: |-
+        histogram_quantile(0.01, sum without (namespace, service, endpoint) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 604800
+        and
+        on (job, cluster, instance) apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
       for: 5m
       labels:
         severity: warning
     - alert: KubeClientCertificateExpiration
       annotations:
         description: A client certificate used to authenticate to kubernetes apiserver
-          is expiring in less than 24.0 hours.
+          is expiring in less than 24.0 hours on cluster {{ $labels.cluster }}.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
         summary: Client certificate is about to expire.
-      expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
-        > 0 and on (job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
-        < 86400
+      expr: |-
+        histogram_quantile(0.01, sum without (namespace, service, endpoint) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 86400
+        and
+        on (job, cluster, instance) apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
       for: 5m
       labels:
         severity: critical
     - alert: KubeAggregatedAPIErrors
       annotations:
-        description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
-          }} has reported errors. It has appeared unavailable {{ $value | humanize
-          }} times averaged over the past 10m.
+        description: Kubernetes aggregated API {{ $labels.instance }}/{{ $labels.name
+          }} has reported {{ $labels.reason }} errors on cluster {{ $labels.cluster
+          }}.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeaggregatedapierrors
         summary: Kubernetes aggregated API has reported errors.
-      expr: sum by (name, namespace, cluster)(increase(aggregator_unavailable_apiservice_total{job="apiserver"}[10m]))
-        > 4
+      expr: sum by (cluster, instance, name, reason)(increase(aggregator_unavailable_apiservice_total{job="apiserver"}[1m]))
+        > 0
+      for: 10m
       labels:
         severity: warning
     - alert: KubeAggregatedAPIDown
       annotations:
         description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
           }} has been only {{ $value | humanize }}% available over the last 10m.
@@ -74,13 +77,14 @@

       annotations:
         description: The kubernetes apiserver has terminated {{ $value | humanizePercentage
           }} of its incoming requests.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapiterminatedrequests
         summary: The kubernetes apiserver has terminated {{ $value | humanizePercentage
           }} of its incoming requests.
-      expr: sum(rate(apiserver_request_terminations_total{job="apiserver"}[10m]))  /
-        (  sum(rate(apiserver_request_total{job="apiserver"}[10m])) + sum(rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
+      expr: sum by (cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
+        / ( sum by (cluster) (rate(apiserver_request_total{job="apiserver"}[10m]))
+        + sum by (cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
         ) > 0.20
       for: 5m
       labels:
         severity: warning
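`KubeAPITerminatedRequests` keeps the same ratio, terminations divided by (requests plus terminations), but now evaluates it per `cluster` instead of across all clusters at once. A small Python sketch of the per-cluster calculation against the `> 0.20` threshold, using hypothetical per-second rates:

```python
def terminated_fraction_by_cluster(terminations: dict[str, float],
                                   requests: dict[str, float],
                                   threshold: float = 0.20) -> dict[str, float]:
    """Per-cluster terminations / (requests + terminations) above `threshold`.

    PromQL binary operators only return series whose labels match on both
    sides, so a cluster missing from either input yields no result.
    """
    result = {}
    for cluster, term in terminations.items():
        if cluster not in requests:
            continue
        total = requests[cluster] + term
        if total > 0 and term / total > threshold:
            result[cluster] = term / total
    return result


if __name__ == "__main__":
    # 3 of 10 incoming requests terminated in "prod" (30%), 1 of 10 in "dev".
    print(terminated_fraction_by_cluster({"prod": 3.0, "dev": 1.0},
                                         {"prod": 7.0, "dev": 9.0}))
```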
 
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-node-exporter

+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-node-exporter

@@ -172,13 +172,14 @@

         > 0.01
       for: 1h
       labels:
         severity: warning
     - alert: NodeHighNumberConntrackEntriesUsed
       annotations:
-        description: '{{ $value | humanizePercentage }} of conntrack entries are used.'
+        description: '{{ $labels.instance }} {{ $value | humanizePercentage }} of
+          conntrack entries are used.'
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodehighnumberconntrackentriesused
         summary: Number of conntrack are getting close to the limit.
       expr: (node_nf_conntrack_entries{job="node-exporter"} / node_nf_conntrack_entries_limit)
         > 0.75
       labels:
         severity: warning
@@ -278,13 +279,13 @@

       annotations:
         description: |
           CPU usage at {{ $labels.instance }} has been above 90% for the last 15 minutes, is currently at {{ printf "%.2f" $value }}%.
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodecpuhighusage
         summary: High CPU usage.
       expr: sum without(mode) (avg without (cpu) (rate(node_cpu_seconds_total{job="node-exporter",
-        mode!="idle"}[2m]))) * 100 > 90
+        mode!~"idle|iowait"}[2m]))) * 100 > 90
       for: 15m
       labels:
         severity: info
     - alert: NodeSystemSaturation
       annotations:
         description: |
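`NodeCPUHighUsage` now excludes `iowait` as well as `idle` from the busy-time sum, so a node that is mostly waiting on disk no longer counts as CPU-saturated. The sketch below mirrors `sum without(mode) (avg without (cpu) (rate(...))) * 100` in Python over hypothetical per-CPU mode rates (seconds of CPU time per second, per mode).

```python
def cpu_busy_percent(per_cpu_mode_rates: list[dict[str, float]],
                     excluded_modes: tuple[str, ...] = ("idle", "iowait")) -> float:
    """Average each mode across CPUs, then sum the non-excluded modes, as a percent.

    Each dict maps mode -> rate(node_cpu_seconds_total[2m]) for one CPU.
    """
    modes = {m for cpu in per_cpu_mode_rates for m in cpu if m not in excluded_modes}
    busy = 0.0
    for mode in modes:
        values = [cpu[mode] for cpu in per_cpu_mode_rates if mode in cpu]
        busy += sum(values) / len(values)  # avg without (cpu)
    return busy * 100  # sum without (mode), scaled to a percentage
```

For two CPUs each at `{"user": 0.5, "system": 0.1, "iowait": 0.35, "idle": 0.05}`, the old `mode!="idle"` selector would give roughly 95% and cross the 90% threshold, while the new selector gives roughly 60%.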
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-prometheus

+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-prometheus

@@ -68,17 +68,17 @@

         )
       for: 15m
       labels:
         severity: warning
     - alert: PrometheusErrorSendingAlertsToSomeAlertmanagers
       annotations:
-        description: '{{ printf "%.1f" $value }}% errors while sending alerts from
-          Prometheus {{$labels.namespace}}/{{$labels.pod}} to Alertmanager {{$labels.alertmanager}}.'
+        description: '{{ printf "%.1f" $value }}% of alerts sent by Prometheus {{$labels.namespace}}/{{$labels.pod}}
+          to Alertmanager {{$labels.alertmanager}} were affected by errors.'
         runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheuserrorsendingalertstosomealertmanagers
-        summary: Prometheus has encountered more than 1% errors sending alerts to
-          a specific Alertmanager.
+        summary: More than 1% of alerts sent by Prometheus to a specific Alertmanager
+          were affected by errors.
       expr: |-
         (
           rate(prometheus_notifications_errors_total{job="kube-prometheus-stack-prometheus",namespace="monitoring"}[5m])
         /
           rate(prometheus_notifications_sent_total{job="kube-prometheus-stack-prometheus",namespace="monitoring"}[5m])
         )
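The `PrometheusErrorSendingAlertsToSomeAlertmanagers` change only rewords the annotations; the expression divides the notification error rate by the sent rate. The diff is cut off after that division, so the sketch below assumes, from the `{{ printf "%.1f" $value }}%` template and the "More than 1%" summary, that the result is scaled by 100 and compared with 1. All inputs are hypothetical.

```python
def alertmanager_error_percent(errors_rate: float, sent_rate: float) -> float:
    """rate(prometheus_notifications_errors_total) / rate(..._sent_total), as a percent.

    The *100 scaling is an assumption; the expression is truncated in the diff.
    """
    return errors_rate / sent_rate * 100


def render_description(value: float, namespace: str, pod: str, alertmanager: str) -> str:
    # Python rendering of the new annotation; "%.1f" rounds to one decimal place.
    return (f"{value:.1f}% of alerts sent by Prometheus {namespace}/{pod} "
            f"to Alertmanager {alertmanager} were affected by errors.")
```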
--- HelmRelease: monitoring/kube-prometheus-stack ServiceMonitor: monitoring/kube-prometheus-stack-kubelet

+++ HelmRelease: monitoring/kube-prometheus-stack ServiceMonitor: monitoring/kube-prometheus-stack-kubelet

@@ -11,12 +11,20 @@

     app.kubernetes.io/part-of: kube-prometheus-stack
     release: kube-prometheus-stack
     heritage: Helm
 spec:
   attachMetadata:
     node: false
+  jobLabel: k8s-app
+  namespaceSelector:
+    matchNames:
+    - kube-system
+  selector:
+    matchLabels:
+      app.kubernetes.io/name: kubelet
+      k8s-app: kubelet
   endpoints:
   - port: https-metrics
     scheme: https
     tlsConfig:
       caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
       insecureSkipVerify: true
@@ -33,14 +41,16 @@

       sourceLabels:
       - __metrics_path__
       targetLabel: metrics_path
   - port: https-metrics
     scheme: https
     path: /metrics/cadvisor
+    interval: 10s
     honorLabels: true
     honorTimestamps: true
+    trackTimestampsStaleness: true
     tlsConfig:
       caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
       insecureSkipVerify: true
     bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
     metricRelabelings:
     - action: drop
@@ -84,15 +94,7 @@

     bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
     relabelings:
     - action: replace
       sourceLabels:
       - __metrics_path__
       targetLabel: metrics_path
-  jobLabel: k8s-app
-  namespaceSelector:
-    matchNames:
-    - kube-system
-  selector:
-    matchLabels:
-      app.kubernetes.io/name: kubelet
-      k8s-app: kubelet
 


github-actions bot commented Dec 16, 2024

--- kubernetes/apps/monitoring/kube-prometheus-stack/app Kustomization: flux-system/cluster-apps-kube-prometheus-stack HelmRelease: monitoring/kube-prometheus-stack

+++ kubernetes/apps/monitoring/kube-prometheus-stack/app Kustomization: flux-system/cluster-apps-kube-prometheus-stack HelmRelease: monitoring/kube-prometheus-stack

@@ -14,13 +14,13 @@

     spec:
       chart: kube-prometheus-stack
       sourceRef:
         kind: HelmRepository
         name: prometheus-community
         namespace: flux-system
-      version: 62.7.0
+      version: 67.11.0
   install:
     crds: CreateReplace
     createNamespace: true
     remediation:
       retries: 3
     timeout: 30m
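Renovate classifies this PR as a `major` update (62.7.0 -> 67.11.0), and the title carries the conventional-commit `!` breaking-change marker, while each later title change below is a minor or patch bump within 67.x. A simplified classifier for plain `MAJOR.MINOR.PATCH` strings is sketched here; it is an illustration, not Renovate's actual versioning code, and it ignores pre-release and build metadata.

```python
def update_type(old: str, new: str) -> str:
    """Classify a bump between two plain MAJOR.MINOR.PATCH versions."""
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    # The first differing component decides the update type.
    for kind, a, b in zip(("major", "minor", "patch"), old_parts, new_parts):
        if a != b:
            return kind
    return "none"
```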

renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.0.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.1.0" on Dec 16, 2024
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch 2 times, most recently from 8813243 to b504192 on December 17, 2024 01:32
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.1.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.2.0" on Dec 17, 2024
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from b504192 to cf869d9 on December 17, 2024 23:09
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.2.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.2.1" on Dec 17, 2024
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from cf869d9 to c047dc7 on December 18, 2024 01:14
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.2.1" to "feat(helm)!: Update chart kube-prometheus-stack to 67.3.0" on Dec 18, 2024
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.3.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.3.1" on Dec 18, 2024
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch 2 times, most recently from a67ea23 to 615b86a on December 19, 2024 19:32
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.3.1" to "feat(helm)!: Update chart kube-prometheus-stack to 67.4.0" on Dec 19, 2024
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from 615b86a to 6600f03 on December 25, 2024 10:40
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.4.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.5.0" on Dec 25, 2024
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from 6600f03 to 6cda2c5 on January 3, 2025 10:08
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.5.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.6.0" on Jan 3, 2025
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from 6cda2c5 to 142e72e on January 3, 2025 16:36
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.6.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.7.0" on Jan 3, 2025
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.7.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.8.0" on Jan 6, 2025
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from 142e72e to 99c2f5d on January 6, 2025 11:06
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.8.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.9.0" on Jan 8, 2025
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch 2 times, most recently from ddeab83 to 73008d2 on January 11, 2025 13:18
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.9.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.10.0" on Jan 11, 2025
renovate[bot] force-pushed the renovate/kube-prometheus-stack-67.x branch from 73008d2 to d2c9979 on January 13, 2025 14:22
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.10.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.11.0" on Jan 13, 2025
renovate[bot] changed the title "feat(helm)!: Update chart kube-prometheus-stack to 67.11.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.11.0 - autoclosed" on Jan 13, 2025
renovate[bot] closed this on Jan 13, 2025
renovate[bot] deleted the renovate/kube-prometheus-stack-67.x branch on January 13, 2025 20:32