feat(helm)!: Update chart kube-prometheus-stack to 67.11.0 - autoclosed #1144
Closed
--- HelmRelease: monitoring/kube-prometheus-stack DaemonSet: monitoring/kube-prometheus-stack-prometheus-node-exporter
+++ HelmRelease: monitoring/kube-prometheus-stack DaemonSet: monitoring/kube-prometheus-stack-prometheus-node-exporter
@@ -95,12 +95,25 @@
mountPath: /host/root
mountPropagation: HostToContainer
readOnly: true
hostNetwork: true
hostPID: true
hostIPC: false
+ affinity:
+ nodeAffinity:
+ requiredDuringSchedulingIgnoredDuringExecution:
+ nodeSelectorTerms:
+ - matchExpressions:
+ - key: eks.amazonaws.com/compute-type
+ operator: NotIn
+ values:
+ - fargate
+ - key: type
+ operator: NotIn
+ values:
+ - virtual-kubelet
nodeSelector:
kubernetes.io/os: linux
tolerations:
- effect: NoExecute
key: CriticalAddonsOnly
operator: Exists
--- HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-kube-state-metrics
+++ HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-kube-state-metrics
@@ -44,13 +44,13 @@
- name: kube-state-metrics
args:
- --port=8080
- --resources=certificatesigningrequests,configmaps,cronjobs,daemonsets,deployments,endpoints,horizontalpodautoscalers,ingresses,jobs,leases,limitranges,mutatingwebhookconfigurations,namespaces,networkpolicies,nodes,persistentvolumeclaims,persistentvolumes,poddisruptionbudgets,pods,replicasets,replicationcontrollers,resourcequotas,secrets,services,statefulsets,storageclasses,validatingwebhookconfigurations,volumeattachments
- --metric-labels-allowlist=persistentvolumeclaims=[*]
imagePullPolicy: IfNotPresent
- image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
+ image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.14.0
ports:
- containerPort: 8080
name: http
livenessProbe:
failureThreshold: 3
httpGet:
@@ -64,13 +64,13 @@
timeoutSeconds: 5
readinessProbe:
failureThreshold: 3
httpGet:
httpHeaders: null
path: /readyz
- port: 8080
+ port: 8081
scheme: HTTP
initialDelaySeconds: 5
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
resources: {}
--- HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-operator
+++ HelmRelease: monitoring/kube-prometheus-stack Deployment: monitoring/kube-prometheus-stack-operator
@@ -31,23 +31,25 @@
app: kube-prometheus-stack-operator
app.kubernetes.io/name: kube-prometheus-stack-prometheus-operator
app.kubernetes.io/component: prometheus-operator
spec:
containers:
- name: kube-prometheus-stack
- image: quay.io/prometheus-operator/prometheus-operator:v0.76.1
+ image: quay.io/prometheus-operator/prometheus-operator:v0.79.2
imagePullPolicy: IfNotPresent
args:
- --kubelet-service=kube-system/kube-prometheus-stack-kubelet
+ - --kubelet-endpoints=true
+ - --kubelet-endpointslice=false
- --localhost=127.0.0.1
- - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.76.1
+ - --prometheus-config-reloader=quay.io/prometheus-operator/prometheus-config-reloader:v0.79.2
- --config-reloader-cpu-request=0
- --config-reloader-cpu-limit=0
- --config-reloader-memory-request=0
- --config-reloader-memory-limit=0
- - --thanos-default-base-image=quay.io/thanos/thanos:v0.36.1
+ - --thanos-default-base-image=quay.io/thanos/thanos:v0.37.2
- --secret-field-selector=type!=kubernetes.io/dockercfg,type!=kubernetes.io/service-account-token,type!=helm.sh/release.v1
- --web.enable-tls=true
- --web.cert-file=/cert/cert
- --web.key-file=/cert/key
- --web.listen-address=:10250
- --web.tls-min-version=VersionTLS13
@@ -99,7 +101,8 @@
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
serviceAccountName: kube-prometheus-stack-operator
automountServiceAccountToken: true
+ terminationGracePeriodSeconds: 30
--- HelmRelease: monitoring/kube-prometheus-stack Prometheus: monitoring/kube-prometheus-stack
+++ HelmRelease: monitoring/kube-prometheus-stack Prometheus: monitoring/kube-prometheus-stack
@@ -17,14 +17,14 @@
alertmanagers:
- namespace: monitoring
name: kube-prometheus-stack-alertmanager
port: http-web
pathPrefix: /
apiVersion: v2
- image: quay.io/prometheus/prometheus:v2.54.1
- version: v2.54.1
+ image: quay.io/prometheus/prometheus:v3.1.0
+ version: v3.1.0
externalUrl: http://prometheus.deangalvin.dev/
paused: false
replicas: 2
shards: 1
logLevel: info
logFormat: logfmt
@@ -66,9 +66,25 @@
volumeClaimTemplate:
spec:
resources:
requests:
storage: 50Gi
storageClassName: ceph-rbd
+ affinity:
+ podAntiAffinity:
+ preferredDuringSchedulingIgnoredDuringExecution:
+ - weight: 100
+ podAffinityTerm:
+ topologyKey: kubernetes.io/hostname
+ labelSelector:
+ matchExpressions:
+ - key: app.kubernetes.io/name
+ operator: In
+ values:
+ - prometheus
+ - key: prometheus
+ operator: In
+ values:
+ - kube-prometheus-stack
portName: http-web
hostNetwork: false
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-etcd
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-etcd
@@ -19,29 +19,29 @@
annotations:
description: 'etcd cluster "{{ $labels.job }}": members are down ({{ $value
}}).'
summary: etcd cluster members are down.
expr: |-
max without (endpoint) (
- sum without (instance) (up{job=~".*etcd.*"} == bool 0)
+ sum without (instance, pod) (up{job=~".*etcd.*"} == bool 0)
or
count without (To) (
- sum without (instance) (rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[120s])) > 0.01
+ sum without (instance, pod) (rate(etcd_network_peer_sent_failures_total{job=~".*etcd.*"}[120s])) > 0.01
)
)
> 0
for: 10m
labels:
severity: critical
- alert: etcdInsufficientMembers
annotations:
description: 'etcd cluster "{{ $labels.job }}": insufficient members ({{ $value
}}).'
summary: etcd cluster has insufficient number of members.
- expr: sum(up{job=~".*etcd.*"} == bool 1) without (instance) < ((count(up{job=~".*etcd.*"})
- without (instance) + 1) / 2)
+ expr: sum(up{job=~".*etcd.*"} == bool 1) without (instance, pod) < ((count(up{job=~".*etcd.*"})
+ without (instance, pod) + 1) / 2)
for: 3m
labels:
severity: critical
- alert: etcdNoLeader
annotations:
description: 'etcd cluster "{{ $labels.job }}": member {{ $labels.instance
@@ -55,13 +55,13 @@
annotations:
description: 'etcd cluster "{{ $labels.job }}": {{ $value }} leader changes
within the last 15 minutes. Frequent elections may be a sign of insufficient
resources, high network latency, or disruptions by other components and
should be investigated.'
summary: etcd cluster has high number of leader changes.
- expr: increase((max without (instance) (etcd_server_leader_changes_seen_total{job=~".*etcd.*"})
+ expr: increase((max without (instance, pod) (etcd_server_leader_changes_seen_total{job=~".*etcd.*"})
or 0*absent(etcd_server_leader_changes_seen_total{job=~".*etcd.*"}))[15m:1m])
>= 4
for: 5m
labels:
severity: warning
- alert: etcdHighNumberOfFailedGRPCRequests
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-availability.rules
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-availability.rules
@@ -24,44 +24,43 @@
verb: read
record: code:apiserver_request_total:increase30d
- expr: sum by (cluster, code) (code_verb:apiserver_request_total:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
labels:
verb: write
record: code:apiserver_request_total:increase30d
- - expr: sum by (cluster, verb, scope) (increase(apiserver_request_sli_duration_seconds_count{job="apiserver"}[1h]))
- record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase1h
- - expr: sum by (cluster, verb, scope) (avg_over_time(cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase1h[30d])
- * 24 * 30)
- record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d
- expr: sum by (cluster, verb, scope, le) (increase(apiserver_request_sli_duration_seconds_bucket[1h]))
record: cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase1h
- expr: sum by (cluster, verb, scope, le) (avg_over_time(cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase1h[30d])
* 24 * 30)
record: cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d
+ - expr: sum by (cluster, verb, scope) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase1h{le="+Inf"})
+ record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase1h
+ - expr: sum by (cluster, verb, scope) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{le="+Inf"})
+ record: cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d
- expr: |-
1 - (
(
# write too slow
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
-
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le=~"1(\\.0)?"})
) +
(
# read too slow
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"LIST|GET"})
-
(
(
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le=~"1(\\.0)?"})
or
vector(0)
)
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le=~"5(\\.0)?"})
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
)
) +
# errors
sum by (cluster) (code:apiserver_request_total:increase30d{code=~"5.."} or vector(0))
)
/
@@ -73,20 +72,20 @@
1 - (
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"LIST|GET"})
-
(
# too slow
(
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope=~"resource|",le=~"1(\\.0)?"})
or
vector(0)
)
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le="5"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="namespace",le=~"5(\\.0)?"})
+
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le="30"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"LIST|GET",scope="cluster",le=~"30(\\.0)?"})
)
+
# errors
sum by (cluster) (code:apiserver_request_total:increase30d{verb="read",code=~"5.."} or vector(0))
)
/
@@ -97,13 +96,13 @@
- expr: |-
1 - (
(
# too slow
sum by (cluster) (cluster_verb_scope:apiserver_request_sli_duration_seconds_count:increase30d{verb=~"POST|PUT|PATCH|DELETE"})
-
- sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le="1"})
+ sum by (cluster) (cluster_verb_scope_le:apiserver_request_sli_duration_seconds_bucket:increase30d{verb=~"POST|PUT|PATCH|DELETE",le=~"1(\\.0)?"})
)
+
# errors
sum by (cluster) (code:apiserver_request_total:increase30d{verb="write",code=~"5.."} or vector(0))
)
/
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-burnrate.rules
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-burnrate.rules
@@ -20,20 +20,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[1d]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[1d]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[1d]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[1d]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[1d]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[1d]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[1d]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1d]))
)
@@ -47,20 +47,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[1h]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[1h]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[1h]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[1h]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[1h]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[1h]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[1h]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[1h]))
)
@@ -74,20 +74,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[2h]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[2h]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[2h]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[2h]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[2h]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[2h]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[2h]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[2h]))
)
@@ -101,20 +101,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[30m]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[30m]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[30m]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[30m]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[30m]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[30m]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[30m]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[30m]))
)
@@ -128,20 +128,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[3d]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[3d]))
- or
- vector(0)
- )
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le="5"}[3d]))
- +
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le="30"}[3d]))
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le=~"1(\\.0)?"}[3d]))
+ or
+ vector(0)
+ )
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="namespace",le=~"5(\\.0)?"}[3d]))
+ +
+ sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope="cluster",le=~"30(\\.0)?"}[3d]))
)
)
+
# errors
sum by (cluster) (rate(apiserver_request_total{job="apiserver",verb=~"LIST|GET",code=~"5.."}[3d]))
)
@@ -155,20 +155,20 @@
(
# too slow
sum by (cluster) (rate(apiserver_request_sli_duration_seconds_count{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward"}[5m]))
-
(
(
- sum by (cluster) (rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",subresource!~"proxy|attach|log|exec|portforward",scope=~"resource|",le="1"}[5m]))
- or
- vector(0)
[Diff truncated by flux-local]
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-slos
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kube-apiserver-slos
@@ -18,57 +18,57 @@
- alert: KubeAPIErrorBudgetBurn
annotations:
description: The API server is burning too much error budget.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
- sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
- and
- sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
+ sum by (cluster) (apiserver_request:burnrate1h) > (14.40 * 0.01000)
+ and on (cluster)
+ sum by (cluster) (apiserver_request:burnrate5m) > (14.40 * 0.01000)
for: 2m
labels:
long: 1h
severity: critical
short: 5m
- alert: KubeAPIErrorBudgetBurn
annotations:
description: The API server is burning too much error budget.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
- sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)
- and
- sum(apiserver_request:burnrate30m) > (6.00 * 0.01000)
+ sum by (cluster) (apiserver_request:burnrate6h) > (6.00 * 0.01000)
+ and on (cluster)
+ sum by (cluster) (apiserver_request:burnrate30m) > (6.00 * 0.01000)
for: 15m
labels:
long: 6h
severity: critical
short: 30m
- alert: KubeAPIErrorBudgetBurn
annotations:
description: The API server is burning too much error budget.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
- sum(apiserver_request:burnrate1d) > (3.00 * 0.01000)
- and
- sum(apiserver_request:burnrate2h) > (3.00 * 0.01000)
+ sum by (cluster) (apiserver_request:burnrate1d) > (3.00 * 0.01000)
+ and on (cluster)
+ sum by (cluster) (apiserver_request:burnrate2h) > (3.00 * 0.01000)
for: 1h
labels:
long: 1d
severity: warning
short: 2h
- alert: KubeAPIErrorBudgetBurn
annotations:
description: The API server is burning too much error budget.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
summary: The API server is burning too much error budget.
expr: |-
- sum(apiserver_request:burnrate3d) > (1.00 * 0.01000)
- and
- sum(apiserver_request:burnrate6h) > (1.00 * 0.01000)
+ sum by (cluster) (apiserver_request:burnrate3d) > (1.00 * 0.01000)
+ and on (cluster)
+ sum by (cluster) (apiserver_request:burnrate6h) > (1.00 * 0.01000)
for: 3h
labels:
long: 3d
severity: warning
short: 6h
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-apps
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-apps
@@ -126,13 +126,13 @@
description: StatefulSet {{ $labels.namespace }}/{{ $labels.statefulset }}
update has not been rolled out.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubestatefulsetupdatenotrolledout
summary: StatefulSet update has not been rolled out.
expr: |-
(
- max by (namespace, statefulset) (
+ max by (namespace, statefulset, job, cluster) (
kube_statefulset_status_current_revision{job="kube-state-metrics", namespace=~".*"}
unless
kube_statefulset_status_update_revision{job="kube-state-metrics", namespace=~".*"}
)
*
(
@@ -148,13 +148,13 @@
for: 15m
labels:
severity: warning
- alert: KubeDaemonSetRolloutStuck
annotations:
description: DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset }} has
- not finished or progressed for at least 15 minutes.
+ not finished or progressed for at least 15m.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubedaemonsetrolloutstuck
summary: DaemonSet rollout is stuck.
expr: |-
(
(
kube_daemonset_status_current_number_scheduled{job="kube-state-metrics", namespace=~".*"}
@@ -180,19 +180,19 @@
)
for: 15m
labels:
severity: warning
- alert: KubeContainerWaiting
annotations:
- description: pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on
+ description: 'pod/{{ $labels.pod }} in namespace {{ $labels.namespace }} on
container {{ $labels.container}} has been in waiting state for longer than
- 1 hour.
+ 1 hour. (reason: "{{ $labels.reason }}").'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubecontainerwaiting
summary: Pod container waiting longer than 1 hour
- expr: sum by (namespace, pod, container, cluster) (kube_pod_container_status_waiting_reason{job="kube-state-metrics",
- namespace=~".*"}) > 0
+ expr: kube_pod_container_status_waiting_reason{reason!="CrashLoopBackOff", job="kube-state-metrics",
+ namespace=~".*"} > 0
for: 1h
labels:
severity: warning
- alert: KubeDaemonSetNotScheduled
annotations:
description: '{{ $value }} Pods of DaemonSet {{ $labels.namespace }}/{{ $labels.daemonset
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-resources
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-resources
@@ -117,14 +117,14 @@
description: '{{ $value | humanizePercentage }} throttling of CPU in namespace
{{ $labels.namespace }} for container {{ $labels.container }} in pod {{
$labels.pod }}.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/cputhrottlinghigh
summary: Processes experience elevated CPU throttling.
expr: |-
- sum(increase(container_cpu_cfs_throttled_periods_total{container!="", }[5m])) by (cluster, container, pod, namespace)
+ sum(increase(container_cpu_cfs_throttled_periods_total{container!="", job="kubelet", metrics_path="/metrics/cadvisor", }[5m])) without (id, metrics_path, name, image, endpoint, job, node)
/
- sum(increase(container_cpu_cfs_periods_total{}[5m])) by (cluster, container, pod, namespace)
+ sum(increase(container_cpu_cfs_periods_total{job="kubelet", metrics_path="/metrics/cadvisor", }[5m])) without (id, metrics_path, name, image, endpoint, job, node)
> ( 25 / 100 )
for: 15m
labels:
severity: info
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-system-apiserver
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-kubernetes-system-apiserver
@@ -15,42 +15,45 @@
groups:
- name: kubernetes-system-apiserver
rules:
- alert: KubeClientCertificateExpiration
annotations:
description: A client certificate used to authenticate to kubernetes apiserver
- is expiring in less than 7.0 days.
+ is expiring in less than 7.0 days on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
summary: Client certificate is about to expire.
- expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
- > 0 and on (job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
- < 604800
+ expr: |-
+ histogram_quantile(0.01, sum without (namespace, service, endpoint) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 604800
+ and
+ on (job, cluster, instance) apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
for: 5m
labels:
severity: warning
- alert: KubeClientCertificateExpiration
annotations:
description: A client certificate used to authenticate to kubernetes apiserver
- is expiring in less than 24.0 hours.
+ is expiring in less than 24.0 hours on cluster {{ $labels.cluster }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeclientcertificateexpiration
summary: Client certificate is about to expire.
- expr: apiserver_client_certificate_expiration_seconds_count{job="apiserver"}
- > 0 and on (job) histogram_quantile(0.01, sum by (job, le) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m])))
- < 86400
+ expr: |-
+ histogram_quantile(0.01, sum without (namespace, service, endpoint) (rate(apiserver_client_certificate_expiration_seconds_bucket{job="apiserver"}[5m]))) < 86400
+ and
+ on (job, cluster, instance) apiserver_client_certificate_expiration_seconds_count{job="apiserver"} > 0
for: 5m
labels:
severity: critical
- alert: KubeAggregatedAPIErrors
annotations:
- description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
- }} has reported errors. It has appeared unavailable {{ $value | humanize
- }} times averaged over the past 10m.
+ description: Kubernetes aggregated API {{ $labels.instance }}/{{ $labels.name
+ }} has reported {{ $labels.reason }} errors on cluster {{ $labels.cluster
+ }}.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeaggregatedapierrors
summary: Kubernetes aggregated API has reported errors.
- expr: sum by (name, namespace, cluster)(increase(aggregator_unavailable_apiservice_total{job="apiserver"}[10m]))
- > 4
+ expr: sum by (cluster, instance, name, reason)(increase(aggregator_unavailable_apiservice_total{job="apiserver"}[1m]))
+ > 0
+ for: 10m
labels:
severity: warning
- alert: KubeAggregatedAPIDown
annotations:
description: Kubernetes aggregated API {{ $labels.name }}/{{ $labels.namespace
}} has been only {{ $value | humanize }}% available over the last 10m.
@@ -74,13 +77,14 @@
annotations:
description: The kubernetes apiserver has terminated {{ $value | humanizePercentage
}} of its incoming requests.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapiterminatedrequests
summary: The kubernetes apiserver has terminated {{ $value | humanizePercentage
}} of its incoming requests.
- expr: sum(rate(apiserver_request_terminations_total{job="apiserver"}[10m])) /
- ( sum(rate(apiserver_request_total{job="apiserver"}[10m])) + sum(rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
+ expr: sum by (cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
+ / ( sum by (cluster) (rate(apiserver_request_total{job="apiserver"}[10m]))
+ + sum by (cluster) (rate(apiserver_request_terminations_total{job="apiserver"}[10m]))
) > 0.20
for: 5m
labels:
severity: warning
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-node-exporter
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-node-exporter
@@ -172,13 +172,14 @@
> 0.01
for: 1h
labels:
severity: warning
- alert: NodeHighNumberConntrackEntriesUsed
annotations:
- description: '{{ $value | humanizePercentage }} of conntrack entries are used.'
+ description: '{{ $labels.instance }} {{ $value | humanizePercentage }} of
+ conntrack entries are used.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodehighnumberconntrackentriesused
summary: Number of conntrack are getting close to the limit.
expr: (node_nf_conntrack_entries{job="node-exporter"} / node_nf_conntrack_entries_limit)
> 0.75
labels:
severity: warning
@@ -278,13 +279,13 @@
annotations:
description: |
CPU usage at {{ $labels.instance }} has been above 90% for the last 15 minutes, is currently at {{ printf "%.2f" $value }}%.
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/node/nodecpuhighusage
summary: High CPU usage.
expr: sum without(mode) (avg without (cpu) (rate(node_cpu_seconds_total{job="node-exporter",
- mode!="idle"}[2m]))) * 100 > 90
+ mode!~"idle|iowait"}[2m]))) * 100 > 90
for: 15m
labels:
severity: info
- alert: NodeSystemSaturation
annotations:
description: |
--- HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-prometheus
+++ HelmRelease: monitoring/kube-prometheus-stack PrometheusRule: monitoring/kube-prometheus-stack-prometheus
@@ -68,17 +68,17 @@
)
for: 15m
labels:
severity: warning
- alert: PrometheusErrorSendingAlertsToSomeAlertmanagers
annotations:
- description: '{{ printf "%.1f" $value }}% errors while sending alerts from
- Prometheus {{$labels.namespace}}/{{$labels.pod}} to Alertmanager {{$labels.alertmanager}}.'
+ description: '{{ printf "%.1f" $value }}% of alerts sent by Prometheus {{$labels.namespace}}/{{$labels.pod}}
+ to Alertmanager {{$labels.alertmanager}} were affected by errors.'
runbook_url: https://runbooks.prometheus-operator.dev/runbooks/prometheus/prometheuserrorsendingalertstosomealertmanagers
- summary: Prometheus has encountered more than 1% errors sending alerts to
- a specific Alertmanager.
+ summary: More than 1% of alerts sent by Prometheus to a specific Alertmanager
+ were affected by errors.
expr: |-
(
rate(prometheus_notifications_errors_total{job="kube-prometheus-stack-prometheus",namespace="monitoring"}[5m])
/
rate(prometheus_notifications_sent_total{job="kube-prometheus-stack-prometheus",namespace="monitoring"}[5m])
)
--- HelmRelease: monitoring/kube-prometheus-stack ServiceMonitor: monitoring/kube-prometheus-stack-kubelet
+++ HelmRelease: monitoring/kube-prometheus-stack ServiceMonitor: monitoring/kube-prometheus-stack-kubelet
@@ -11,12 +11,20 @@
app.kubernetes.io/part-of: kube-prometheus-stack
release: kube-prometheus-stack
heritage: Helm
spec:
attachMetadata:
node: false
+ jobLabel: k8s-app
+ namespaceSelector:
+ matchNames:
+ - kube-system
+ selector:
+ matchLabels:
+ app.kubernetes.io/name: kubelet
+ k8s-app: kubelet
endpoints:
- port: https-metrics
scheme: https
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecureSkipVerify: true
@@ -33,14 +41,16 @@
sourceLabels:
- __metrics_path__
targetLabel: metrics_path
- port: https-metrics
scheme: https
path: /metrics/cadvisor
+ interval: 10s
honorLabels: true
honorTimestamps: true
+ trackTimestampsStaleness: true
tlsConfig:
caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
insecureSkipVerify: true
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
metricRelabelings:
- action: drop
@@ -84,15 +94,7 @@
bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
relabelings:
- action: replace
sourceLabels:
- __metrics_path__
targetLabel: metrics_path
- jobLabel: k8s-app
- namespaceSelector:
- matchNames:
- - kube-system
- selector:
- matchLabels:
- app.kubernetes.io/name: kubelet
- k8s-app: kubelet
--- kubernetes/apps/monitoring/kube-prometheus-stack/app Kustomization: flux-system/cluster-apps-kube-prometheus-stack HelmRelease: monitoring/kube-prometheus-stack
+++ kubernetes/apps/monitoring/kube-prometheus-stack/app Kustomization: flux-system/cluster-apps-kube-prometheus-stack HelmRelease: monitoring/kube-prometheus-stack
@@ -14,13 +14,13 @@
spec:
chart: kube-prometheus-stack
sourceRef:
kind: HelmRepository
name: prometheus-community
namespace: flux-system
- version: 62.7.0
+ version: 67.11.0
install:
crds: CreateReplace
createNamespace: true
remediation:
retries: 3
timeout: 30m
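A note on the recurring rule changes above: the bucket selectors in the apiserver availability and burn-rate rules switched from exact matches like le="1" to regexes like le=~"1(\\.0)?". As far as I can tell this is because, depending on the client exposition format and Prometheus version, classic histogram bucket boundaries may be scraped with integer-style ("1") or float-style ("1.0") le label values, and the regex form matches both. A minimal illustrative query in the same style as the rendered rules (not itself part of the chart):

sum by (cluster) (
  # tolerant bucket selector: matches le="1" as well as le="1.0"
  rate(apiserver_request_sli_duration_seconds_bucket{job="apiserver",verb=~"LIST|GET",le=~"1(\\.0)?"}[5m])
)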
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.0.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.1.0" on Dec 16, 2024
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch 2 times, most recently from 8813243 to b504192 on December 17, 2024 01:32
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.1.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.2.0" on Dec 17, 2024
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from b504192 to cf869d9 on December 17, 2024 23:09
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.2.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.2.1" on Dec 17, 2024
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from cf869d9 to c047dc7 on December 18, 2024 01:14
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.2.1" to "feat(helm)!: Update chart kube-prometheus-stack to 67.3.0" on Dec 18, 2024
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.3.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.3.1" on Dec 18, 2024
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch 2 times, most recently from a67ea23 to 615b86a on December 19, 2024 19:32
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.3.1" to "feat(helm)!: Update chart kube-prometheus-stack to 67.4.0" on Dec 19, 2024
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from 615b86a to 6600f03 on December 25, 2024 10:40
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.4.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.5.0" on Dec 25, 2024
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from 6600f03 to 6cda2c5 on January 3, 2025 10:08
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.5.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.6.0" on Jan 3, 2025
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from 6cda2c5 to 142e72e on January 3, 2025 16:36
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.6.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.7.0" on Jan 3, 2025
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.7.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.8.0" on Jan 6, 2025
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from 142e72e to 99c2f5d on January 6, 2025 11:06
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.8.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.9.0" on Jan 8, 2025
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch 2 times, most recently from ddeab83 to 73008d2 on January 11, 2025 13:18
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.9.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.10.0" on Jan 11, 2025
renovate bot force-pushed the renovate/kube-prometheus-stack-67.x branch from 73008d2 to d2c9979 on January 13, 2025 14:22
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.10.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.11.0" on Jan 13, 2025
renovate bot changed the title from "feat(helm)!: Update chart kube-prometheus-stack to 67.11.0" to "feat(helm)!: Update chart kube-prometheus-stack to 67.11.0 - autoclosed" on Jan 13, 2025
This PR contains the following updates: kube-prometheus-stack 62.7.0 -> 67.11.0
Warning: Some dependencies could not be looked up. Check the Dependency Dashboard for more information.
Release Notes
prometheus-community/helm-charts (kube-prometheus-stack)
v67.11.0
Compare Source
kube-prometheus-stack collects Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
What's Changed
Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-67.10.0...kube-prometheus-stack-67.11.0
v67.10.0
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-snmp-exporter-6.0.0...kube-prometheus-stack-67.10.0
v67.9.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-stackdriver-exporter-4.7.1...kube-prometheus-stack-67.9.0
v67.8.0
Compare Source
v67.7.0
Compare Source
v67.6.0
Compare Source
v67.5.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prom-label-proxy-0.10.1...kube-prometheus-stack-67.5.0
v67.4.0
Compare Source
v67.3.1
Compare Source
v67.3.0
Compare Source
v67.2.1
Compare Source
v67.2.0
Compare Source
v67.1.0
Compare Source
v67.0.0
Compare Source
v66.7.1
Compare Source
v66.7.0
Compare Source
v66.6.0
Compare Source
v66.5.0
Compare Source
v66.4.0
Compare Source
v66.3.1
Compare Source
v66.3.0
Compare Source
v66.2.2
Compare Source
v66.2.1
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-66.2.0...kube-prometheus-stack-66.2.1
v66.2.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-66.1.1...kube-prometheus-stack-66.2.0
v66.1.1
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-25.30.0...kube-prometheus-stack-66.1.1
v66.1.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@kube-state-metrics-5.27.0...kube-prometheus-stack-66.1.0
v66.0.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-operator-crds-16.0.0...kube-prometheus-stack-66.0.0
v65.8.1
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-65.8.0...kube-prometheus-stack-65.8.1
v65.8.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-mongodb-exporter-3.9.0...kube-prometheus-stack-65.8.0
v65.7.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-node-exporter-4.42.0...kube-prometheus-stack-65.7.0
v65.6.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-25.29.0...kube-prometheus-stack-65.6.0
v65.5.1
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-ipmi-exporter-0.5.0...kube-prometheus-stack-65.5.1
v65.5.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-65.4.0...kube-prometheus-stack-65.5.0
v65.4.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-node-exporter-4.40.0...kube-prometheus-stack-65.4.0
v65.3.2
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@alertmanager-snmp-notifier-0.4.0...kube-prometheus-stack-65.3.2
v65.3.1
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-operator-admission-webhook-0.16.0...kube-prometheus-stack-65.3.1
v65.3.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-mongodb-exporter-3.7.2...kube-prometheus-stack-65.3.0
v65.2.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@kube-state-metrics-5.26.0...kube-prometheus-stack-65.2.0
v65.1.1
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-stackdriver-exporter-4.6.1...kube-prometheus-stack-65.1.1
v65.1.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-sql-exporter-0.1.1...kube-prometheus-stack-65.1.0
v65.0.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-fastly-exporter-0.5.0...kube-prometheus-stack-65.0.0
v64.0.0
Compare Source
What's Changed
Full Changelog: prometheus-community/helm-charts@prometheus-snmp-exporter-5.5.1...kube-prometheus-stack-64.0.0
v63.1.0
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@kube-prometheus-stack-63.0.0...kube-prometheus-stack-63.1.0
v63.0.0
Compare Source
What's Changed
New Contributors
Full Changelog: prometheus-community/helm-charts@prometheus-conntrack-stats-exporter-0.5.11...kube-prometheus-stack-63.0.0
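Once a bump like this is merged, Flux picks up the change on its normal reconciliation interval; if you want it applied immediately, the usual CLI nudges would look something like the following (illustrative commands using the Kustomization and HelmRelease names that appear in the diff above; adjust to your setup):

flux reconcile kustomization cluster-apps-kube-prometheus-stack -n flux-system --with-source
flux reconcile helmrelease kube-prometheus-stack -n monitoring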
Configuration
📅 Schedule: Branch creation - "after 9am,before 5pm" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.