diff --git a/alerts/google-gke/cpu-limit-utilization-containers-within-cluster.v1.json b/alerts/google-gke/cpu-limit-utilization-containers-within-cluster.v1.json
index 3d30531eba..082791ddf3 100644
--- a/alerts/google-gke/cpu-limit-utilization-containers-within-cluster.v1.json
+++ b/alerts/google-gke/cpu-limit-utilization-containers-within-cluster.v1.json
@@ -1,7 +1,7 @@
 {
   "displayName": "GKE Container - High CPU Limit Utilization (${CLUSTER_NAME} cluster)",
   "documentation": {
-    "content": "- Containers that exceed CPU utilization limit are CPU throttled. To avoid application slowdown and unresponsiveness, keep CPU usage below the CPU utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)",
+    "content": "- Containers that exceed CPU utilization limit are CPU throttled. To avoid application slowdown and unresponsiveness, keep CPU usage below the CPU utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)\n- We recommend troubleshooting this incident with the [CPU Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/cpu?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},
diff --git a/alerts/google-gke/failedscheduling-log-event-within-cluster.v1.json b/alerts/google-gke/failedscheduling-log-event-within-cluster.v1.json
index 5ca696e217..1d67b09322 100644
--- a/alerts/google-gke/failedscheduling-log-event-within-cluster.v1.json
+++ b/alerts/google-gke/failedscheduling-log-event-within-cluster.v1.json
@@ -2,7 +2,7 @@
   "displayName": "GKE Pod - FailedScheduling Log Event (${CLUSTER_NAME})",
   "documentation": {
     "content":
-      "- A \"FailedScheduling\" event occurs when a pending pod cannot be scheduled, This alert fires when an event with reason \"FailedSceduling\" occurs in the logs; limited to notifying once per hour.\n- We recommend troubleshooting this issue with the [Unschedulable Pods Interactive Playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/unschedulable?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) which shows detailed instructions, metrics, and logs.",
+      "- A \"FailedScheduling\" event occurs when a pending pod cannot be scheduled. This alert fires when an event with reason \"FailedScheduling\" occurs in the logs; limited to notifying once per hour.\n- We recommend troubleshooting this incident with the [Unschedulable Pods interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/unschedulable?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},
diff --git a/alerts/google-gke/memory-limit-utilization-containers-within-cluster.v1.json b/alerts/google-gke/memory-limit-utilization-containers-within-cluster.v1.json
index 34f06b9f95..31805b1207 100644
--- a/alerts/google-gke/memory-limit-utilization-containers-within-cluster.v1.json
+++ b/alerts/google-gke/memory-limit-utilization-containers-within-cluster.v1.json
@@ -1,7 +1,7 @@
 {
   "displayName": "GKE Container - High Memory Limit Utilization (${CLUSTER_NAME} cluster)",
   "documentation": {
-    "content": "- Containers that exceed Memory utilization limit are terminated. To avoid Out of Memory (OOM) failures, keep memory usage below the memory utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)",
+    "content": "- Containers that exceed Memory utilization limit are terminated. To avoid Out of Memory (OOM) failures, keep memory usage below the memory utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)\n- We recommend troubleshooting this incident with the [Memory Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/memory?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},
diff --git a/alerts/google-gke/restarts-containers-within-cluster.v1.json b/alerts/google-gke/restarts-containers-within-cluster.v1.json
index 8fd24c0e1e..79adf3e864 100644
--- a/alerts/google-gke/restarts-containers-within-cluster.v1.json
+++ b/alerts/google-gke/restarts-containers-within-cluster.v1.json
@@ -1,7 +1,7 @@
 {
   "displayName": "GKE Container - Restarts (${CLUSTER_NAME} cluster)",
   "documentation": {
-    "content": "- Container restarts are commonly caused by memory/cpu usage issues and application failures.\n- By default, this alert notifies an incident when there is more than 1 container restart in a 5 minute window. If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth).\n- We recommend troubleshooting this issue with the [Interactive Playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/crashloop?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) for restarting containers which shows detailed instructions, metrics, and logs.",
+    "content": "- Container restarts are commonly caused by memory/cpu usage issues and application failures.\n- By default, this alert notifies an incident when there is more than 1 container restart in a 5 minute window. If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth).\n- We recommend troubleshooting this incident with the [interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/crashloop?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) for restarting containers, which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},
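These policy files are templates: the documentation links depend on ${PROJECT_ID}, ${CLUSTER_NAME}, and ${CLUSTER_LOCATION} being filled in before the policy is created. As a minimal sketch (not part of this change), the snippet below shows one way the placeholders could be substituted and the rendered policy created with the google-cloud-monitoring Python client; the file path and variable values are hypothetical examples.

```python
# Sketch: render one templated alert-policy JSON file and create it with the
# Cloud Monitoring API. Assumes the google-cloud-monitoring client library.
from string import Template

from google.cloud import monitoring_v3

# Hypothetical values for the template variables used in these files.
PROJECT_ID = "my-project"
CLUSTER_NAME = "my-cluster"
CLUSTER_LOCATION = "us-central1"

with open("alerts/google-gke/restarts-containers-within-cluster.v1.json") as f:
    raw = f.read()

# The ${VAR} placeholders match string.Template's default syntax;
# safe_substitute leaves any other "$" sequences untouched.
rendered = Template(raw).safe_substitute(
    PROJECT_ID=PROJECT_ID,
    CLUSTER_NAME=CLUSTER_NAME,
    CLUSTER_LOCATION=CLUSTER_LOCATION,
)

# Parse the JSON (camelCase keys) into an AlertPolicy and create it.
policy = monitoring_v3.AlertPolicy.from_json(rendered)
client = monitoring_v3.AlertPolicyServiceClient()
created = client.create_alert_policy(
    name=f"projects/{PROJECT_ID}", alert_policy=policy
)
print(created.name)
```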