add playbook links to documentation section for cpu memory limit utilization alerts (GoogleCloudPlatform#612)
observIQ · Aug 22, 2023 · 4fa603e
1 parent d808faf
commit 4fa603e
Showing 4 changed files with 4 additions and 4 deletions.
diff --git a/alerts/google-gke/cpu-limit-utilization-containers-within-cluster.v1.json b/alerts/google-gke/cpu-limit-utilization-containers-within-cluster.v1.json
@@ -1,7 +1,7 @@
 {
   "displayName": "GKE Container - High CPU Limit Utilization (${CLUSTER_NAME} cluster)",
   "documentation": {
-    "content": "- Containers that exceed CPU utilization limit are CPU throttled. To avoid application slowdown and unresponsiveness, keep CPU usage below the CPU utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)",
+    "content": "- Containers that exceed CPU utilization limit are CPU throttled. To avoid application slowdown and unresponsiveness, keep CPU usage below the CPU utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)\n- We recommend troubleshooting this incident with the [CPU Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/cpu?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},

diff --git a/alerts/google-gke/failedscheduling-log-event-within-cluster.v1.json b/alerts/google-gke/failedscheduling-log-event-within-cluster.v1.json
@@ -2,7 +2,7 @@
   "displayName": "GKE Pod - FailedScheduling Log Event (${CLUSTER_NAME})",
   "documentation": {
     "content":
-        "- A \"FailedScheduling\" event occurs when a pending pod cannot be scheduled, This alert fires when an event with reason \"FailedSceduling\" occurs in the logs; limited to notifying once per hour.\n- We recommend troubleshooting this issue with the [Unschedulable Pods Interactive Playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/unschedulable?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) which shows detailed instructions, metrics, and logs.",
+        "- A \"FailedScheduling\" event occurs when a pending pod cannot be scheduled, This alert fires when an event with reason \"FailedSceduling\" occurs in the logs; limited to notifying once per hour.\n- We recommend troubleshooting this incident with the [Unschedulable Pods interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/unschedulable?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},

diff --git a/alerts/google-gke/memory-limit-utilization-containers-within-cluster.v1.json b/alerts/google-gke/memory-limit-utilization-containers-within-cluster.v1.json
@@ -1,7 +1,7 @@
 {
   "displayName": "GKE Container - High Memory Limit Utilization (${CLUSTER_NAME} cluster)",
   "documentation": {
-    "content": "- Containers that exceed Memory utilization limit are terminated. To avoid Out of Memory (OOM) failures, keep memory usage below the memory utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)",
+    "content": "- Containers that exceed Memory utilization limit are terminated. To avoid Out of Memory (OOM) failures, keep memory usage below the memory utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)\n- We recommend troubleshooting this incident with the [Memory Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/memory?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},

diff --git a/alerts/google-gke/restarts-containers-within-cluster.v1.json b/alerts/google-gke/restarts-containers-within-cluster.v1.json
@@ -1,7 +1,7 @@
 {
   "displayName": "GKE Container - Restarts (${CLUSTER_NAME} cluster)",
   "documentation": {
-    "content": "- Container restarts are commonly caused by memory/cpu usage issues and application failures.\n- By default, this alert notifies an incident when there is more than 1 container restart in a 5 minute window. If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth).\n- We recommend troubleshooting this issue with the [Interactive Playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/crashloop?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) for restarting containers which shows detailed instructions, metrics, and logs.",
+    "content": "- Container restarts are commonly caused by memory/cpu usage issues and application failures.\n- By default, this alert notifies an incident when there is more than 1 container restart in a 5 minute window. If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth).\n- We recommend troubleshooting this incident with the [interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/crashloop?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) for restarting containers, which shows detailed instructions, metrics, and logs.",
     "mimeType": "text/markdown"
   },
   "userLabels": {},
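
The playbook links added in this commit all share one URL shape with `${PROJECT_ID}`, `${CLUSTER_NAME}`, and `${CLUSTER_LOCATION}` placeholders. The diff does not show how those placeholders are filled in, so the following is only a minimal sketch of one way to render such a link for a manual check, assuming plain string substitution. The `${PLAYBOOK}` placeholder and the `render_playbook_url` helper are hypothetical names introduced for illustration; the slugs `cpu`, `memory`, `unschedulable`, and `crashloop` are the ones that appear in the URLs above.

```python
from string import Template
from urllib.parse import quote

# URL shape copied from the alert documentation in this diff.
# ${PLAYBOOK} is a hypothetical placeholder added for this sketch; the
# other three placeholders appear verbatim in the policy files.
PLAYBOOK_URL = Template(
    "https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/"
    "${PLAYBOOK}?project=${PROJECT_ID}"
    "&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}"
    "&f.sd_ts_playbook.location=${CLUSTER_LOCATION}"
)


def render_playbook_url(playbook, project_id, cluster_name, cluster_location):
    """Fill in the placeholders, percent-encoding each value.

    Assumption: simple textual substitution is enough; the real
    deployment tooling may substitute these values differently.
    """
    return PLAYBOOK_URL.substitute(
        PLAYBOOK=quote(playbook, safe=""),
        PROJECT_ID=quote(project_id, safe=""),
        CLUSTER_NAME=quote(cluster_name, safe=""),
        CLUSTER_LOCATION=quote(cluster_location, safe=""),
    )


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(render_playbook_url("cpu", "my-project", "prod", "us-central1"))
```

Percent-encoding each value keeps the query string valid if a name ever contains a reserved character; for typical project IDs, cluster names, and locations the values pass through unchanged.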