add playbook links to documentation section for cpu memory limit utilization alerts (GoogleCloudPlatform#612)
stevezease authored Aug 22, 2023
1 parent d808faf commit 4fa603e
Showing 4 changed files with 4 additions and 4 deletions.
@@ -1,7 +1,7 @@
{
"displayName": "GKE Container - High CPU Limit Utilization (${CLUSTER_NAME} cluster)",
"documentation": {
-"content": "- Containers that exceed CPU utilization limit are CPU throttled. To avoid application slowdown and unresponsiveness, keep CPU usage below the CPU utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)",
+"content": "- Containers that exceed CPU utilization limit are CPU throttled. To avoid application slowdown and unresponsiveness, keep CPU usage below the CPU utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)\n- We recommend troubleshooting this incident with the [CPU Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/cpu?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
"mimeType": "text/markdown"
},
"userLabels": {},
@@ -2,7 +2,7 @@
"displayName": "GKE Pod - FailedScheduling Log Event (${CLUSTER_NAME})",
"documentation": {
"content":
-"- A \"FailedScheduling\" event occurs when a pending pod cannot be scheduled, This alert fires when an event with reason \"FailedSceduling\" occurs in the logs; limited to notifying once per hour.\n- We recommend troubleshooting this issue with the [Unschedulable Pods Interactive Playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/unschedulable?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) which shows detailed instructions, metrics, and logs.",
+"- A \"FailedScheduling\" event occurs when a pending pod cannot be scheduled, This alert fires when an event with reason \"FailedSceduling\" occurs in the logs; limited to notifying once per hour.\n- We recommend troubleshooting this incident with the [Unschedulable Pods interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/unschedulable?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
"mimeType": "text/markdown"
},
"userLabels": {},
@@ -1,7 +1,7 @@
{
"displayName": "GKE Container - High Memory Limit Utilization (${CLUSTER_NAME} cluster)",
"documentation": {
-"content": "- Containers that exceed Memory utilization limit are terminated. To avoid Out of Memory (OOM) failures, keep memory usage below the memory utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)",
+"content": "- Containers that exceed Memory utilization limit are terminated. To avoid Out of Memory (OOM) failures, keep memory usage below the memory utilization limit [View Documentation](https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-resource-requests-and-limits).\n- If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth)\n- We recommend troubleshooting this incident with the [Memory Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/memory?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
"mimeType": "text/markdown"
},
"userLabels": {},
@@ -1,7 +1,7 @@
{
"displayName": "GKE Container - Restarts (${CLUSTER_NAME} cluster)",
"documentation": {
-"content": "- Container restarts are commonly caused by memory/cpu usage issues and application failures.\n- By default, this alert notifies an incident when there is more than 1 container restart in a 5 minute window. If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth).\n- We recommend troubleshooting this issue with the [Interactive Playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/crashloop?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) for restarting containers which shows detailed instructions, metrics, and logs.",
+"content": "- Container restarts are commonly caused by memory/cpu usage issues and application failures.\n- By default, this alert notifies an incident when there is more than 1 container restart in a 5 minute window. If alerts tend to be false positive or noisy, consider visiting the alert policy page and changing the threshold, the rolling (alignment) window, and the retest (duration) window. [View Documentation](https://cloud.google.com/monitoring/alerts/concepts-indepth).\n- We recommend troubleshooting this incident with the [interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/crashloop?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}) for restarting containers, which shows detailed instructions, metrics, and logs.",
"mimeType": "text/markdown"
},
"userLabels": {},
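
Each playbook link added in this commit works only after `${PROJECT_ID}`, `${CLUSTER_NAME}`, and `${CLUSTER_LOCATION}` have been filled in with real values. The commit does not show how that substitution is done. The sketch below assumes plain `${VAR}` replacement, which is the syntax Python's `string.Template` accepts. It then parses the result as JSON to check that the link ends up intact in `documentation.content`. The helper `render_policy`, the abridged template, and the sample values are all hypothetical.

```python
import json
from string import Template

# Abridged copy of the Memory alert policy from this commit: only the new
# playbook bullet is kept in "content" so the example stays short.
POLICY_TEMPLATE = """{
  "displayName": "GKE Container - High Memory Limit Utilization (${CLUSTER_NAME} cluster)",
  "documentation": {
    "content": "- We recommend troubleshooting this incident with the [Memory Utilization interactive playbook](https://console.cloud.google.com/monitoring/dashboards/gke-troubleshooting/memory?project=${PROJECT_ID}&f.sd_ts_playbook.cluster_name=${CLUSTER_NAME}&f.sd_ts_playbook.location=${CLUSTER_LOCATION}), which shows detailed instructions, metrics, and logs.",
    "mimeType": "text/markdown"
  },
  "userLabels": {}
}"""


def render_policy(template_text: str, **values: str) -> dict:
    """Hypothetical helper: fill ${VAR} placeholders, then parse the JSON.

    Template.substitute raises KeyError for any placeholder without a value,
    so a missing PROJECT_ID, CLUSTER_NAME, or CLUSTER_LOCATION fails loudly
    instead of leaving a broken playbook link in the alert documentation.
    Assumes the values need no JSON or URL escaping.
    """
    return json.loads(Template(template_text).substitute(values))


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = render_policy(
        POLICY_TEMPLATE,
        PROJECT_ID="my-project",
        CLUSTER_NAME="prod",
        CLUSTER_LOCATION="us-central1",
    )
    print(policy["displayName"])
    # GKE Container - High Memory Limit Utilization (prod cluster)
    print(policy["documentation"]["content"])
```

The sketch uses `substitute` instead of `safe_substitute` on purpose. `safe_substitute` would leave an unfilled `${CLUSTER_LOCATION}` in the URL without complaint, and the dashboard filter would then silently fail to apply.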