Commit 2c80b7b

Daniel Andrzejewski authored and majormoses committed
Added threshold for pods to be in the non ready state
1 parent bdfe039 commit 2c80b7b

2 files changed, +16 -1 lines changed

CHANGELOG.md

+5
@@ -5,6 +5,11 @@ This CHANGELOG follows the format listed [here ](https://github.com/sensu-plugin
 
 ## [Unreleased]
 
+### Chaned
+- `check-kube-pods-running.rb`: Skip a POD which is in the not ready state for shorter time than the specified time. Otherwise, the check alerts if we get lots of new PODs which are spawned every second and get up or get terminated longer than a minute. (@sys-ops)
+
+
+## [3.2.0] - 2018-11-21
 ### Changed
 - `check-kube-service-available.rb`: Skip a service if its selector is empty. Otherwise all PODs in the cluster are listed with client.get_pods() call (including those that we do not want to monitor) (@sys-ops)
 
bin/check-kube-pods-running.rb

+11 -1
@@ -34,6 +34,7 @@
 # Exclude wins when a node is in both include and exclude lists
 # --include-nodes Include the specified nodes (comma separated list), an
 # empty list includes all nodes
+# --time TIME Threshold for pods to be in the non ready state
 # -f, --filter FILTER Selector filter for pods to be checked
 # -p, --pods PODS List of pods to check
 # NOTES:
@@ -59,6 +60,12 @@ class AllPodsAreRunning < Sensu::Plugins::Kubernetes::CLI
          long: '--pods',
          default: 'all'
 
+  option :not_ready_time,
+         description: 'Threshold for pods to be in the non ready state',
+         long: '--time TIME',
+         proc: proc(&:to_i),
+         default: 300
+
   option :pod_filter,
          description: 'Selector filter for pods to be checked',
          short: '-f FILTER',
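
Note on the `--time` option above: the value arrives from the command line as a string and is converted with `proc(&:to_i)`. The stand-alone sketch below only illustrates how `String#to_i` behaves on a few inputs; `to_seconds` is a hypothetical name used for illustration and is not part of the plugin.

```ruby
# Stand-alone illustration of the conversion used by the :not_ready_time option.
# `to_seconds` is a hypothetical name; the plugin passes the proc inline.
to_seconds = proc(&:to_i)

puts to_seconds.call('120') # numeric string converts directly      -> 120
puts to_seconds.call('5m')  # leading digits used, suffix ignored   -> 5
puts to_seconds.call('abc') # non-numeric input silently becomes 0  -> 0
```

Because `to_i` never raises, a mistyped value such as `--time abc` would yield a threshold of 0 seconds, which effectively removes the grace period. Whether that deserves explicit validation is a judgment call for the maintainers.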
@@ -110,7 +117,10 @@ def run
         next if should_exclude_node(pod.spec.nodeName)
         next unless pods_list.include?(pod.metadata.name) || pods_list.include?('all')
         next unless pod.status.phase != 'Succeeded' && !pod.status.conditions.nil?
-        failed_pods << pod.metadata.name unless ready? pod
+        pod_stamp = Time.parse(pod.status.startTime)
+        if (Time.now.utc - pod_stamp.utc).to_i > config[:not_ready_time]
+          failed_pods << pod.metadata.name unless ready? pod
+        end
       end
 
       if failed_pods.empty?
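
Note on the threshold check in the hunk above: `Time.parse` needs the `time` standard library and raises a `TypeError` when given `nil`. The Kubernetes API marks `status.startTime` as optional, so it may be absent for a pod that has not been acknowledged by a kubelet yet. The sketch below is one possible nil-safe form of the same comparison, not the committed code. The helper name `past_grace_period?` is hypothetical, and reporting pods of unknown age is an assumption.

```ruby
require 'time'

# Hypothetical helper, not part of the plugin. Returns true when a pod has
# existed for more than `threshold` seconds. `start_time` is the timestamp
# string from pod.status.startTime, or nil when the field is not set.
def past_grace_period?(start_time, threshold, now = Time.now.utc)
  # Assumption: a pod with no start time is treated as old enough to report.
  return true if start_time.nil?

  (now - Time.parse(start_time).utc).to_i > threshold
end

now = Time.utc(2018, 11, 21, 12, 0, 0)
puts past_grace_period?('2018-11-21T11:59:30Z', 300, now) # 30 s old  -> false
puts past_grace_period?('2018-11-21T11:50:00Z', 300, now) # 600 s old -> true
puts past_grace_period?(nil, 300, now)                    # no start time -> true
```

Passing `now` as a parameter keeps the comparison deterministic in tests. Inside the plugin loop the call would presumably look like `past_grace_period?(pod.status.startTime, config[:not_ready_time])`.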
