Cluster Health Score

The health score starts at 100, and each penalty that applies reduces it. There are three penalty types:

SevereErrorPenalty = 50
ErrorPenalty       = 15
WarningPenalty     = 3

WarningPenalty is applied when:

  • The deployment is a single cluster (the master exists on the cluster; applies to kops-based Kubernetes deployments on AWS)
  • The cluster runs in a single region
  • Predictive disk growth crosses the 90% threshold

ErrorPenalty is applied when:

  • Any nodes in the cluster are Not Ready
  • Any nodes are under MemoryPressure

SevereErrorPenalty is applied when:

  • Memory usage exceeds 90% of the available memory on the cluster
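
The rules above can be sketched as a small scoring function. This is only an illustration, not the actual implementation: the `ClusterState` type and its field names are hypothetical, and it assumes each condition applies its penalty at most once, regardless of how many nodes are affected.

```python
from dataclasses import dataclass

# Penalty values taken from the list above.
SEVERE_ERROR_PENALTY = 50
ERROR_PENALTY = 15
WARNING_PENALTY = 3


@dataclass
class ClusterState:
    """Hypothetical snapshot of the conditions the score depends on."""
    single_cluster: bool = False
    single_region: bool = False
    predicted_disk_over_90: bool = False
    any_node_not_ready: bool = False
    any_node_memory_pressure: bool = False
    memory_usage_over_90: bool = False


def health_score(state: ClusterState) -> int:
    """Start at 100 and subtract one penalty per condition that holds.

    Assumption: a condition is penalized once, not once per affected node.
    """
    score = 100
    warnings = [
        state.single_cluster,
        state.single_region,
        state.predicted_disk_over_90,
    ]
    errors = [state.any_node_not_ready, state.any_node_memory_pressure]

    score -= WARNING_PENALTY * sum(warnings)
    score -= ERROR_PENALTY * sum(errors)
    if state.memory_usage_over_90:
        score -= SEVERE_ERROR_PENALTY
    return score


if __name__ == "__main__":
    state = ClusterState(single_region=True, any_node_not_ready=True)
    print(health_score(state))  # 100 - 3 - 15 = 82
```

Under the once-per-condition assumption, the lowest possible score is 100 - (3 × 3) - (2 × 15) - 50 = 11, so the sketch never needs to clamp the result at zero.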

Alert

The Cluster Health alert is based on a threshold of change in the score. For example, an alert threshold of 14 fires any time an ErrorPenalty is applied, because a single ErrorPenalty lowers the score by 15.
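
A minimal sketch of that threshold check follows. The function name and parameters are hypothetical, and it assumes the change is measured as the drop between two consecutive score evaluations and that the alert fires only when the drop is strictly greater than the threshold; the real alerting logic may differ.

```python
def should_alert(previous_score: int, current_score: int, threshold: int) -> bool:
    """Return True when the score dropped by more than `threshold`.

    Assumption: only decreases count as a change worth alerting on.
    """
    drop = previous_score - current_score
    return drop > threshold


if __name__ == "__main__":
    # One ErrorPenalty (15) exceeds a threshold of 14.
    print(should_alert(100, 85, threshold=14))  # True
    # One WarningPenalty (3) does not.
    print(should_alert(100, 97, threshold=14))  # False
```

With a threshold of 14, a SevereErrorPenalty (50) would also trigger the alert, while WarningPenalty conditions alone trigger it only if the drop between evaluations adds up to more than 14.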
