fix(charts/istio-alerts): Remove source service from 5XX Alarm
corydolphin committed Apr 18, 2024
1 parent 70d0c3e commit 2b097c9
Showing 4 changed files with 35 additions and 6 deletions.
2 changes: 1 addition & 1 deletion charts/istio-alerts/Chart.yaml
@@ -2,7 +2,7 @@ apiVersion: v2
name: istio-alerts
description: A Helm chart that provisions a series of alerts for istio VirtualServices
type: application
-version: 0.3.2
+version: 0.4.0
appVersion: 0.0.1
maintainers:
- name: diranged
21 changes: 18 additions & 3 deletions charts/istio-alerts/README.md
@@ -1,6 +1,6 @@
# istio-alerts

-![Version: 0.3.2](https://img.shields.io/badge/Version-0.3.2-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.0.1](https://img.shields.io/badge/AppVersion-0.0.1-informational?style=flat-square)
+![Version: 0.4.0](https://img.shields.io/badge/Version-0.4.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square) ![AppVersion: 0.0.1](https://img.shields.io/badge/AppVersion-0.0.1-informational?style=flat-square)

A Helm chart that provisions a series of alerts for istio VirtualServices

@@ -13,6 +13,15 @@ A Helm chart that provisions a series of alerts for istio VirtualServices

## Upgrade Notes

### 0.3.x -> 0.4.x

**BREAKING: http5XXMonitor no longer alerts per source client workload.**

Version 0.2.x changed the default `http5XXMonitor` to calculate the error rate
per source workload. The 0.4.x release reverts that default: `source_workload`
is no longer in the default grouping labels, and you can opt back in (or choose
other grouping labels) via the `monitorGroupingLabels` option.
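
As an illustration of the opt-in described above, the following values override is a minimal sketch that would restore the 0.2.x grouping by adding `source_workload` back to the list. The keys and label names come from the chart's values table. The file layout is only an example, and because a list override replaces the whole default list, all three labels are spelled out.

```yaml
# Sketch of a values override (for example, passed to Helm with -f).
# Restores per-source-workload evaluation of the 5XX error rate.
serviceRules:
  http5XXMonitor:
    monitorGroupingLabels:
      - destination_service_name
      - reporter
      - source_workload   # removed from the defaults in 0.4.0
```

Leaving `monitorGroupingLabels` unset keeps the new 0.4.x default of `destination_service_name` and `reporter` only.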

### 0.2.x -> 0.3.x

**BREAKING: The DestinationServiceSelectorValidity alert rule requires kube-state-metrics.**
@@ -22,6 +31,12 @@ you do not have kube-state-metrics installed, you will need to disable the alert
`serviceRules.destinationServiceSelectorValidity.enabled` to `false`. This alert is used to detect
if the destinationServiceSelector is actually selecting series for a service that exists.

### 0.2.x

**BREAKING: http5XXMonitor now calculates the 5XX error rate for each client
source workload** using the `source_workload` label, and will alert if any
`source_workload`'s error rate exceeds the specified `threshold`.
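
To make the effect of the grouping labels concrete, here is a hedged sketch of a Prometheus rule in the spirit of the 0.2.x behavior. It is not the chart's actual template: the group and alert names are hypothetical, and it assumes the standard Istio `istio_requests_total` metric with its `response_code` label. The `for`, `severity`, and threshold values mirror the chart defaults listed in the values table.

```yaml
# Illustrative only: hypothetical names, not the chart's rendered output.
groups:
  - name: istio-5xx-sketch            # hypothetical group name
    rules:
      - alert: Http5XXRateTooHigh     # hypothetical alert name
        # Ratio of 5XX responses to all responses, computed separately for
        # every combination of the grouping labels. Including
        # source_workload yields one series (and potentially one alert)
        # per client workload.
        expr: |
          sum by (destination_service_name, reporter, source_workload) (
            rate(istio_requests_total{response_code=~"5.."}[5m])
          )
          /
          sum by (destination_service_name, reporter, source_workload) (
            rate(istio_requests_total[5m])
          )
          > 0.0005
        for: 5m
        labels:
          severity: critical
```

Dropping `source_workload` from both `sum by (...)` clauses aggregates all clients together, which corresponds to the 0.4.x default grouping.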

## Values

| Key | Type | Default | Description |
@@ -48,10 +63,10 @@ if the destinationServiceSelector is actually selecting series for a service tha
| serviceRules.highRequestLatency.percentile | float | `0.95` | Which percentile to monitor - should be between 0 and 1. Default is 95th percentile. |
| serviceRules.highRequestLatency.severity | string | `"warning"` | Severity of the latency monitor |
| serviceRules.highRequestLatency.threshold | float | `0.5` | The threshold for considering the latency monitor to be alarming. This is in seconds. |
-| serviceRules.http5XXMonitor | object | `{"enabled":true,"for":"5m","monitorGroupingLabels":["destination_service_name","reporter","source_workload"],"severity":"critical","threshold":0.0005}` | Configuration related to the 5xx monitor for the VirtualService. |
+| serviceRules.http5XXMonitor | object | `{"enabled":true,"for":"5m","monitorGroupingLabels":["destination_service_name","reporter"],"severity":"critical","threshold":0.0005}` | Configuration related to the 5xx monitor for the VirtualService. |
| serviceRules.http5XXMonitor.enabled | bool | `true` | Whether to enable the monitor on 5xxs returned by the VirtualService. |
| serviceRules.http5XXMonitor.for | string | `"5m"` | How long to evaluate the rate of 5xxs over. |
-| serviceRules.http5XXMonitor.monitorGroupingLabels | list | `["destination_service_name","reporter","source_workload"]` | The set of labels to use when evaluating the ratio of the 5XX. |
+| serviceRules.http5XXMonitor.monitorGroupingLabels[0] | string | `"destination_service_name"` | The set of labels to use when evaluating the ratio of the 5XX. |
| serviceRules.http5XXMonitor.severity | string | `"critical"` | Severity of the 5xx monitor |
| serviceRules.http5XXMonitor.threshold | float | `0.0005` | The threshold for considering the 5xx monitor to be alarming. Default is 0.05% error rate, i.e. 99.95% reliability. |

15 changes: 15 additions & 0 deletions charts/istio-alerts/README.md.gotmpl
@@ -8,6 +8,15 @@

## Upgrade Notes

### 0.3.x -> 0.4.x

**BREAKING: http5XXMonitor no longer alerts per source client workload.**

Version 0.2.x changed the default `http5XXMonitor` to calculate the error rate
per source workload. The 0.4.x release reverts that default: `source_workload`
is no longer in the default grouping labels, and you can opt back in (or choose
other grouping labels) via the `monitorGroupingLabels` option.

### 0.2.x -> 0.3.x

**BREAKING: The DestinationServiceSelectorValidity alert rule requires kube-state-metrics.**
@@ -17,6 +26,12 @@ you do not have kube-state-metrics installed, you will need to disable the alert
`serviceRules.destinationServiceSelectorValidity.enabled` to `false`. This alert is used to detect
if the destinationServiceSelector is actually selecting series for a service that exists.

### 0.2.x

**BREAKING: http5XXMonitor now calculates the 5XX error rate for each client
source workload** using the `source_workload` label, and will alert if any
`source_workload`'s error rate exceeds the specified `threshold`.

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}
3 changes: 1 addition & 2 deletions charts/istio-alerts/values.yaml
@@ -99,11 +99,10 @@ serviceRules:
     # -- Severity of the 5xx monitor
     severity: critical

-    # -- The set of labels to use when evaluating the ratio of the 5XX.
     monitorGroupingLabels:
+      # -- The set of labels to use when evaluating the ratio of the 5XX.
       - destination_service_name
       - reporter
-      - source_workload

# -- Configuration related to the latency monitor for the VirtualService.
highRequestLatency: