Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable alert severity overrides #617

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

saswatamcode
Copy link
Contributor

This PR extends the Alerting configuration to enable alert severity label value overrides.

As Pyrra generates alerts of two severities, critical and warning, there are cases where users might want to change these values to something else due to alert routing/pager config in different envs (for eg, not critical but high for stage).

This allows setting highSeverity and lowSeverity for alerts, which default to critical and warning.

Wdyt? 🙂

Signed-off-by: Saswata Mukherjee <[email protected]>
@metalmatze
Copy link
Member

Very interesting and happy to add something along those lines. Currently, I'm not sure about the wording and will give this another look.

@metalmatze
Copy link
Member

Looking at this again, I wonder should we at least make all four alerts configurable in terms of setting custom severity label values? I can see that make sense overall.

warning and critical have been decided on by Prometheus team as default severity label value and hence Pyrra uses the same by default.

@saswatamcode
Copy link
Contributor Author

Does something like the config below seem better?

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: 99
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    absentSeverity: high
    windowBurnRateSeverity: [high, high, low, low]

This way user can override the SLOMetricAbsent alerts as well as the four window-based alerts. Not specifying would mean falling back to default and the windowBurnRateSeverity if specified, must always be of length 4, to avoid confusion.

warning and critical have been decided on by Prometheus team as default severity label value and hence Pyrra uses the same by default.

Yup these are great defaults and work well. This option just addresses the cases which deviate from this. 🙂

@saswatamcode
Copy link
Contributor Author

@metalmatze should I update this PR with the config above then?

@metalmatze
Copy link
Member

Sorry for dropping the ball on this one...

How about something similar to this?

apiVersion: pyrra.dev/v1alpha1
kind: ServiceLevelObjective
metadata:
  name: custom-severity
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  target: "99"
  window: 2w
  indicator:
    ratio:
      errors:
        metric: prometheus_operator_reconcile_errors_total
      total:
        metric: prometheus_operator_reconcile_operations_total
  alerting:
    disabled: false
    name: ErrorBudgetBurn
    severities:
      absent: high
      level1: critical
      level2: error
      level3: warning
      level4: info

I think putting them into a map of some sorts makes it more ergonomic to configure? Then again, I'm not sure about the level* key... There the array might be better.

@lindeskar
Copy link

This would be a great addition! How can we move this forward?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants