Include some optional prometheus alert rules in the chart #221

sunng87 · 2025-01-17T06:55:44Z

It would be nice to have some pre-built alert rules in this chart so that users get GreptimeDB-specific alerts by default.

Stephan3555 · 2025-01-24T02:29:55Z

@sunng87 I did some research on this topic. Most of the critical issues, such as high CPU and memory usage, missing nodes, and crashing pods, are already covered by the Prometheus alert rules that come with kube-prometheus-stack:

https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/templates/prometheus/rules-1.14

The only GreptimeDB-specific alerts that make sense are therefore use-case specific. In my case, I would like alerts when no new rows are ingested or when no Prometheus queries (via the Prometheus-compatible API) are made against the cluster. But I don't rely on the MySQL endpoint or the native PromQL HTTP endpoint, for example, so alerts for them would make no sense for me.

Shipping them anyway would lead to alerts firing in my monitoring solution.

I guess the better approach would be either to make each individual alert rule toggleable through values.yaml or to provide an easy way to add custom alert rules.
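As a rough sketch of how both options could be combined, values.yaml might look something like the following. All key names here (`alertRules`, `rules`, `additionalRules`) are hypothetical and do not exist in the chart today; the two rule bodies are the ones proposed in this comment.

```yaml
# Hypothetical values.yaml layout; none of these keys exist in the chart yet.
alertRules:
  # Master switch for all GreptimeDB alert rules.
  enabled: true
  # Built-in rules, keyed by alert name, each with its own toggle.
  rules:
    NoRowsIngestions:
      enabled: true
      expr: sum(rate(greptime_table_operator_ingest_rows[5m])) == 0
      for: 2m
      labels:
        severity: warning
    NoPrometheusQueries:
      # Switched off, e.g. when the Prometheus-compatible API is not used.
      enabled: false
      expr: sum(rate(greptime_servers_http_prometheus_promql_elapsed_count{pod=~".*frontend.*"}[1h])) == 0
      for: 2m
      labels:
        severity: notice
  # Custom rules in standard Prometheus alerting-rule syntax, appended as-is.
  additionalRules: []
```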

Here is an example of such alert rules:

- alert: NoRowsIngestions
  expr: sum(rate(greptime_table_operator_ingest_rows[5m])) == 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: 'No Rows were Ingested into Greptime'
    description: 'No Rows were Ingested into Greptime for the last 5 minutes'

- alert: NoPrometheusQueries
  expr: sum (rate(greptime_servers_http_prometheus_promql_elapsed_count{pod=~".*frontend.*"}[1h])) == 0
  for: 2m
  labels:
    severity: notice
  annotations:
    summary: 'No Prometheus Queries were made to Greptime'
    description: 'No Prometheus Queries were made to Greptime for the last 1 hour'

sunng87 · 2025-01-24T06:48:47Z

@Stephan3555 I agree that most system-level rules can be covered by Kubernetes-level alerts. For GreptimeDB, the traffic pattern varies between use cases. Since we want to offer a convenient way for users to add alerting, there are several levels to consider:

| Level | Description |
| --- | --- |
| 0 | Do nothing. The user has to configure alert rules from scratch. |
| 1 | Define something like a `ConfigMap` for setting alert rules, and configure Prometheus to apply the rules. |
| 2 | Provide prebuilt `ConfigMap` templates, say `common`, `mysql`, `promql`, etc., so that the user can pick some of them, edit them, and apply them through the Level 1 mechanism. |
| ... | ... |
| N | Everything is built in; the alert rules are all adaptive and agnostic to traffic patterns, with zero setup required. |

I think we can start with Level 2, but it still requires significant effort to set up. Let's see if there are levels between 2 and N.
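To make Levels 1 and 2 a bit more concrete, a prebuilt `promql` template might be a `ConfigMap` along these lines. This is only a sketch: the ConfigMap name, the data key, and the group name are made up for illustration, and the rule is the one suggested earlier in this thread.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Hypothetical name; one ConfigMap per template (common, mysql, promql, ...).
  name: greptimedb-alert-rules-promql
data:
  # A standard Prometheus rule file embedded as a multi-line string.
  promql.rules.yaml: |
    groups:
      - name: greptimedb.promql
        rules:
          - alert: NoPrometheusQueries
            expr: sum(rate(greptime_servers_http_prometheus_promql_elapsed_count{pod=~".*frontend.*"}[1h])) == 0
            for: 2m
            labels:
              severity: notice
            annotations:
              summary: 'No Prometheus Queries were made to Greptime'
```

For the Level 1 mechanism, Prometheus would still need this ConfigMap mounted into its pod and the mounted file listed under `rule_files` in its configuration. In setups that run the Prometheus Operator, a `PrometheusRule` resource carrying the same `groups` would be the usual alternative.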

Stephan3555 · 2025-01-24T09:29:10Z

@sunng87 I can provide the Helm chart component that dynamically generates the necessary ConfigMaps. It would be nice if the rules themselves could come from the Greptime team.
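A minimal sketch of what such a template could look like, assuming the rules are defined under a hypothetical `alertRules` key in values.yaml: `alertRules.rules` is a map from alert name to rule body plus an `enabled` flag, and `alertRules.additionalRules` is a list of user-supplied rules. Neither these keys nor the file path exist in the chart today.

```yaml
{{- /* templates/alert-rules-configmap.yaml: hypothetical file, not part of the chart */ -}}
{{- if .Values.alertRules.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-alert-rules
data:
  greptimedb.rules.yaml: |
    groups:
      - name: greptimedb
        rules:
        {{- /* Emit each built-in rule whose toggle is on, dropping the toggle itself. */}}
        {{- range $name, $rule := .Values.alertRules.rules }}
        {{- if $rule.enabled }}
          - alert: {{ $name }}
            {{- toYaml (omit $rule "enabled") | nindent 12 }}
        {{- end }}
        {{- end }}
        {{- /* Append user-supplied rules unchanged. */}}
        {{- with .Values.alertRules.additionalRules }}
          {{- toYaml . | nindent 10 }}
        {{- end }}
{{- end }}
```

One reason to keep the rule bodies in values rather than in the template itself: values are emitted through `toYaml` and not re-rendered, so Prometheus placeholders such as `{{ $labels.pod }}` inside annotations should pass through without Helm trying to interpret them.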
