Include some optional prometheus alert rules in the chart #221

Open
sunng87 opened this issue Jan 17, 2025 · 3 comments

Comments

@sunng87
Member

sunng87 commented Jan 17, 2025

It would be nice to have some prebuilt alert rules in this chart so that users get GreptimeDB-specific alerts by default.

@Stephan3555
Contributor

Stephan3555 commented Jan 24, 2025

@sunng87 I did some research on this topic. Most critical issues, such as high CPU and memory usage, missing nodes, and crashing pods, are already covered by the Prometheus alert rules that come with the kube-prometheus-stack:

https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/templates/prometheus/rules-1.14

The only GreptimeDB-specific alerts that make sense are therefore use-case specific. In my case, I would like alerts when no new rows are ingested or when no Prometheus queries (via the Prometheus-compatible API) are made against the cluster. But I don't rely on, for example, the MySQL endpoint or the native PromQL HTTP endpoint, so alerts for those would make no sense for me.

Shipping those alerts anyway would cause irrelevant alerts to fire in my monitoring solution.

I think the better approach here would be either to make each individual alert rule toggleable through values.yaml, or to provide an easy way to add custom alert rules.

Here is an example of such alert rules:

```yaml
- alert: NoRowsIngestions
  expr: sum(rate(greptime_table_operator_ingest_rows[5m])) == 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: 'No Rows were Ingested into Greptime'
    description: 'No Rows were Ingested into Greptime for the last 5 minutes'

- alert: NoPrometheusQueries
  expr: sum (rate(greptime_servers_http_prometheus_promql_elapsed_count{pod=~".*frontend.*"}[1h])) == 0
  for: 2m
  labels:
    severity: notice
  annotations:
    summary: 'No Prometheus Queries were made to Greptime'
    description: 'No Prometheus Queries were made to Greptime for the last 1 hour'
```
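
A minimal sketch of the per-rule toggle idea, assuming the Prometheus Operator (which kube-prometheus-stack installs) and its `PrometheusRule` resource. The `prometheusRule` values keys, the template file name, and the resource name are hypothetical and not part of the current chart; the two rules reuse the expressions from the example above, with annotations omitted for brevity.

```yaml
# values.yaml (hypothetical keys, not part of the current chart):
#   prometheusRule:
#     enabled: true
#     labels: {}                 # e.g. the label your Prometheus ruleSelector matches
#     noRowsIngestions: true     # per-rule toggles
#     noPrometheusQueries: false
#     additionalRules: []        # user-defined rules, appended verbatim
#
# templates/prometheusrule.yaml (hypothetical file name):
{{- if .Values.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ .Release.Name }}-greptimedb
  labels:
    {{- toYaml .Values.prometheusRule.labels | nindent 4 }}
spec:
  groups:
    - name: greptimedb
      rules:
        # Each built-in rule is rendered only when its toggle is true.
        {{- if .Values.prometheusRule.noRowsIngestions }}
        - alert: NoRowsIngestions
          expr: sum(rate(greptime_table_operator_ingest_rows[5m])) == 0
          for: 2m
          labels:
            severity: warning
        {{- end }}
        {{- if .Values.prometheusRule.noPrometheusQueries }}
        - alert: NoPrometheusQueries
          expr: sum(rate(greptime_servers_http_prometheus_promql_elapsed_count{pod=~".*frontend.*"}[1h])) == 0
          for: 2m
          labels:
            severity: notice
        {{- end }}
        # Custom rules from values.yaml are appended as-is.
        {{- with .Values.prometheusRule.additionalRules }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
{{- end }}
```

The `additionalRules` list covers the second option as well: users who need different alerts can add them without forking the chart. With kube-prometheus-stack's defaults, Prometheus only picks up rules carrying a `release: <release-name>` label, which is what the `labels` key is for.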

@sunng87
Member Author

sunng87 commented Jan 24, 2025

@Stephan3555 I agree that most system-level rules can be covered by Kubernetes-level alerts. For GreptimeDB, traffic patterns vary with the use case. Since we want to offer users a convenient way to add alerting, I see several levels:

| Level | Description |
|-------|-------------|
| 0 | Do nothing. Users configure alert rules from scratch. |
| 1 | Define something like a ConfigMap for alert rules, and configure Prometheus to apply them. |
| 2 | Provide prebuilt ConfigMap templates (e.g. `common`, `mysql`, `promql`) that users can pick, edit, and apply through the Level 1 mechanism. |
| ... | ... |
| N | Everything is built in: alert rules are adaptive and agnostic to traffic patterns, with zero setup required. |

I think we can start with Level 2, although it still requires significant setup effort. Let's see whether there are levels between 2 and N.
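
A Level 1 setup could be as small as a ConfigMap that carries a standard Prometheus rule file. The sketch below assumes a plain Prometheus deployment that mounts the ConfigMap at `/etc/prometheus/rules` and loads it through `rule_files`; the mount path, ConfigMap name, and file name are all hypothetical.

```yaml
# Hypothetical Level 1 ConfigMap: a plain Prometheus rule file stored as data.
apiVersion: v1
kind: ConfigMap
metadata:
  name: greptimedb-alert-rules   # hypothetical name
data:
  greptimedb-alerts.yaml: |
    groups:
      - name: greptimedb
        rules:
          # Fires when the summed ingestion rate stays at zero for 2 minutes.
          - alert: NoRowsIngestions
            expr: sum(rate(greptime_table_operator_ingest_rows[5m])) == 0
            for: 2m
            labels:
              severity: warning
# Assuming the ConfigMap is mounted at /etc/prometheus/rules, prometheus.yml
# would load every rule file there with a glob such as:
#   rule_files:
#     - /etc/prometheus/rules/*.yaml
```

With the Prometheus Operator (as in kube-prometheus-stack), the equivalent object would be a `PrometheusRule` resource rather than a ConfigMap.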

@Stephan3555
Contributor

@sunng87 I can provide the Helm chart component that dynamically generates the necessary ConfigMaps. It would be nice if the rules themselves could come from the Greptime team.
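
One way such a component could handle Level 2 is to ship each preset as a rule file inside the chart and render one ConfigMap per preset the user selects. In this sketch, the `alertRules` values keys, the `alerts/` directory, and the preset file names are hypothetical; `.Files.Get`, `range`, `printf`, and `nindent` are standard Helm/Sprig template functions.

```yaml
# values.yaml (hypothetical keys):
#   alertRules:
#     enabled: true
#     presets: [common, promql]   # each maps to alerts/<preset>.yaml in the chart
#
# templates/alert-rules.yaml (hypothetical): one ConfigMap per selected preset.
{{- if .Values.alertRules.enabled }}
{{- range .Values.alertRules.presets }}
---
apiVersion: v1
kind: ConfigMap
metadata:
  # Inside range, "." is the preset name and "$" is the chart root context.
  name: {{ $.Release.Name }}-alerts-{{ . }}
data:
  {{ . }}-alerts.yaml: |
    {{- $.Files.Get (printf "alerts/%s.yaml" .) | nindent 4 }}
{{- end }}
{{- end }}
```

`.Files.Get` returns an empty string for a missing file, so a typo in `presets` would silently produce an empty rule file; wrapping the call in Helm's `required` function would make the render fail instead.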
