[Bug] growth in alert_rule_version table #1639
Comments
@yaskinny Could this be versioning applied by Grafana (like it does with dashboards)? In that case, it's not an Operator issue. Or does this not happen when not using the Grafana Operator?
@pb82 I haven't had time yet to dig deeper and find the root cause, but the obvious thing is that as soon as I stop the operator, my table size no longer grows. I suspect there's a field in the alerts which the operator sends to Grafana that makes Grafana think the alert has been updated and is newer, which causes a new record in the table. I'm not sure what that field is or where it should be handled (maybe it's the operator, which should set that field to dynamic data based on rule state instead of something random each time, or maybe it's Grafana not checking a field correctly). If I get time, I'll investigate more and share the results with you. Here's a sample alert that I'm using:
I'll try to reproduce the issue this week. If this is the case for all alerts, it should definitely be fixed soon.
I don't think this is an issue with the operator, but with Grafana itself. We had similar issues with this table when using sidecar provisioning. I never had the time to dive deep into the Grafana code to find the cause, but my gut says there are some logic issues when comparing alerts to their old versions, causing infinite growth. We are seeing the same table growth now that we've switched to the operator, though much, much slower. Our admittedly bad solution is an automated truncation of the table itself, roughly as sketched below.
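For reference, a minimal sketch of what such an automated truncation could look like as a Kubernetes CronJob, assuming a PostgreSQL backend; the schedule, Secret name, and connection details are assumptions, and note that TRUNCATE discards all stored rule version history:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: truncate-alert-rule-version
spec:
  schedule: "0 3 * * 0"   # weekly, purely illustrative
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: psql
              image: postgres:16
              envFrom:
                - secretRef:
                    # hypothetical Secret providing PGHOST, PGUSER, PGPASSWORD, PGDATABASE
                    name: grafana-db-credentials
              # empties the table that keeps growing; all rule version history is lost
              command: ["psql", "-c", "TRUNCATE TABLE alert_rule_version;"]

Once the upstream fix discussed below is available, this kind of workaround should no longer be necessary.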
Hello, just to say we have the exact same problem here, using argocd and grafana-operator.
As I see, the fix (grafana/grafana#89754) was merged a few days ago, so now it's a matter of waiting for a new version of Grafana to be released. After that, on the grafana-operator side, we should decide whether we only want to document how to configure the cleanup in Grafana or also add some sane defaults to make it all automatic (although that would only help with self-hosted Grafanas, as we have no way to configure the same in "managed" Grafanas, so it's up to vendors to configure it).
v11.3.0 has just been released; it contains the configuration option that enables cleanup of old rule versions.
@weisdd
@BinjaFan it was added in grafana/grafana#89754:

[unified_alerting]
# Defines the limit of how many alert rule versions
# should be stored in the database for each alert rule in an organization including the current one.
# 0 value means no limit
rule_version_record_limit = 0

I haven't tried it myself yet, but I would assume it works once you adjust the setting accordingly.
Describe the Bug
The operator appears to cause unexpected growth in the alert_rule_version table. I haven't investigated the root cause deeply, but the size of this table increases even without any updates. For example, I have set the re-evaluation interval for alerts to 10 minutes, and every 10 minutes 500 new records are added to the table. I didn't check the diff between records to add more context, but the growth in record count is obvious. Additionally, when I delete a grafanaalertrule Custom Resource (CR) from the cluster, a large number of records are removed from this table, depending on how long the rule has existed, since multiple records are added for that specific grafanaalertrule every 10 minutes. After stopping the operator, the growth in the table ceased.

I haven't updated to the latest version yet because I haven't found any mention of this issue in the release notes or in the repository's issue tracker.
Version
v5.9.1
To Reproduce
(I'm using PostgreSQL 16 for the database)