type | title | status | owner | menu |
---|---|---|---|---|
proposal |
Global / Federated Rules API |
accepted |
s-urbaniak |
proposals-done |
Thanos allows a global query view for the Prometheus series. This means discovering, connecting various (often remote) "leafs" components and aggregating series data from them.
Since we work with metrics, evaluating recording rules and alerts is a very important part of our system. Some of our "StoreAPIs" like Prometheus and Thanos Ruler are designed for that. However, we currently don’t have a way to present those resources in a federated way e.g Alerts and Recording Rules. This document explores the potential way of solving this in the Prometheus Ecosystem.
Deployment scenarios with various levels of federated setups are not unusual. Some deployments leverage Thanos Ruler to query multiple data sources (i.e. via Thanos Querier) for rule and alert evaluation. This topology sacrifices availability. At the same time in the same deployment topology, a dedicated Prometheus instance is deployed for certain critical alerting rules implying high availability. Here, we are missing a federation middleware exposing a consolidated view of evaluated rules and alerts in the underlying Prometheus and Thanos Ruler instances.
Pitfalls of current solutions:
- It’s tedious to manually visit each leaf and we are lazy.
- Visiting leaf instances (be it Thanos Ruler or dedicated Prometheus endpoints) have to be implemented manually.
- Present other Prometheus/Thanos Resources in a global view.
- This also means an up-to-date view (e.g statuses)
- Simple to run if you already run Thanos.
- Use consistent protocols (if we suddenly switch to pure HTTP from gRPC it will break user’s proxies for auth, routing, monitoring, rate limiting, etc)
- Potentially use the same connections, discovered targets by Querier.
- Unit tests which would fire up a dummy Rules API and check different scenarios.
- Ad-hoc testing.
We propose to add following proto service and propagate all using existing components that have rules, so:
- Sidecar
- Querier (for federation)
- Ruler
The newly introduced Rules
service is designed to reflect Prometheus' Rules API allowing to retrieve recording and alerting rules.
service Rules {
/// Rules has info for all rules.
rpc Rules(RulesRequest) returns (stream RulesResponse);
}
Sidecar proxies the Rules request to its local Prometheus instance and synthesizes the response. Similarly, Thanos Ruler constructs the response based on its local state.
Thanos Querier fans-out to all know Rules endpoints configured via --rule
command line flag, merges and deduplicates the result. This new setting is meant to configure rules endpoints as a strict subset of store endpoints as specified with --store
and --store.sd-files
. If a user specifies a --rule
endpoint not matching --store
/--store.sd-files
endpoints the initial implementation would log out that fact. In the future (see below) it is planned to have this a more separate setting.
Generally the deduplication logic is less complex than with time series, specifically:
- Deduplication happens first at the rule group level. The identifier is the group name and the group file.
- Then, per group name deduplication happens on the rule level, where:
- the rule type (recording rule vs. alerting rule)
- the rule name
- the rule label names
- the rule expression
expr
field - the alerting rule
for
field
are being used as the deduplication identifier. Disjunct entries are simply merged by adding them to the result set.
Thanos Querier presents the result of the fan-out on a /api/v1/rules
endpoint which is compatible with the Prometheus' rules API endpoint. Additionally Thanos Querier gains a new --rule.replica-label
command line argument which falls back to --query.replica-label
if unset. The replica labels refer to the same labels as specified in the external_labels
section of Prometheus and the --label
command line argument of Thanos Ruler.
A stream of alerting rules is merged from different Thanos Ruler API instances. These may be remote Thanos Ruler, Prometheus Sidecar, or even Thanos Querier instances. The merging Thanos Querier has --rule.replica-label=replica
Scenario 1:
As specified, the rule type and then rule name is used for deduplication.
Given the following stream of incoming rule groups and containing recording/alerting rules:
group: a
recording:<name:"r1" last_evaluation:<seconds:1 > >
alert: <name:"a1" last_evaluation:<seconds:1 > >
group: b
recording:<name:"r1" last_evaluation:<seconds:1 > >
group: a
recording:<name:"r2" last_evaluation:<seconds:1 > >
The output becomes:
group: a
alert: <name:"a1" last_evaluation:<seconds:1 > >
recording:<name:"r1" last_evaluation:<seconds:1 > >
recording:<name:"r2" last_evaluation:<seconds:1 > >
group: b
recording:<name:"r1" last_evaluation:<seconds:1 > >
Note in the example above how the recording rule r1
is not deduplicated as it is contained in two different groups.
Scenario 2:
The next level of deduplication is governed by the label/value set of the underlying recording/alerting rule while respecting the replica label. For a given conflict, the youngest is preferred. For alerting rules, the youngest firing rule is preferred.
Given the following stream of incoming recording rules:
group: a
recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
group: a
recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
The output becomes:
group: a
recording:<name:"r1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
Given the following stream of incoming alerting rules:
group: a
alert:<state:FIRING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
group: a
alert:<state:PENDING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-2" > > last_evaluation:<2006-01-02T10:01:00> >
The output becomes:
group: a
alert:<state:FIRING name:"a1" labels:<labels:<name:"replica" value:"thanos-ruler-1" > > last_evaluation:<2006-01-02T10:00:00> >
Note how in the above output the firing alerting rule was preferred despite being older.
Scenario 3:
If, under the above conditions a rule is a candidate for deduplication, finally the rule expr
and for
fields are being considered for deduplication.
Given the following stream of incoming alerting rules will also result in two independent alerting rules as both the expr
and for
fields differ:
- alert: KubeAPIErrorBudgetBurn
annotations:
message: The API server is burning too much error budget
expr: |
sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
and
sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
for: 2m
labels:
severity: critical
- alert: KubeAPIErrorBudgetBurn
annotations:
message: The API server is burning too much error budget
expr: |
sum(apiserver_request:burnrate6h) > (6.00 * 0.01000)
and
sum(apiserver_request:burnrate30m) > (6.00 * 0.01000)
for: 15m
labels:
severity: critical
Scenario 4:
As specified, the group name and file fields are used for deduplication.
Given the following stream of incoming rule groups:
group: a/file1
group: b/file1
group: a/file2
The output becomes:
group: a/file1
group: a/file2
group: b/file1
Deduplication of included alerting/recording rules inside groups is described in the previous scenarios.
- Cortex contains a sharded Ruler. Assigning rules to shards is done via Consul, though a gossip implementation is under development. Shards do not communicate with other shards. Rules come from a store (e.g. a Postgres database).
- Implement a new flag
--rule
in Thanos Querier which registers RulesAPI endpoints. - Implement a new flag
--rule.replica-label
in Thanos Querier. - Implement RulesAPI backends in sidecar, query, rule.
- Feature branch: thanos-io#2200
These changes are suggestions which we will need to be discussed in future and are not part of the proposal implementation.
Currently, Info
is shared between the RulesAPI
proposed here and the existing StoreAPI
services. To accommodate for future additional APIs the following changes of the protobuf Info
and Type
structures are suggested.
The current StoreType
enum is renamed to Type
. This retains binary compatibility with older clients:
-enum StoreType {
+enum Type {
UNKNOWN = 0;
QUERY = 1;
RULE = 2;
The current fields in InfoResponse
connected to Store APIs are deprecated and dedicated new API sub types are proposed:
message InfoResponse {
// Deprecated. Use label_sets instead.
repeated Label labels = 1 [(gogoproto.nullable) = false];
+ // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
int64 min_time = 2;
+ // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
int64 max_time = 3;
- StoreType storeType = 4;
+ Type type = 4;
// label_sets is an unsorted list of `LabelSet`s.
+ // Deprecated. Will be removed in favor of StoreInfoResponse in the future.
repeated LabelSet label_sets = 5 [(gogoproto.nullable) = false];
+
+ StoreInfoResponse store = 6;
+ RulesInfoResponse rules = 7;
}
To ease the current implementation, rules endpoints are a strict subset of store endpoints. In the future these settings should be separate, i.e. the user could specify different endpoints for rules and different endpoints for store.