Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add type field to AlertingRule and RecordingRule struct #1558

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions api/prometheus/v1/api.go
Original file line number Diff line number Diff line change
Expand Up @@ -601,6 +601,7 @@ type AlertingRule struct {
EvaluationTime float64 `json:"evaluationTime"`
LastEvaluation time.Time `json:"lastEvaluation"`
State string `json:"state"`
Type string `json:"type"`
}

// RecordingRule models a recording rule.
Expand All @@ -612,6 +613,7 @@ type RecordingRule struct {
LastError string `json:"lastError,omitempty"`
EvaluationTime float64 `json:"evaluationTime"`
LastEvaluation time.Time `json:"lastEvaluation"`
Type string `json:"type"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we look on Rules type commentary you see the canonical way to identify if it's recording or alerting is through a type system. This means we don't need this type field here.

From the error you are getting, it sounds like you have problem with something marshalling this type back to JSON and then something unmarshalling (decoding) and expecting type field? In this case:

  1. For sure we should have regression test and agree it's a valid use case. Can you confirm that is the case? Simple call of API that uses unmarshalling should just work without Type in this code -- unless I am wrong, but reproduction test will verify (:
  2. To solve it, we could create MarshalJSON function that adds that field just for marshalling, WDYT? Alternatively we could keep this PR, but perhaps change the commentary on Rules slice? Otherwise we will have multiple ways of doing same thing.

To sum up, regression test that breaks on old code would be amazing, in minimum, WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in our use case we simply call the Rules endpoint using the prometheus API here https://github.com/prometheus/client_golang/blob/main/api/prometheus/v1/api.go#L1202. In doing so it calls the ruler endpoint and returns us a RulesResult struct which we just use as a data field in response json. But it seems like when internally in api.go we are trying to create unmarshall RuleResult struct and while doing so unmarshall alertingRule and recordingRule it expects the struct to have the type field to unmarshal the json successfully which it gets from the ruler endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it calls the ruler endpoint and returns us a RulesResult struct which we just use as a data field in response json. But it seems like when internally in api.go we are trying to create unmarshall RuleResult struct and while doing so unmarshall alertingRule and recordingRule it expects the struct to have the type field to unmarshal the json successfully which it gets from the ruler endpoint.

Cool. So literally Prometheus server is targeted, right? Then Prometheus server should give us type field and our code should work without this PR. In fact alertingRule.Type = string(RuleTypeAlerting) line does nothing given you added type field, but even without type field unmarshal should work because of anonymous struct with type we unmarshal first.

BUT, if you notice a problem there might be one, so the best is to create a very small unit test that e.g. mock Prometheus API returning recording and alerting with type specified. Then without your change it should fail, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have enough tests so it would be amazing anyway 🤗 - if not done I can try do it, but it will take time (busy).

}

// Alert models an active alert.
Expand Down Expand Up @@ -721,11 +723,13 @@ func (rg *RuleGroup) UnmarshalJSON(b []byte) error {

for _, rule := range v.Rules {
alertingRule := AlertingRule{}
alertingRule.Type = string(RuleTypeAlerting)
if err := json.Unmarshal(rule, &alertingRule); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So first of all this code was wrong, we don't even capture error anywhere no? How you got that type field not present in rule if it's not propagated here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are not justing the UnmarshalJSON function at all now that I see, we should be using UnMarshalJSON instead of json.UnMarshal what do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing the commentary on the Rules might be a better way since then we have a clear way of checking how the type field exists which is not clear right now. But you know the code better than me so happy to comply with what you think :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, let's add unit test for your "broken" case and we will go from there (:

rg.Rules = append(rg.Rules, alertingRule)
continue
}
recordingRule := RecordingRule{}
recordingRule.Type = string(RuleTypeRecording)
if err := json.Unmarshal(rule, &recordingRule); err == nil {
rg.Rules = append(rg.Rules, recordingRule)
continue
Expand Down
30 changes: 17 additions & 13 deletions api/prometheus/v1/api_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -717,20 +717,22 @@ func TestAPIs(t *testing.T) {
"annotations": map[string]interface{}{
"summary": "High request latency",
},
"duration": 600,
"health": "ok",
"labels": map[string]interface{}{
"severity": "page",
},
"name": "HighRequestLatency",
"query": "job:request_latency_seconds:mean5m{job=\"myjob\"} > 0.5",
"type": "alerting",
"duration": 600,
"health": "ok",
"name": "HighRequestLatency",
"query": "job:request_latency_seconds:mean5m{job=\"myjob\"} > 0.5",
"lastError": "",
"type": "alerting",
},
{
"health": "ok",
"name": "job:http_inprogress_requests:sum",
"query": "sum(http_inprogress_requests) by (job)",
"type": "recording",
"health": "ok",
"name": "job:http_inprogress_requests:sum",
"query": "sum(http_inprogress_requests) by (job)",
"lastError": "",
"type": "recording",
},
},
},
Expand Down Expand Up @@ -769,12 +771,14 @@ func TestAPIs(t *testing.T) {
Name: "HighRequestLatency",
Query: "job:request_latency_seconds:mean5m{job=\"myjob\"} > 0.5",
LastError: "",
Type: string(RuleTypeAlerting),
},
RecordingRule{
Health: RuleHealthGood,
Name: "job:http_inprogress_requests:sum",
Query: "sum(http_inprogress_requests) by (job)",
LastError: "",
Type: string(RuleTypeRecording),
},
},
},
Expand Down Expand Up @@ -861,14 +865,14 @@ func TestAPIs(t *testing.T) {
Annotations: model.LabelSet{
"summary": "High request latency",
},
Duration: 600,
Health: RuleHealthGood,
Labels: model.LabelSet{
"severity": "page",
},
Duration: 600,
Health: RuleHealthGood,
Name: "HighRequestLatency",
Query: "job:request_latency_seconds:mean5m{job=\"myjob\"} > 0.5",
LastError: "",
Type: string(RuleTypeAlerting),
EvaluationTime: 0.5,
LastEvaluation: time.Date(2020, 5, 18, 15, 52, 53, 450311300, time.UTC),
State: "firing",
Expand All @@ -877,7 +881,7 @@ func TestAPIs(t *testing.T) {
Health: RuleHealthGood,
Name: "job:http_inprogress_requests:sum",
Query: "sum(http_inprogress_requests) by (job)",
LastError: "",
Type: string(RuleTypeRecording),
EvaluationTime: 0.3,
LastEvaluation: time.Date(2020, 5, 18, 15, 52, 53, 450311300, time.UTC),
},
Expand Down
Loading