-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature(ci): validate alert rules #22
base: main
Are you sure you want to change the base?
Conversation
wget https://github.com/canonical/cos-tool/releases/download/rel-20220621/cos-tool-amd64 | ||
chmod +x ./cos-tool-amd64 | ||
- name: Validate prometheus alert rules | ||
run: find . -path '*/prometheus_alert_rules/*' -type f -exec ./cos-tool-amd64 --format promql lint {} + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will pass all of the files in at once, which may yield... sort of unreadable results.
Maybe:
find . -path '*/prometheus_alert_rules/*' -type f -print0 | while read -d $'\0' file; do echo "Validating $file"; ./cos-tool-amd64 --format promql lint $file; done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to \;
in next commit.
This produces the following sample output:
$ find . -path '*/prometheus_alert_rules/*' -type f -exec ./cos-tool-amd64 --format promql lint {} + || echo failed
error validating: yaml: unmarshal errors:
line 1: field alert not found in type rulefmt.RuleGroups
line 2: field expr not found in type rulefmt.RuleGroups
line 3: field for not found in type rulefmt.RuleGroups
line 4: field labels not found in type rulefmt.RuleGroups
line 6: field annotations not found in type rulefmt.RuleGroups
failed
I think we should update cos-tool to also print the filename being inspected
Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I missed this comment.
This is also due to the single-rule format. cos-tool
could print the filename being inspected, but... it's also not gonna be that useful. Because the single-rule format files need to be coerced into a parsable format, and in addition, the Python representation of Loki's rules doesn't quite actually add a name
to the groups
key, we do it in the CosTool
Python class before ever passing it over to cos-tool
itself at all.
It could be done in Go, but the honest truth is that inspecting and transforming arbitrary YAML/JSON is not a fun experience. The modules/libraries are good enough about giving options for "map this JSON/YAML onto this struct, and it's ok to ignore some of these fields and leave them uninitialized in the struct maybe, or throw and error if you can't exactly match it", but cos-tool
trying to guess what a single rule format alert should "become" versus a "normal" alert group file is pretty awful and Python is just flat-out a better tool for it. To do it in Go ends up being really terrible.
This action would be better off directly invoking the CosTool
class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
single-rule format files need to be coerced into a parsable format
Still, this seems like a cos-tool internal, after which it could report the original filename
Are you suggesting the CosTool class because %%juju_topology%%
could appear in our loki rules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "original filename" passed to cos-tool
, generally, has no significance whatsoever. A temporary file is created from relation data and that's used. loki_push_api
actually mangles even that data so it will parse appropriately.
Taking the single rule per file format, which is already handled in Python, and having cos-tool
try to figure out what it is and what it's supposed to be, mangle it into that format (which will not be representative of what is actually in the file anyway), and report back a filename for a use case which is CI only while this is covered appropriately at runtime through a combination of tools is not a good use of engineering time.
%%juju_topology%%
would be a problem also, and even that is not covered in the CosTool
class, but in the AlertRules
class before sending it to CosTool
at all to keep separation of concerns.
To be clear, this is not an intuitive process in Go most of the time -- unmarshaling YAML/JSON does not give you a map[string]map[string]...
with any kind of intelligence, it is a map[string]interface{}
which then must be reflected or type asserted on. Imagine if, in Python, loading the rules looked like:
data: Dict[str, object]
data = yaml.safe_load(some_rule_data)
if "group" not in data.keys():
try:
data = ruleNode(data)
except ValueError as e:
# could not instantiate because the types are wrong
return someerr
else:
try:
# try to type assert `Any`
if not isinstance(str, data["groups"].get("name", 0)):
typing.cast(Dict[str, str], data["groups"]["name"]) = generate_some_name()
if not isinstance(Dict[str, Any], typing.cast(List[Dict[str, Any]], data["rules"])[0]):
return somerr
# we are now more or less sure that there is a "rules" key with a list under it, a "names" key, and a top-level "groups" key
validate_rules(typing.cast(ruleGroup, data)
except ValueError as e:
return someerr
The Go would be more verbose, but loading/unmarshaling arbitrary YAML/JSON into a walkable map without a huge amount of type assertions/casts, and/or blindly trying to map that data onto a struct and catching errors if it doesn't work is the "good" way to do this, idiomatically, in the standard library. It isn't what Go is good at, and an API which was designed for passing this to Go would have a field in the YAML which said "I am this type of object", or would be sent to an explicit endpoint, or in a different folder (and passed to a different func), or similar.
Obviously it can be done, but it's not a class of problems Go excels at, and Python does. Doing this is "good money after bad" when it's already capably handled in Python, and the use case for it is CI only, where we could just... do it in Python and let that invoke cos-tool
as normal, even keeping track of the filename it passed through and outputting that is an all-around better design.
The job of cos-tool
is to inject label matchers for topology and/or validate our transformed rules (after putting "single rule per file" rules into one group in relation data, substituting %%juju_topology%%
, and so on) will work with the actual runtime code of the workload, not to be a catch-all for the thing we're doing to the data. Python does part that, and cos-tool
ensures that either "this final ruleGroup
will not cause errors in Prom/Loki" or "labels can be injected without trying to fiddle with regexps to partially re-implement query parsing logic from Go in Python". The inverse of this statement is also that re-implementing Juju relation data/application domain rules (single rule file, for example) is the job of charm code, not cos-tool
.
1716e81
to
cc1c639
Compare
Making this a draft until we've decided on how to proceed. |
This came up again as part of canonical/grafana-agent-operator#199. A usable, partial, solution, is to install the prom snap and: promtool check rules src/prometheus_alert_rules/* |
This PR adds a workflow to automatically validate all alert rules in the repo.
Breaking change
At the moment this would be a breaking change because cos-tool (or promtool) does not support our custom "one alert rule per file" format.
We could update cos-tool to automatically wrap such files with
groups:
before attempting validation or abandon the "one alert rule per file" format altogether. Or both, incrementally.Fixes #12.