Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature(ci): validate alert rules #22

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/_quality-checks.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,9 @@ jobs:
linting:
name: Linting
uses: canonical/observability/.github/workflows/_linting.yaml@main
validate-rules:
name: Validate rules
uses: canonical/observability/.github/workflows/_validate_rules.yaml@main
unit-test:
name: Unit Tests
uses: canonical/observability/.github/workflows/_unit-tests.yaml@main
Expand Down
22 changes: 22 additions & 0 deletions .github/workflows/_validate_rules.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
name: Validate rules

on:
workflow_call: {}

jobs:
rules:
name: Lint alert rules
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 1
- name: Install dependencies
run: |
wget https://github.com/canonical/cos-tool/releases/download/rel-20220621/cos-tool-amd64
chmod +x ./cos-tool-amd64
- name: Validate prometheus alert rules
run: find . -path '*/prometheus_alert_rules/*' -type f -exec ./cos-tool-amd64 --format promql lint {} +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will pass all of the files in at once, which may yield... sort of unreadable results.

Maybe:

find . -path '*/prometheus_alert_rules/*' -type f -print0 | while read -d $'\0' file; do echo "Validating $file"; ./cos-tool-amd64 --format promql lint $file; done

Copy link
Contributor Author

@sed-i sed-i Jan 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to \; in next commit.

This produces the following sample output:

$ find . -path '*/prometheus_alert_rules/*' -type f -exec ./cos-tool-amd64 --format promql lint {} + || echo failed 
error validating: yaml: unmarshal errors:
  line 1: field alert not found in type rulefmt.RuleGroups
  line 2: field expr not found in type rulefmt.RuleGroups
  line 3: field for not found in type rulefmt.RuleGroups
  line 4: field labels not found in type rulefmt.RuleGroups
  line 6: field annotations not found in type rulefmt.RuleGroups
failed

I think we should update cos-tool to also print the filename being inspected

Wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this comment.

This is also due to the single-rule format. cos-tool could print the filename being inspected, but... it's also not gonna be that useful. Because the single-rule format files need to be coerced into a parsable format, and in addition, the Python representation of Loki's rules doesn't quite actually add a name to the groups key, we do it in the CosTool Python class before ever passing it over to cos-tool itself at all.

It could be done in Go, but the honest truth is that inspecting and transforming arbitrary YAML/JSON is not a fun experience. The modules/libraries are good enough about giving options for "map this JSON/YAML onto this struct, and it's ok to ignore some of these fields and leave them uninitialized in the struct maybe, or throw and error if you can't exactly match it", but cos-tool trying to guess what a single rule format alert should "become" versus a "normal" alert group file is pretty awful and Python is just flat-out a better tool for it. To do it in Go ends up being really terrible.

This action would be better off directly invoking the CosTool class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single-rule format files need to be coerced into a parsable format

Still, this seems like a cos-tool internal, after which it could report the original filename

Are you suggesting the CosTool class because %%juju_topology%% could appear in our loki rules?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "original filename" passed to cos-tool, generally, has no significance whatsoever. A temporary file is created from relation data and that's used. loki_push_api actually mangles even that data so it will parse appropriately.

Taking the single rule per file format, which is already handled in Python, and having cos-tool try to figure out what it is and what it's supposed to be, mangle it into that format (which will not be representative of what is actually in the file anyway), and report back a filename for a use case which is CI only while this is covered appropriately at runtime through a combination of tools is not a good use of engineering time.

%%juju_topology%% would be a problem also, and even that is not covered in the CosTool class, but in the AlertRules class before sending it to CosTool at all to keep separation of concerns.

To be clear, this is not an intuitive process in Go most of the time -- unmarshaling YAML/JSON does not give you a map[string]map[string]... with any kind of intelligence, it is a map[string]interface{} which then must be reflected or type asserted on. Imagine if, in Python, loading the rules looked like:

data: Dict[str, object]
data = yaml.safe_load(some_rule_data)

if "group" not in data.keys():
  try:
    data = ruleNode(data)
  except ValueError as e:
    # could not instantiate because the types are wrong
    return someerr
else:
   try:
     # try to type assert `Any` 
     if not isinstance(str, data["groups"].get("name", 0)):
       typing.cast(Dict[str, str], data["groups"]["name"]) = generate_some_name()
     if not isinstance(Dict[str, Any], typing.cast(List[Dict[str, Any]], data["rules"])[0]):
       return somerr
     # we are now more or less sure that there is a "rules" key with a list under it, a "names" key, and a top-level "groups" key
     validate_rules(typing.cast(ruleGroup, data)
   except ValueError as e:
     return someerr

The Go would be more verbose, but loading/unmarshaling arbitrary YAML/JSON into a walkable map without a huge amount of type assertions/casts, and/or blindly trying to map that data onto a struct and catching errors if it doesn't work is the "good" way to do this, idiomatically, in the standard library. It isn't what Go is good at, and an API which was designed for passing this to Go would have a field in the YAML which said "I am this type of object", or would be sent to an explicit endpoint, or in a different folder (and passed to a different func), or similar.

Obviously it can be done, but it's not a class of problems Go excels at, and Python does. Doing this is "good money after bad" when it's already capably handled in Python, and the use case for it is CI only, where we could just... do it in Python and let that invoke cos-tool as normal, even keeping track of the filename it passed through and outputting that is an all-around better design.

The job of cos-tool is to inject label matchers for topology and/or validate our transformed rules (after putting "single rule per file" rules into one group in relation data, substituting %%juju_topology%%, and so on) will work with the actual runtime code of the workload, not to be a catch-all for the thing we're doing to the data. Python does part that, and cos-tool ensures that either "this final ruleGroup will not cause errors in Prom/Loki" or "labels can be injected without trying to fiddle with regexps to partially re-implement query parsing logic from Go in Python". The inverse of this statement is also that re-implementing Juju relation data/application domain rules (single rule file, for example) is the job of charm code, not cos-tool.

- name: Validate loki alert rules
run: find . -path '*/loki_alert_rules/*' -type f -exec ./cos-tool-amd64 --format logql lint {} +