
Mimir testing #3578

Closed
4 tasks
Tracked by #3565
Rotfuks opened this issue Jul 15, 2024 · 12 comments
Labels
team/atlas Team Atlas

Comments

@Rotfuks
Contributor

Rotfuks commented Jul 15, 2024

Motivation

In order to raise our confidence in the stability of our observability platform, and to be sure that our ongoing work and releases won't negatively impact its operations, we need extensive tests that give us early feedback loops. As Mimir is one of our core components, we should make sure it's thoroughly tested.

Todo

  • Investigate a few good initial test cases that give us feedback in CI on the stability of a Mimir release. These can be:

    • Validate Helm chart templating
    • Deploy chart on Kind (single instance)
  • Implement those CI test cases

  • Investigate a few good initial test cases that give us feedback on the stability of a release before it is released. These can be:

    • Deploy chart on AWS, Azure, CAPI (later)
    • Integration test: verify that all components work together
    • Use a canary to generate traffic
  • Implement those test cases

Make sure to stick to a minimal but valuable set of test cases, nothing too detailed or fancy.
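A minimal sketch of the two CI test cases (templating check and single-instance Kind deploy), assuming GitHub Actions and a hypothetical chart path `helm/mimir` that wraps the upstream chart. The workflow name, job name, chart path and timeout are illustrative assumptions, not our actual CI setup:

```yaml
# Hypothetical CI workflow sketch: chart path (helm/mimir), names and timeout are assumptions.
name: mimir-chart-ci
on:
  pull_request:

jobs:
  template-and-kind-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/setup-helm@v4
      # Fetch subchart dependencies (e.g. the upstream Mimir chart) declared in Chart.yaml.
      - name: Update chart dependencies
        run: helm dependency update helm/mimir
      # Test case 1: the chart must render without errors using its default values.
      - name: Validate Helm chart templating
        run: helm template mimir helm/mimir --namespace mimir > /dev/null
      # Test case 2: single-instance deployment on a throwaway Kind cluster.
      - uses: helm/kind-action@v1
      - name: Deploy chart on Kind
        run: |
          helm install mimir helm/mimir \
            --namespace mimir --create-namespace \
            --wait --timeout 15m
```

The cheap templating check runs before the Kind cluster is created, so a broken template fails fast. The pre-release cases (AWS, Azure, CAPI deployments and canary traffic) would likely belong in a separate pipeline.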

Outcome

  • We have early, automatic feedback about the impact of a release of Mimir before releasing.
github-project-automation bot moved this to Inbox 📥 in Roadmap Jul 15, 2024
Rotfuks added the team/atlas Team Atlas label Jul 15, 2024
QuantumEnigmaa self-assigned this Aug 27, 2024
@QuentinBisson

I think it would be nice to run Mimir continuous testing (similar to Loki canary) for e2e tests.

@QuantumEnigmaa

Yeah that's a nice idea :)

@QuentinBisson

I am running Mimir continuous_test on grizzly with the following config:

mimir:
  continuous_test:
    enabled: true
    auth:
      tenant: anonymous

This renders the following metrics:

[Image: screenshot of the rendered continuous-test metrics]

We could simply reuse these upstream alerts: https://github.com/grafana/mimir/blob/f52911d917c8c52e0da6a59348a64dd7f7622072/operations/mimir-mixin-compiled/alerts.yaml#L1097

The only downside is that we need to wait for the next minor Helm chart release, or use a weekly version, because grafana/mimir#8654 is not yet released.
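As a sketch of what such an alert could look like, here is a Prometheus rule modeled on the upstream `MimirContinuousTestFailed` alert, assuming the continuous test exports the `mimir_continuous_test_*_failed_total` counters with a `test` label. The group name, `for` duration and `severity` label are placeholders, so the expressions should be checked against the linked alerts.yaml before use:

```yaml
groups:
  - name: mimir-continuous-test  # hypothetical group name
    rules:
      - alert: MimirContinuousTestFailed
        # Fires when any write, query, or query-result check of the continuous test keeps failing.
        expr: |
          sum by (namespace, test) (rate(mimir_continuous_test_writes_failed_total[5m])) > 0
          or
          sum by (namespace, test) (rate(mimir_continuous_test_queries_failed_total[5m])) > 0
          or
          sum by (namespace, test) (rate(mimir_continuous_test_query_result_checks_failed_total[10m])) > 0
        for: 1h  # placeholder; long enough to ignore transient failures
        labels:
          severity: page  # hypothetical label
        annotations:
          summary: "Mimir continuous test {{ $labels.test }} in namespace {{ $labels.namespace }} is failing."
```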

@QuantumEnigmaa

IMO the best plan of action is to wait for continuous testing to be part of our default Mimir config before doing anything else.
In the meantime, I'll create a dashboard using the metrics from the mixin's rules and, if it's good enough, I'll consider pushing it upstream as a mixin dashboard.

@QuentinBisson

I think so, yes. Maybe we can have a PR ready with the alerts? The mixin contains some that could be useful.

@QuentinBisson

@QuantumEnigmaa we decided in the retro to use the rc0 chart version for now but keep the old Mimir 2.13 image.

@QuantumEnigmaa

All good with me 👍

@QuentinBisson

We can start this again once we're done with multi-tenancy :)

@QuentinBisson

I've taken over the dashboard PR giantswarm/dashboards#624 to close the epic.

QuentinBisson self-assigned this Nov 6, 2024
@QuentinBisson

Blocked, waiting for reviews.

@QuentinBisson

QuentinBisson commented Nov 14, 2024

Continuous test is enabled on all MCs

Added chart testing:

Dashboard PR has been merged:

Alert based on failures under review:

Test procedure tbd:

@QuentinBisson

All is done. Thanks @hervenicol for the reviews

github-project-automation bot moved this from Inbox 📥 to Done ✅ in Roadmap Nov 15, 2024
Projects
Status: Done ✅
Development

No branches or pull requests

3 participants