Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mimir Support #1221

Merged
merged 12 commits into from
Oct 16, 2024
Merged

Mimir Support #1221

merged 12 commits into from
Oct 16, 2024

Conversation

msvechla
Copy link
Contributor

@msvechla msvechla commented Jul 18, 2024

Adds support for optionally provisioning Rules to Grafana Mimir, see: #1192

Configuration

The following new CLI options have been added:

      --mimir-url=                      The URL to the Mimir API. If specified provisions rules via Mimir instead of Prometheus
      --mimir-basic-auth-username=""    The HTTP basic authentication username
      --mimir-basic-auth-password=""    The HTTP basic authentication password
      --mimir-write-alerting-rules      If alerting rules should be provisioned to the Mimir Ruler.

This will provision recording rules and optionally alerting rules to Grafana Mimir, when a mimir-url is specified.

Implementation

I used the mimirtool client to provision the rules. Unfortunately this required me to use a fork of the prometheus package (see comments in this PR). We might have to use an alternative here.

A finalizer has been added to handle deletion of rules, when a servicelevelobjective is deleted. The rbac permissions in the examples have been adapted accordingly.

Testing

I added new targets to the Makefile to spin up a test cluster with a minimal Mimir setup.
I'm also already using this version for testing at work and its working great so far.

Open Questions

Pyrra UI

The Pyrra UI works just fine when configuring it against the Mimir API, when the /prometheus http prefix is specified. However of course the direct links to the prometheus UI are not working, as Mimir does not come with its own UI.

In a first release we could probably just live with that, but we might want to consider to link to Grafana instead?

Please let me know your thoughts, we probably also need to update the README a bit when this is finalized.

go.mod Outdated
k8s.io/utils v0.0.0-20240310230437-4693a0247e57 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
)

replace github.com/prometheus/prometheus => github.com/grafana/mimir-prometheus v0.0.0-20240704133652-fb0cb30e280c
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the Mimir API client from mimirtool. Unfortunately Mimir uses a fork of the prometheus module, this is why this replace has been added. I could not get it to work without it.

This is probably not ideal. If there is another solution or we need to switch to our own client, please let me know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we use the rulefmt wrapper by Mimir and only two methods from the mimir client itself, I wonder if we can get away with not having these extra dependencies and simply write the two API calls with the stdlib HTTP library. Shouldn't be too cumbersome.

This would avoid bringing in this big dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I added our own simple Mimir client with basic auth support in mimir/. Let me know what you think!

@msvechla msvechla changed the title DRAFT: Mimir Support Mimir Support Aug 2, 2024
@msvechla
Copy link
Contributor Author

msvechla commented Aug 8, 2024

@metalmatze could you take a look at this and let me know if anything needs adjustment?
Thanks a lot!

@metalmatze
Copy link
Member

Hey @msvechla,
Thank you so much for sending this PR.
I think there are some things around Makefiles, etc, which I would like to change.

Shortly after you sent the PR I looked at it but dropped the ball since I'm super busy organizing PromCon after I'm done with my day job.

Hopefully I can find some more time soon. Thank you for your patience!

@msvechla
Copy link
Contributor Author

msvechla commented Aug 10, 2024

No worries, I totally understand. Enjoy PromCon and let me know when you find some time to continue here!

Copy link
Member

@metalmatze metalmatze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super cool.
I haven't tried running against the mimir cluster yet. I'm not exactly sure about the command's flags to run with. But the mimir cluster is up and running. 😎

Left some comments around the directory structure and whether we really need the mimir Go dependency.

This should be enough to make some next changes.

By the way, if you want to separate the finalizer changes, we could get them in even quicker. Depends on how much you want to rework this PR.

Thank you for your patience and for sending this PR!

Makefile Outdated
Comment on lines 119 to 124
ensure-kind-cluster: kind
@if ! kind get clusters | grep -q "^mimir$$"; then \
kind create cluster --config $(KIND_CONFIG_FILE); \
else \
echo "Cluster is already configured"; \
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is nice if people use kind there are many different ways of running Kubernetes these days. minikube, k3s, or simply via Docker on Mac. Let's just assume there's going to be a Kubernetes cluster available.
It also makes sense to avoid including it in the Makefile and instead include these scripts in a Mimir helper script.

I'm thinking similar to this: https://github.com/pyrra-dev/pyrra/tree/main/examples/docker-compose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I used kind because I could assign a fixed NodePort to easily route the traffic to the service, but I can also just add a port-forward command.

I was also torn between adding it to the Makefile vs a script, will do that!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added examples/mimir/ which includes a README with the instructions. Let me know what you think.


kitlog "github.com/go-kit/log"
"github.com/go-kit/log/level"
mimircli "github.com/grafana/mimir/pkg/mimirtool/client"
"github.com/grafana/mimir/pkg/mimirtool/rules/rwrulefmt"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this simply wraps Prometheus' rulefmt. Do we really need that dependency?

// RuleGroup is a list of sequentially evaluated recording and alerting rules.
type RuleGroup struct {
	rulefmt.RuleGroup `yaml:",inline"`
	// RWConfigs is used by the remote write forwarding ruler
	RWConfigs []RemoteWriteConfig `yaml:"remote_write,omitempty"`
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 16 to 17
This will use [kind](https://kind.sigs.k8s.io/) to spin up a local kubernetes cluster and install the [Mimir Helm Chart](https://github.com/grafana/mimir/tree/main/operations/helm/charts/mimir-distributed) in a minimal configuration.
The mimir endpoint will be available at: <http://localhost:30950>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add instructions on how to run the api and kubernetes commands with which flags?
That would be super helpful!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I outlined it in ./examples/mimir/README.md

testdata/kind-cluster.yaml Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe move to the example/mimir folder that's to be created. See my comment above.

go.mod Outdated
k8s.io/utils v0.0.0-20240310230437-4693a0247e57 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
)

replace github.com/prometheus/prometheus => github.com/grafana/mimir-prometheus v0.0.0-20240704133652-fb0cb30e280c
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we use the rulefmt wrapper by Mimir and only two methods from the mimir client itself, I wonder if we can get away with not having these extra dependencies and simply write the two API calls with the stdlib HTTP library. Shouldn't be too cumbersome.

This would avoid bringing in this big dependency.

@msvechla
Copy link
Contributor Author

msvechla commented Sep 6, 2024

@metalmatze I think I incorporated most of your feedback, let me know what you think. I can also rebase the commits into logical blocks when this MR is ready

@metalmatze
Copy link
Member

Amazing work! 👏
I followed the README and got it up and running for the most part. 👍
Mimir seems to have a problem with its ring. I can start another cluster and see again.

{"status":"error","errorType":"internal","error":"{\"status\":\"error\",\"errorType\":\"internal\",\"error\":\"expanding series: too many unhealthy instances in the ring\"}"}

Regardless, I'm happy to merge this now once we rebase and and trigger CI again.

Thank you for your work and your patience in the last weeks!

@msvechla
Copy link
Contributor Author

Awesome, thanks! I might be able to take a quick look at whats causing this when I rebase.

Thanks for your feedback and help!

This moves the mimir test setup to the examples directory, removing the
Makefile targets and describing the required steps in the examples
README.
@msvechla
Copy link
Contributor Author

@metalmatze I rebased the PR.

Regarding the mimir issues, I can not reproduce them. When I run the commands as specified in the README, the cluster becomes healthy just fine:

❯ helm upgrade -i mimir grafana/mimir-distributed --version=5.4.0 -n mimir --create-namespace -f ./examples/mimir/mimir-values.yaml --wait
Release "mimir" does not exist. Installing it now.
W1014 14:00:17.622366   53846 warnings.go:70] metadata.name: this is used in Pod names and hostnames, which can result in surprising behavior; a DNS label is recommended: [must not contain dots]
NAME: mimir
LAST DEPLOYED: Mon Oct 14 14:00:14 2024
NAMESPACE: mimir
STATUS: deployed
REVISION: 1
❯ kubectl get pods -n mimir
NAME                                      READY   STATUS      RESTARTS   AGE
mimir-compactor-0                         1/1     Running     0          3m35s
mimir-distributor-7bf8b8c4cd-qwsx9        1/1     Running     0          3m35s
mimir-gateway-6668576657-s6ltv            1/1     Running     0          3m35s
mimir-ingester-0                          1/1     Running     0          3m35s
mimir-make-minio-buckets-5.0.14-drd7x     0/1     Completed   0          3m35s
mimir-minio-5d8b7d8f79-pkxrc              1/1     Running     0          3m35s
mimir-querier-c66568668-69z9x             1/1     Running     0          3m35s
mimir-query-frontend-98464565-phhf4       1/1     Running     0          3m35s
mimir-query-scheduler-85cd695988-psjfj    1/1     Running     0          3m35s
mimir-rollout-operator-7f4847889f-bxssr   1/1     Running     0          3m35s
mimir-ruler-5d587b578c-jwstb              1/1     Running     0          3m35s
mimir-store-gateway-0                     1/1     Running     0          3m35s

@metalmatze
Copy link
Member

Ok. I'll try again. The cluster is healthy for me too. It just doesn't serve actual queries due to the above error.

@PabloPie
Copy link

@msvechla I understand making Pyrra compatible with the multi-tenancy of Mimir is outside of the scope of this PR, but should the tenant ID be at least configurable similar to what mimirtool supports? This way you can at least use different Pyrra instances for different tenants.

@msvechla
Copy link
Contributor Author

@msvechla I understand making Pyrra compatible with the multi-tenancy of Mimir is outside of the scope of this PR, but should the tenant ID be at least configurable similar to what mimirtool supports? This way you can at least use different Pyrra instances for different tenants.

I can upate the PR and add another flag to specify an optional mimir tenant ID if we want to include this already here, no problem.

@metalmatze
Copy link
Member

Another day, another minikube cluster.
Mimir has started just fine it seems.

NAME                                      READY   STATUS      RESTARTS   AGE
mimir-compactor-0                         1/1     Running     0          31m
mimir-distributor-7bf8b8c4cd-xzkn6        1/1     Running     0          31m
mimir-gateway-6668576657-gtkpw            1/1     Running     0          31m
mimir-ingester-0                          1/1     Running     0          31m
mimir-make-minio-buckets-5.0.14-mqzr2     0/1     Completed   0          31m
mimir-minio-5d8b7d8f79-5w8bd              1/1     Running     0          31m
mimir-querier-c66568668-5mwjn             1/1     Running     0          31m
mimir-query-frontend-98464565-flzrm       1/1     Running     0          31m
mimir-query-scheduler-85cd695988-hqzbn    1/1     Running     0          31m
mimir-rollout-operator-7f4847889f-b2nvw   1/1     Running     0          31m
mimir-ruler-5d587b578c-n47bw              1/1     Running     0          31m
mimir-store-gateway-0                     1/1     Running     0          31m

Still, I get the same error talking to http://localhost:8080/prometheus/api/v1/query_range from the Pyrra API.

@metalmatze
Copy link
Member

It would be great to figure this out so that I can help maintain this part of the code in the future.
I'll try another type of Kubernetes cluster, like kind.

In any case, once we've fixed the golangci-lint errors, let's finally get this in, and then we can figure out the rest.

@metalmatze
Copy link
Member

hm, I tried running the mimir integration on my MacBook now with Docker for Mac on their Kubernetes. Same error.
Is the README missing a single step to get the ring up and running?

@msvechla
Copy link
Contributor Author

@metalmatze can you try:

kubectl port-forward -n mimir svc/mimir-ingester 8080:8080

and visit http://localhost:8080/ingester/ring in your browser?

There should only be a single instance in your ring.

@metalmatze
Copy link
Member

Thank you for sending the instructions for the ingester ring debugging.
It seems like everything is fine on that side. The query part still doesn't work.

Let's merge this PR and I'll try to figure out what's happening next. Thankfully, being part of Prometheus team, I know many Mimir maintainers, I'm going to ask.

Thank you so much for this incredible contribution!!!

@metalmatze metalmatze merged commit 03758b0 into pyrra-dev:main Oct 16, 2024
9 of 10 checks passed
@msvechla
Copy link
Contributor Author

msvechla commented Oct 17, 2024 via email

@metalmatze
Copy link
Member

Makes sense.
I'll take a look if there are some other smaller things we can get in and then cut v0.8! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants