Use scrapeconfig instead of servicemonitor #607
Conversation
```yaml
- kind: ScrapeConfigs
  name: scrapeconfigs.monitoring.coreos.com
  version: v1alpha1
- kind: ServiceMonitors
```
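For context, entries like these typically live in the operator's ClusterServiceVersion; below is a minimal sketch, assuming that is the file this diff touches. The surrounding structure and the ServiceMonitors details are illustrative, not copied from the PR:

```yaml
# Sketch of a ClusterServiceVersion fragment registering the CRDs the
# operator works with. Only the ScrapeConfigs entry is taken from the
# diff above; everything else is an assumption for illustration.
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  name: service-telemetry-operator.v1.5.x   # hypothetical version
spec:
  customresourcedefinitions:
    owned:   # could instead be under `required:`, depending on how the CSV declares these
      - kind: ScrapeConfigs
        name: scrapeconfigs.monitoring.coreos.com
        version: v1alpha1
      - kind: ServiceMonitors
        name: servicemonitors.monitoring.coreos.com   # assumed, by analogy
        version: v1                                   # assumed
```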
This was missing previously, and I'm not 100% sure of the effects. My guess is that it affected garbage collection in the operator-sdk, preventing these ServiceMonitors from being cleaned up automatically when the ServiceTelemetry object is deleted.
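For background, a sketch of my own (not from the PR): the operator-sdk cleans up dependents via owner references, so an object created on behalf of a ServiceTelemetry instance would carry metadata along these lines. The object name, namespace, and UID are illustrative:

```yaml
# Illustrative only: an ownerReference tying a ScrapeConfig to its
# ServiceTelemetry parent, so deleting the parent cascades to it.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: default-cloud1-coll-meter        # hypothetical object name
  namespace: service-telemetry           # assumed STF namespace
  ownerReferences:
    - apiVersion: infra.watch/v1beta1    # ServiceTelemetry API group/version
      kind: ServiceTelemetry
      name: default
      uid: 00000000-0000-0000-0000-000000000000   # set by the API server
      controller: true
```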
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/f801d6c3273d471990d68e8f495dbc93

❌ stf-crc-ocp_412-local_build FAILURE in 38m 04s
I believe the smoketest failures are related to label changes. I'll take a closer look at this locally and see how bad it is (I see a lot of labels in that part of the code).
I adjusted the labels to exactly match what we were producing before, and the smoketests passed locally.
Looks good Chris, thanks! Considering we are removing the service monitors, what happens in brownfield deployments? Does an existing STF deployment continue to run without change, with the new scrape config component deployed only in new STF deployments? Thanks!
No. The code as-is deploys the new ScrapeConfig objects and then deletes the old ServiceMonitor objects[1]. Upgrade testing will be required, but I don't foresee any difficulties. The labels of the new metrics exactly match the old ones.

[1] https://github.com/infrawatch/service-telemetry-operator/pull/607/files#diff-6b852f8c4b44841b7ddd16194f6304203cf7c6b134b7a008a0fc4b3c043b4d1cR87
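For readers following along, a hedged sketch of that deploy-then-delete pattern as Ansible tasks (this operator is Ansible-based; the task names and object names here are illustrative, not the PR's actual code, which is linked in [1]):

```yaml
# Hypothetical tasks: create the new ScrapeConfig, then delete the old
# ServiceMonitor it supersedes.
- name: Create ScrapeConfig for collector metrics
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: monitoring.coreos.com/v1alpha1
      kind: ScrapeConfig
      metadata:
        name: example-coll-meter       # illustrative name
        namespace: service-telemetry   # assumed namespace
      spec:
        staticConfigs:
          - targets:
              - example-coll-meter.service-telemetry.svc:8081   # DNS name, not IP

- name: Remove the superseded ServiceMonitor
  kubernetes.core.k8s:
    state: absent
    api_version: monitoring.coreos.com/v1
    kind: ServiceMonitor
    name: example-coll-meter           # illustrative name
    namespace: service-telemetry
```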
Otherwise LGTM
Co-authored-by: Jaromír Wysoglad <[email protected]>
Build failed (check pipeline). Post https://review.rdoproject.org/zuul/buildset/1f17428efe534a5b87a77ec40cc61ec4

❌ stf-crc-ocp_412-local_build FAILURE in 35m 42s
Didn't test this, but it looks good to me.
LGTM
recheck
This allows us to create a Prometheus config where targets are specified by DNS name rather than by IP address.
Solves a problem discovered here: #599 (comment)
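To illustrate the DNS-name point, a minimal sketch of a ScrapeConfig with static targets; the service name, port, and labels are assumptions for the example, not taken from the PR:

```yaml
# Illustrative ScrapeConfig: targets are Service DNS names rather than
# pod IPs, and static labels keep the metric labels compatible with
# what the old ServiceMonitors produced.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-sg-metrics                 # hypothetical name
  namespace: service-telemetry             # assumed namespace
spec:
  staticConfigs:
    - targets:
        - example-sg.service-telemetry.svc.cluster.local:8081   # DNS, not IP
      labels:
        service: metrics                   # hypothetical label
```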
Risks:
- `observability_strategy: use_community` on default deployments of OCP <= 4.14 (at least); see the config sketch after this list
- `observability_strategy: use_redhat` w/ obo-prometheus-operator from COO 0.2.0 works fine
- need to ensure metrics labeling compatibility with all the other parts of the system
- impact on the (deprecated) HA mode
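For reference, the strategies above are selected on the ServiceTelemetry object; a minimal sketch, assuming the camelCase field name used by the CRD:

```yaml
# Illustrative ServiceTelemetry selecting an observability strategy.
apiVersion: infra.watch/v1beta1
kind: ServiceTelemetry
metadata:
  name: default
  namespace: service-telemetry        # assumed namespace
spec:
  observabilityStrategy: use_redhat   # or use_community, per the risks above
```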