Use scrapeconfig instead of servicemonitor #607

csibbitt · 2024-06-19T16:15:54Z

This allows us to create a prometheus config where targets are specified by DNS name, rather than IP addresses.

Solves a problem discovered here: #599 (comment)
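To show the difference concretely: a ScrapeConfig can list a Service's DNS name directly in `staticConfigs`, whereas a ServiceMonitor has Prometheus discover Pod IPs from the Service's endpoints. The sketch below is minimal. Only the `monitoring.coreos.com/v1alpha1` ScrapeConfig kind and its `staticConfigs` field come from the prometheus-operator API. The object name, namespace, selector label and port are hypothetical placeholders, not the values this PR generates.

```yaml
# Hypothetical example: name, namespace, label and port are placeholders.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-metrics
  namespace: service-telemetry
  labels:
    # Assumed to match the Prometheus object's scrapeConfigSelector
    prometheus: example
spec:
  staticConfigs:
    - targets:
        # A Service DNS name, resolved when scraping, instead of the
        # Pod IPs a ServiceMonitor would discover via endpoints
        - example-metrics.service-telemetry.svc:8081
```

A Prometheus object only loads ScrapeConfigs that match its `scrapeConfigSelector`, so the placeholder `prometheus: example` label assumes a matching selector. A ClusterIP Service name also resolves to the Service's virtual IP. With several replicas behind it, successive scrapes may therefore reach different Pods.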

Risks:

- This may not work when using observability_strategy: use_community on default deployments of OCP <= 4.14 (at least):
  - the prometheus operator from openshift-monitoring looks new enough, but the ScrapeConfig CRD was missing in my test environments
  - observability_strategy: use_redhat with obo-prometheus-operator from COO 0.2.0 works fine
- I haven't done additional testing, for example ~~to ensure metrics labeling compatibility with all the other parts of the system, or~~ of the impact on the (deprecated) HA mode
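The missing-CRD risk above could be surfaced early with a pre-flight lookup of the CRD, run before any ScrapeConfig is applied. This is a hypothetical sketch, assuming the `kubernetes.core` Ansible collection that Ansible-based operators commonly use. The task names and the `scrapeconfig_crd` variable are illustrative and are not taken from this PR. obo-prometheus-operator serves its CRDs under a different API group (`monitoring.rhobs`), so for use_redhat the CRD name would need adjusting.

```yaml
# Hypothetical pre-flight check; task names and variable are illustrative.
- name: Look up the ScrapeConfig CRD
  kubernetes.core.k8s_info:
    api_version: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    name: scrapeconfigs.monitoring.coreos.com
  register: scrapeconfig_crd

- name: Fail early when the ScrapeConfig CRD is not installed
  ansible.builtin.fail:
    msg: >-
      The scrapeconfigs.monitoring.coreos.com CRD was not found; the
      Prometheus Operator in use may not provide the ScrapeConfig API.
  # k8s_info returns an empty "resources" list when nothing matches
  when: scrapeconfig_crd.resources | length == 0
```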

csibbitt · 2024-06-19T16:37:20Z

...g/service-telemetry-operator/manifests/service-telemetry-operator.clusterserviceversion.yaml

```diff
+      - kind: ScrapeConfigs
+        name: scrapeconfigs.monitoring.coreos.com
+        version: v1alpha1
+      - kind: ServiceMonitors
```

This was missing previously, and I'm not 100% sure of the effects. My guess is that it would have affected garbage collection in the operator-sdk, preventing these ServiceMonitors from being cleaned up automatically when the ServiceTelemetry object is deleted.
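Some background on the garbage-collection question: Kubernetes deletes dependent objects through `metadata.ownerReferences` on those objects. The sketch below shows what such a reference could look like on a generated ScrapeConfig that points at its ServiceTelemetry owner. The owner's `infra.watch/v1beta1` apiVersion is assumed. The object names are hypothetical, and the `uid` is a placeholder that must equal the live owner's UID. This sketch does not establish whether this operator sets the reference itself or relies on the Ansible Operator SDK to inject it.

```yaml
# Hypothetical sketch of an owner reference; names and uid are placeholders.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-metrics
  namespace: service-telemetry
  ownerReferences:
    - apiVersion: infra.watch/v1beta1   # assumed ServiceTelemetry API version
      kind: ServiceTelemetry
      name: default                     # hypothetical owner name
      uid: 00000000-0000-0000-0000-000000000000  # must be the owner's real UID
      controller: true
      blockOwnerDeletion: true
spec:
  staticConfigs:
    - targets:
        - example-metrics.service-telemetry.svc:8081
```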

softwarefactory-project-zuul · 2024-06-19T18:39:16Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/f801d6c3273d471990d68e8f495dbc93

❌ stf-crc-ocp_412-local_build FAILURE in 38m 04s
❌ stf-crc-ocp_414-local_build FAILURE in 37m 39s
❌ stf-crc-ocp_412-local_build-index_deploy FAILURE in 47m 40s
❌ stf-crc-ocp_414-local_build-index_deploy FAILURE in 47m 35s

csibbitt · 2024-06-20T14:17:37Z

I believe the smoketest failures are related to the label changes. I'll take a closer look at this locally and see how bad it is (I see a lot of labels in that part of the code).

csibbitt · 2024-06-20T15:41:13Z

I adjusted the labels to exactly match what we were producing before and the smoketests passed locally.
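Static targets carry no Kubernetes metadata. Labels that a ServiceMonitor would have attached automatically can instead be set explicitly in the `labels` map of each `staticConfigs` entry, as the hypothetical illustration below shows. The label set and values are assumptions for illustration. The labels this PR actually emulates are in the "Emulate servicemonitor-compatible labels" commit and are not reproduced here.

```yaml
# Hypothetical values chosen to mirror ServiceMonitor-style target labels.
apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: example-metrics
  namespace: service-telemetry
spec:
  staticConfigs:
    - targets:
        - example-metrics.service-telemetry.svc:8081
      labels:
        # Set by hand because no Service/Pod discovery happens here
        job: example-metrics
        namespace: service-telemetry
        service: example-metrics
        endpoint: metrics
```

As far as I know, Prometheus only applies its default `job` value when a target does not already carry a `job` label, so an explicit `job` here should take precedence over the generated job name. Per-Pod labels such as `pod` cannot be reproduced this way, because the target is a Service rather than an individual Pod.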

vkmc

Looks good Chris, thanks! Considering we are removing the service monitors, what happens in brownfield deployments? Do existing STF deployments continue to run unchanged, with the new ScrapeConfig component deployed only in new STF deployments? Thanks!

csibbitt · 2024-06-27T13:56:43Z

No. The code as-is deploys the new ScrapeConfig objects and then deletes the old ServiceMonitor objects[1].
If users have manually configured additional ServiceMonitors via the "servicemonitor_manifest" option[2], they will be preserved.

Upgrade testing will be required, but I don't foresee any difficulties. The labels of the new metrics exactly match the old ones.

[1] https://github.com/infrawatch/service-telemetry-operator/pull/607/files#diff-6b852f8c4b44841b7ddd16194f6304203cf7c6b134b7a008a0fc4b3c043b4d1cR87
[2] https://github.com/infrawatch/service-telemetry-operator?tab=readme-ov-file#overriding-default-manifests
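The upgrade order described above (create the replacement ScrapeConfig, then delete the legacy ServiceMonitor) could be sketched in Ansible roughly as follows. This is not the PR's task file. It assumes the `kubernetes.core.k8s` module and the `ansible_operator_meta` variable provided by the Ansible Operator SDK, and every object name is a hypothetical placeholder. The delete step targets only explicitly named objects, so other ServiceMonitors in the namespace are left untouched. `state: absent` also succeeds when the object is already gone, which keeps reruns idempotent.

```yaml
# Hypothetical sketch; object names are placeholders.
- name: Create or update the replacement ScrapeConfig
  kubernetes.core.k8s:
    state: present
    definition:
      apiVersion: monitoring.coreos.com/v1alpha1
      kind: ScrapeConfig
      metadata:
        name: example-metrics
        namespace: "{{ ansible_operator_meta.namespace }}"
      spec:
        staticConfigs:
          - targets:
              - "example-metrics.{{ ansible_operator_meta.namespace }}.svc:8081"

- name: Remove legacy ServiceMonitors replaced by ScrapeConfigs
  kubernetes.core.k8s:
    state: absent
    api_version: monitoring.coreos.com/v1
    kind: ServiceMonitor
    namespace: "{{ ansible_operator_meta.namespace }}"
    name: "{{ item }}"
  loop:
    - example-legacy-servicemonitor
```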

vyzigold

Otherwise LGTM

roles/servicetelemetry/tasks/component_scrapeconfig.yml

Co-authored-by: Jaromír Wysoglad <[email protected]>

softwarefactory-project-zuul · 2024-07-08T05:23:19Z

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://review.rdoproject.org/zuul/buildset/1f17428efe534a5b87a77ec40cc61ec4

❌ stf-crc-ocp_412-local_build FAILURE in 35m 42s
✔️ stf-crc-ocp_414-local_build SUCCESS in 32m 58s
✔️ stf-crc-ocp_412-local_build-index_deploy SUCCESS in 42m 54s
✔️ stf-crc-ocp_414-local_build-index_deploy SUCCESS in 35m 49s

Change committed

paramite

Didn't test this, but LGTM.

vkmc

LGTM

vkmc · 2024-07-10T14:35:23Z

recheck

csibbitt added 2 commits June 18, 2024 13:16

Move from ServiceMonitor to ScrapeConfig

aa08e77

Clean up legacy servicemonitors

4756295

csibbitt added the do-not-merge Code is not ready to be merged label Jun 19, 2024

csibbitt requested review from paramite, vkmc and vyzigold June 19, 2024 16:16

csibbitt commented Jun 19, 2024

Merge branch 'master' into csibbitt_STF-1776_ipv6-scraping

9f59209

Emulate servicemonitor-compatible labels

d6ec624

csibbitt removed the do-not-merge Code is not ready to be merged label Jun 26, 2024

vkmc reviewed Jun 27, 2024

vyzigold previously requested changes Jun 28, 2024

roles/servicetelemetry/tasks/component_scrapeconfig.yml (outdated)

Update roles/servicetelemetry/tasks/component_scrapeconfig.yml

7105d22

Co-authored-by: Jaromír Wysoglad <[email protected]>

paramite approved these changes Jul 10, 2024

vkmc approved these changes Jul 10, 2024

csibbitt enabled auto-merge (squash) July 10, 2024 16:40

csibbitt merged commit 43726ee into master Jul 10, 2024
10 checks passed

csibbitt deleted the csibbitt_STF-1776_ipv6-scraping branch July 10, 2024 20:10

vyzigold mentioned this pull request Jul 17, 2024

Add kube-state-metrics service openstack-k8s-operators/telemetry-operator#337

Closed

elfiesmelfie mentioned this pull request Jul 26, 2024

Create our own token for prometheus-stf SA #623

Merged