RHINENG-14720: Add metrics for counting host publication checks and creations #2127

thearifismail · 2024-12-10T21:50:52Z

rh-pre-commit.version: 2.2.0
rh-pre-commit.check-secrets: ENABLED

Overview

This PR addresses RHINENG-14720.

PR Checklist

Secure Coding Practices Documentation Reference

You can find documentation on this checklist here.

Secure Coding Checklist

computercamplove · 2024-12-11T10:51:06Z

/retest

chambridge · 2024-12-11T14:57:56Z

I'm not sure that adding Prometheus metrics to this job will accomplish the goal here. This job doesn't have an API layer with a metrics endpoint configured to be scraped by Prometheus, so these metrics will never be consumed. The only way to expose them would be to push them to a Prometheus Pushgateway. While that is possible, I think it's simpler to use the Kubernetes job-status metrics that are already available, specifically the failure count in kube_job_status_failed. When a traceback occurs, or the job calls exit(1) because of an inactive replication slot, the job's failed count increases. An alert can then be driven off of this with something like the following:

sum(kube_job_status_failed{namespace="<your-namespace>"}) - sum(kube_job_status_failed{namespace="<your-namespace>"} offset 10m) > 0
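The query fires when the summed failure count is higher now than it was 10 minutes earlier, i.e. at least one job failure happened inside that window. A minimal Python model of that comparison is sketched below, assuming each series is a time-sorted list of (unix_seconds, failed_count) samples keyed by job name; Prometheus staleness and lookback handling are deliberately ignored, and the function names are hypothetical.

```python
# Minimal model of the alert expression:
#   sum(kube_job_status_failed{...}) - sum(kube_job_status_failed{...} offset 10m) > 0
# Assumptions: each series is a time-sorted list of (unix_seconds, failed_count) pairs,
# keyed by job name; Prometheus staleness and lookback rules are ignored.

OFFSET_SECONDS = 10 * 60  # corresponds to "offset 10m"


def latest_value(series, at):
    """Return the most recent sample value at or before `at`, or None if there is none."""
    value = None
    for ts, v in series:
        if ts > at:
            break
        value = v
    return value


def summed(all_series, at):
    """Emulate sum(...): add every series that has a value; None stands in for an empty result."""
    values = [latest_value(s, at) for s in all_series.values()]
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def should_alert(all_series, now):
    """True when the summed failure count grew during the last OFFSET_SECONDS."""
    current = summed(all_series, now)
    earlier = summed(all_series, now - OFFSET_SECONDS)
    if current is None or earlier is None:
        # In PromQL, a subtraction involving an empty vector yields no result, so nothing fires.
        return False
    return current - earlier > 0
```

For example, a job whose failed count goes from 0 to 1 at t=900 s trips the alert when evaluated at t=900, but no longer at t=2000, because by then the failure falls outside the 10-minute window.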

thearifismail · 2024-12-11T16:11:50Z

@chambridge I forgot to add the http_server to serve the metrics. Yes, I agree that using the Kubernetes-provided metrics is easier, since it's less work. I also plan to include job_name to narrow down the error source:

sum(kube_job_status_failed{namespace="host-inventory-prod", job_name=~"syndicator-.*"}) - sum(kube_job_status_failed{namespace="host-inventory-prod", job_name=~"syndicator-.*"} offset 10m) > 0
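One detail worth keeping in mind with job_name=~"syndicator-.*": Prometheus regex matchers are fully anchored, so the pattern must match the whole label value, not just a substring. The sketch below emulates that selector in Python with re.fullmatch; the job names in the example are hypothetical, and treating Python's re as equivalent to Prometheus' RE2 syntax is an assumption that holds for a simple pattern like this one.

```python
import re
from typing import Dict

# Matcher from the query: job_name=~"syndicator-.*"
JOB_NAME_PATTERN = "syndicator-.*"


def label_matches(pattern: str, value: str) -> bool:
    """Emulate a PromQL =~ matcher, which is anchored at both ends.

    Prometheus uses RE2 syntax; for a simple pattern like this one, Python's `re`
    behaves the same way (an assumption for more complex patterns).
    """
    return re.fullmatch(pattern, value) is not None


def select_jobs(failed_by_job: Dict[str, float], pattern: str = JOB_NAME_PATTERN) -> Dict[str, float]:
    """Keep only the series whose job_name label matches, as the query's selector does."""
    return {name: count for name, count in failed_by_job.items() if label_matches(pattern, name)}
```

With hypothetical jobs {"syndicator-28912345": 1.0, "host-syndicator-7": 2.0, "reaper-1": 3.0}, only the first is selected: "host-syndicator-7" contains the pattern but does not start with it, so the anchored match rejects it.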

Add metrics for counting host publication checks and creations

fa41eb1

rh-pre-commit.version: 2.2.0 rh-pre-commit.check-secrets: ENABLED
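As the review points out, counters defined inside a short-lived job are never scraped unless they are pushed somewhere. For reference, the Pushgateway route mentioned there could look roughly like the stdlib-only sketch below: it renders counters in the Prometheus text exposition format and prepares (without sending) a PUT to a Pushgateway's /metrics/job/<job> endpoint. The metric names, gateway URL, and job name are hypothetical and not taken from the PR diff.

```python
from typing import Dict
from urllib.request import Request

# Hypothetical counter names modelled on the commit title; they are not from the PR diff.
# The "_total" suffix follows the Prometheus naming convention for counters.


def render_counters(counters: Dict[str, int]) -> str:
    """Render counters in the Prometheus text exposition format (version 0.0.4)."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    # The format requires every line, including the last, to end with a newline.
    return "\n".join(lines) + "\n"


def build_push_request(gateway_url: str, job: str, counters: Dict[str, int]) -> Request:
    """Prepare (but do not send) a PUT to the Pushgateway grouping key /metrics/job/<job>.

    PUT replaces every metric previously pushed under that grouping key.
    """
    body = render_counters(counters).encode("utf-8")
    return Request(
        f"{gateway_url.rstrip('/')}/metrics/job/{job}",
        data=body,
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
```

Sending would be a matter of calling urllib.request.urlopen on the returned request at the end of the run (9091 is the Pushgateway's default port). The trade-off the review highlights still applies: this adds another component to deploy, whereas kube_job_status_failed is already available.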

thearifismail requested a review from a team as a code owner December 10, 2024 21:50
