Add internal metrics for API Server and Controller #3701

Open
1 of 2 tasks
boh-dan opened this issue Mar 24, 2025 · 3 comments

Comments

@boh-dan

boh-dan commented Mar 24, 2025

Checklist

  • I've searched the issue queue to verify this is not a duplicate feature request.
  • [] I've pasted the output of kargo version, if applicable.
  • I've pasted logs, if applicable.

Proposed Feature

Add internal metrics for the API server and the controller (and potentially other services).
API internal metrics:

  • api.request - number of requests to the API server (potential labels: http_status:xxx, grpc_method:, failed:true|false)
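
As a rough illustration of how the proposed labels would partition a single request count, here is a dependency-free sketch in Python. It is not how Kargo implements anything: Kargo is written in Go and would more likely rely on an existing metrics library. The `LabeledCounter` class and the `ListProjects` method name are hypothetical and exist only for this example.

```python
from collections import Counter
from threading import Lock


class LabeledCounter:
    """Monotonic counter partitioned by label values (hypothetical helper)."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = Counter()
        self._lock = Lock()

    def inc(self, **labels):
        # Require exactly the declared labels so every series has the same shape.
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}, got {tuple(labels)}")
        key = tuple(str(labels[n]) for n in self.label_names)
        with self._lock:
            self._values[key] += 1

    def value(self, **labels):
        # Count recorded for one specific combination of label values.
        key = tuple(str(labels[n]) for n in self.label_names)
        with self._lock:
            return self._values[key]

    def total(self):
        # Count across all label combinations.
        with self._lock:
            return sum(self._values.values())


# Label names come from the proposal above; the method name is made up.
api_request = LabeledCounter("api.request", ["http_status", "grpc_method", "failed"])
api_request.inc(http_status=200, grpc_method="ListProjects", failed=False)
api_request.inc(http_status=200, grpc_method="ListProjects", failed=False)
api_request.inc(http_status=500, grpc_method="ListProjects", failed=True)
```

Keeping `failed` as a label rather than a separate metric means an error ratio can be derived from one series family, at the cost of one extra series per label combination.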

Controller internal metrics:

  • controller.reconcile - count of controller reconcile invocations (potential labels: namespace, failed:true|false)
    get a warehouse in the project (potential labels: namespace, failed:true|false)
  • controller.discover_artifacts - count of artifact discovery attempts (potential labels: namespace, type:git|helm|image, failed:true|false)
  • controller.build_freight_of_latest_artifact - the amount of freight built from the discovered artifacts (type: meter)
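
The proposal does not define "meter". In metrics libraries that use the term (Dropwizard Metrics, for example), a meter is an event count combined with a rate. A minimal sketch under that assumption, again in plain Python and with a hypothetical `Meter` class, might look like this for the freight-building metric:

```python
import time


class Meter:
    """Tracks a running total and its mean rate per second (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the rate can be checked deterministically.
        self._clock = clock
        self._start = clock()
        self.count = 0

    def mark(self, n=1):
        # Record n events, e.g. n pieces of freight built.
        self.count += n

    def mean_rate(self):
        # Events per second since the meter was created.
        elapsed = self._clock() - self._start
        return self.count / elapsed if elapsed > 0 else 0.0


# Usage with a fake clock: 4 events over 60 seconds.
now = [0.0]
freight_built = Meter(clock=lambda: now[0])
freight_built.mark(3)
now[0] = 60.0
freight_built.mark()
rate = freight_built.mean_rate()
```

With a Prometheus-style backend the same information is usually exposed as a plain counter, with the rate computed at query time, so a dedicated meter type may not be necessary.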

Motivation

Internal metrics make it possible to monitor the health of Kargo and to create alerts in Prometheus/Datadog.

@semenar-0

Hi, I’ve also looked for any available metrics but haven’t found anything so far.

By the way, is there a recommended approach for observing promotion failures? For example, if a promotion fails during a git-clone step, how can we receive an alert or notification for that failure?

@krancour
Member

krancour commented Mar 26, 2025

@semenar-0 #3639 is tracking general failure-handling. Notification functionality isn't directly implemented in Kargo, but you can easily use an http step to fire off a message to Slack, for instance.

If you have further questions about that, please open a separate issue or discussion so that we don't derail this thread.

@semenar-0

Thank you for the response, @krancour. I’ve opened a separate discussion to continue the conversation:
👉 #3724
