I've searched the issue queue to verify this is not a duplicate feature request.
I've pasted the output of kargo version, if applicable.
I've pasted logs, if applicable.
Proposed Feature
Add internal metrics for API and Controller services (potentially other services).
API internal metrics:
api.request - counts requests to the API server (potential labels: http_status:xxx, grpc_method:, failed:true|false)
Controller internal metrics:
controller.reconcile - counts controller reconcile invocations (potential labels: namespace, failed:true|false)
get a warehouse in the project (potential labels: namespace, failed:true|false)
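To make the proposal concrete, here is a minimal sketch of how counters with labels like the ones listed above behave. It is not Kargo code: the `LabelledCounter` class is hypothetical, the method name `GetStage` and the namespace `demo` are illustrative values only, and the metric names are rewritten as `api_request_total` and `controller_reconcile_total` on the assumption that they would be exposed in Prometheus style, where names traditionally use underscores rather than dots.

```python
from collections import Counter


class LabelledCounter:
    """Hypothetical counter that only increases and is partitioned by labels."""

    def __init__(self, name: str, help_text: str) -> None:
        self.name = name
        self.help_text = help_text
        self._values: Counter = Counter()

    def inc(self, amount: int = 1, **labels: str) -> None:
        """Add `amount` to the series identified by this exact label set."""
        if amount < 0:
            raise ValueError("counters can only increase")
        self._values[tuple(sorted(labels.items()))] += amount

    def value(self, **labels: str) -> int:
        """Return the current count for a label set (0 if never incremented)."""
        return self._values[tuple(sorted(labels.items()))]

    def render(self) -> str:
        """Render all series in a Prometheus-like text layout."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key in sorted(self._values):
            label_text = ",".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_text}}} {self._values[key]}")
        return "\n".join(lines)


# Assumed metric names, adapted from api.request and controller.reconcile.
api_requests = LabelledCounter("api_request_total", "Requests handled by the API server.")
reconciles = LabelledCounter("controller_reconcile_total", "Controller reconcile invocations.")

# "GetStage" and "demo" are illustrative label values, not taken from the issue.
api_requests.inc(grpc_method="GetStage", http_status="200", failed="false")
api_requests.inc(grpc_method="GetStage", http_status="200", failed="false")
api_requests.inc(grpc_method="GetStage", http_status="500", failed="true")
reconciles.inc(namespace="demo", failed="false")

print(api_requests.render())
```

Each distinct combination of label values becomes its own series, so a `failed` label makes it possible to alert on failures alone, while a high-cardinality label would multiply the number of series.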
Hi, I’ve also looked for any available metrics but haven’t found anything so far.
Btw, is there a recommended approach for observing promotion failures? For example, if a promotion fails during a git-clone step, how can we receive an alert or notification for that failure?
@semenar-0 #3639 is tracking general failure-handling. Notification functionality isn't directly implemented in Kargo, but you can easily use an http step to fire off a message to Slack, for instance.
If you have further questions about that, please open a separate issue or discussion so that we don't derail this thread.
Checklist
I've searched the issue queue to verify this is not a duplicate feature request.
I've pasted the output of kargo version, if applicable.
I've pasted logs, if applicable.
Proposed Feature
Add internal metrics for API and Controller services (potentially other services).
API internal metrics:
api.request - counts requests to the API server (potential labels: http_status:xxx, grpc_method:, failed:true|false)
Controller internal metrics:
controller.reconcile - counts controller reconcile invocations (potential labels: namespace, failed:true|false)
get a warehouse in the project (potential labels: namespace, failed:true|false)
controller.discover_artifacts - counts artifact discovery attempts (potential labels: namespace, type:git|helm|image, failed:true|false)
controller.build_freight_of_latest_artifact - the amount of freight built from the discovered artifacts (type: meter)
Motivation
Internal metrics allow monitoring the health of Kargo and make it possible to create alerts in Prometheus/Datadog.
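The "type: meter" note on the freight metric is not defined in the issue. One common reading is a metric that tracks how many events occurred and at what rate. The sketch below assumes that reading. The `Meter` class and its method names are hypothetical, and timestamps are passed in explicitly so that the behaviour is easy to follow.

```python
from typing import List, Tuple


class Meter:
    """Hypothetical meter: records events and reports a rate over a window."""

    def __init__(self) -> None:
        self.total = 0
        self._events: List[Tuple[float, int]] = []  # (timestamp, count)

    def mark(self, now: float, count: int = 1) -> None:
        """Record `count` events that happened at time `now` (in seconds)."""
        if count < 0:
            raise ValueError("count must not be negative")
        self.total += count
        self._events.append((now, count))

    def rate(self, now: float, window: float) -> float:
        """Average events per second over the interval (now - window, now]."""
        if window <= 0:
            raise ValueError("window must be positive")
        recent = sum(c for ts, c in self._events if now - window < ts <= now)
        return recent / window


# Assumed usage: one mark per batch of freight built from discovered artifacts.
freight_built = Meter()
freight_built.mark(now=0.0)
freight_built.mark(now=30.0, count=2)
freight_built.mark(now=90.0)

print(freight_built.total)
print(freight_built.rate(now=120.0, window=60.0))
```

With a Prometheus backend, the same information is usually obtained by exporting a plain counter and computing the rate at query time, so a dedicated meter type may not be needed there.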