feat: add agent injector telemetry #703
This worked well when I was running it locally, just left a couple thoughts.
Thanks!
@@ -142,11 +151,20 @@ func (h *Handler) Handle(w http.ResponseWriter, r *http.Request) {
msg := fmt.Sprintf("error marshalling admission response: %s", err)
http.Error(w, msg, http.StatusInternalServerError)
h.Log.Error("error on request", "Error", msg, "Code", http.StatusInternalServerError)
incrementInjectionFailures(admReq.Request.Namespace)
We may also want to call incrementInjectionFailures() for the error returns further up in this function. This should suffice as-is, though; I could even see those being covered by a separate error metric added in the future.
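One way to cover every error return without repeating the increment is a deferred check on a named error result. The sketch below is illustrative only: `failureCounter` is a hypothetical in-memory stand-in for the counter behind incrementInjectionFailures(), and `handle` is a simplified function that returns an error, which the real `Handle(w, r)` method does not do.

```go
package main

import (
	"errors"
	"sync"
)

// failureCounter is a hypothetical stand-in for the Prometheus counter
// behind incrementInjectionFailures(); it counts failures per namespace.
type failureCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newFailureCounter() *failureCounter {
	return &failureCounter{counts: make(map[string]int)}
}

func (c *failureCounter) inc(namespace string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[namespace]++
}

func (c *failureCounter) get(namespace string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[namespace]
}

// handle mimics a handler with several error returns. The deferred closure
// reads the named result err, so each failing path is counted exactly once
// and no return statement needs its own increment call.
func handle(c *failureCounter, namespace string, body []byte) (err error) {
	defer func() {
		if err != nil {
			c.inc(namespace)
		}
	}()

	if len(body) == 0 {
		return errors.New("empty request body")
	}
	if body[0] != '{' {
		return errors.New("request body is not a JSON object")
	}
	return nil
}
```

The trade-off is that all failures land in one counter. If the earlier error paths should become a distinct metric, as suggested above, they would need their own increment instead.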
Add Prometheus metrics to monitor the Agent Injector's performance. New metrics include a gauge of current requests being processed by the webhook, a summary of request processing times, and a count of successful and failed injections by Kubernetes namespace. Successful injections are broken down by injection type. The `injection_type` label can assume the value `init_only` for injections with only an initContainer (no sidecar) and `sidecar` for all other cases (sidecar only or sidecar + initContainer). Fixes AG-005161.
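The `injection_type` rule described in this commit message can be read as a small decision function. This is a sketch of that rule only, not the PR's code: the function name, parameter names, and the "nothing injected" case are assumptions added for illustration.

```go
package main

// Label values taken from the commit message.
const (
	injectionTypeInitOnly = "init_only"
	injectionTypeSidecar  = "sidecar"
)

// injectionType maps the containers that were injected to a label value.
// It returns ok=false when nothing was injected, in which case no
// injection should be counted (an assumption; the commit message does not
// describe this case).
func injectionType(initContainer, sidecar bool) (label string, ok bool) {
	switch {
	case sidecar:
		// Sidecar only, or sidecar plus initContainer.
		return injectionTypeSidecar, true
	case initContainer:
		// initContainer without a sidecar.
		return injectionTypeInitOnly, true
	default:
		return "", false
	}
}
```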
Force-pushed from a2ec614 to f3f6b3a.
Update the `Mutate()` method to return a struct that extends the existing return data (AdmissionResponse) with metadata on the types of Vault Agent injections made. The metadata informs the count of injections by namespace, which are now further broken down by type of injection.
This is looking great, just a couple minor thoughts (but nothing to hold this up over).
I was also playing around with ways to test the logic in incrementInjections(), and it looks like we could use Prometheus' testutil package, something like this: https://gist.github.com/tvoran/b6ec51e716fd10efe7bb1847def8e98a
Looking good. I have included some additional feedback for your consideration. Thanks!
Subsystem: metricsSubsystem,
Name: "injections_by_namespace_total",
Help: "Total count of Agent Sidecar injections by namespace",
}, []string{metricsLabelNamespace, metricsLabelType})
I suppose the number of namespaces should be relatively low, so we should be okay with this potentially high-cardinality metric.
requestQueue.Inc()
requestStart := time.Now()
defer func() {
requestProcessingTime.Observe(float64(time.Since(requestStart).Milliseconds()))
I wonder if it would be worth tracking the failures as well. We do something like this in VSO here: https://github.com/hashicorp/vault-secrets-operator/blob/6ab056c22acb5dc4812620233e3b141fb9e02f9c/vault/client.go#L810
LGTM!
Add Prometheus metrics to monitor the Agent Injector's performance. New metrics include a gauge of current requests being processed by the webhook, a summary of request processing times, and a count of successful and failed injections by Kubernetes namespace.
Fixes AG-005161.