Emit metrics for requests to the Kubernetes control plane #118
resp, err := k.next.RoundTrip(req)
duration := time.Since(start)
instrument.ObserveWithExemplar(req.Context(), k.hist.WithLabelValues(req.URL.EscapedPath(), req.Method, strconv.Itoa(resp.StatusCode)), duration.Seconds())
I don't love using the URL path as a label; I'm concerned about a cardinality explosion here. I'd much rather have something like the object type, so we'd have metrics for operations like "patch any StatefulSet" and "list all Pods", but there doesn't seem to be a reliable way to get that information at this level.
One mitigating factor is that there aren't too many unique URLs that a rollout-operator instance will call. In my testing, this results in a set of buckets for each of:
- GET pods in namespace
- GET certificate
- GET validating webhook configurations
- GET mutating webhook configurations
- GET statefulsets in namespace
- PATCH statefulset (one set per statefulset)
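To put a rough number on the cardinality concern, the series count for the endpoints listed above can be estimated. This sketch assumes a classic (non-native) histogram with 11 buckets, which is the size of `prometheus.DefBuckets`. The PR's actual bucket layout isn't shown here. It also assumes one status code per endpoint, and each additional status code seen on an endpoint adds another full set of series. The helper names are hypothetical.

```go
package main

import "fmt"

// seriesPerCombination returns how many series a classic Prometheus histogram
// exposes for one label combination: one _bucket series per configured upper
// bound, plus the implicit le="+Inf" bucket, plus _sum and _count.
func seriesPerCombination(buckets int) int {
	return buckets + 1 + 2
}

// estimateSeries assumes one label combination per fixed GET endpoint and one
// per patched StatefulSet, each seeing a single status code (an assumption;
// errors and other status codes add more combinations).
func estimateSeries(fixedEndpoints, statefulSets, buckets int) int {
	return (fixedEndpoints + statefulSets) * seriesPerCombination(buckets)
}

func main() {
	// 5 GET endpoints from the list, 3 StatefulSets, 11 buckets (assumed).
	fmt.Println(estimateSeries(5, 3, 11)) // 112
}
```

Under these assumptions, each additional StatefulSet adds 14 series. This is also why native histograms would reduce the impact: with a native-only histogram, each label combination is stored as a single series rather than one series per bucket.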
One option would be to add custom instrumentation around each client API call, but that seems easy to forget. Another option, which would reduce the impact, is to use native histograms.
Interested in opinions and other ideas from reviewers.
We could have a pre-defined subset of URLs/paths that we want to track (such as the ones listed), and "other" for ones we don't care about or expect. We could also take a subsection of the path to group similar ones together.
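This suggestion could be sketched as path normalization plus an allowlist. The sketch first collapses the namespace and object name into placeholders, which is the "subsection of the path" idea. It then maps anything not on a hypothetical allowlist to `other`. It assumes the standard Kubernetes `/api/{version}/...` and `/apis/{group}/{version}/...` path layout. `normalizePath`, `routeLabel`, and the allowlist entries are illustrative. The certificate endpoint is left out because its path isn't shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// knownRoutes is a hypothetical allowlist of normalized paths worth tracking,
// loosely based on the endpoints listed earlier in this thread.
var knownRoutes = map[string]bool{
	"/api/v1/namespaces/{namespace}/pods":                                   true,
	"/apis/apps/v1/namespaces/{namespace}/statefulsets":                     true,
	"/apis/apps/v1/namespaces/{namespace}/statefulsets/{name}":              true,
	"/apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations": true,
	"/apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations":   true,
}

// normalizePath replaces the namespace and object name in a Kubernetes API
// path with placeholders, so every StatefulSet PATCH maps to one label value.
// Paths outside /api and /apis are returned unchanged.
func normalizePath(path string) string {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	var prefix int
	switch {
	case len(segs) >= 2 && segs[0] == "api":
		prefix = 2 // /api/{version}
	case len(segs) >= 3 && segs[0] == "apis":
		prefix = 3 // /apis/{group}/{version}
	default:
		return path
	}
	rest := segs[prefix:] // shares the backing array, so writes update segs
	if len(rest) >= 3 && rest[0] == "namespaces" {
		rest[1] = "{namespace}"
		rest = rest[2:] // now starts at the resource name
	}
	if len(rest) >= 2 {
		rest[1] = "{name}" // object name; any trailing subresource is kept
	}
	return "/" + strings.Join(segs, "/")
}

// routeLabel returns the normalized path if it is allowlisted, else "other".
func routeLabel(path string) string {
	if p := normalizePath(path); knownRoutes[p] {
		return p
	}
	return "other"
}

func main() {
	for _, p := range []string{
		"/api/v1/namespaces/default/pods",
		"/apis/apps/v1/namespaces/mimir/statefulsets/ingester-zone-a",
		"/api/v1/namespaces/default/configmaps/foo",
	} {
		fmt.Println(p, "->", routeLabel(p))
	}
}
```

Because the StatefulSet name is collapsed, the PATCH series no longer grow with the number of StatefulSets, and unexpected endpoints share a single `other` bucket.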
I understand the hesitation to include URLs as a label, but as you mentioned, it's a small, known set of URLs, and we only run a single rollout-operator per namespace. I'm not too worried about the cardinality, but Joshua's suggestion seems pretty reasonable if you are.
I think we can go with what's here for now and revisit it if it turns out to be worse than expected.
LGTM
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
"k8s.io/client-go/rest"
"net/http"
Can you reorganize these imports the way we do in Mimir: stdlib, third-party, then local packages?
I opened issue #120, since this has shown up in other PRs as well.
Fixed in b5eae5a
This PR adds support for emitting latency metrics for requests to the Kubernetes control plane.