Reduce cardinality of metrics emitted for requests to the Kubernetes control plane #123

Merged
charleskorn merged 2 commits into main from charleskorn/reduce-cardinality
Jan 18, 2024

@charleskorn charleskorn commented Jan 18, 2024

The metrics added in #118 turned out to produce many more series than expected (see discussion here; I missed the pod deletion behaviour in my testing).

For example, in a very small Mimir development cluster with 2 compactors, 9 store-gateways and 21 ingesters, the rollout-operator was emitting 44 unique method / path combinations after a rollout:

DELETE /api/v1/namespaces/the-namespace/pods/compactor-0
DELETE /api/v1/namespaces/the-namespace/pods/compactor-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-0
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-2
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-3
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-4
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-5
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-a-6
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-0
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-2
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-3
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-4
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-5
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-b-6
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-0
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-1
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-2
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-3
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-4
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-5
DELETE /api/v1/namespaces/the-namespace/pods/ingester-zone-c-6
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-a-0
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-a-1
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-a-2
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-b-0
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-b-1
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-b-2
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-c-0
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-c-1
DELETE /api/v1/namespaces/the-namespace/pods/store-gateway-zone-c-2
GET /api/v1/namespaces/the-namespace/pods
GET /api/v1/namespaces/the-namespace/secrets/rollout-operator-self-signed-certificate
GET /apis/admissionregistration.k8s.io/v1/mutatingwebhookconfigurations
GET /apis/admissionregistration.k8s.io/v1/validatingwebhookconfigurations
GET /apis/apps/v1/namespaces/the-namespace/statefulsets
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/compactor/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/ingester-zone-a/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/ingester-zone-b/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/ingester-zone-c/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/store-gateway-zone-a/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/store-gateway-zone-b/status
PUT /apis/apps/v1/namespaces/the-namespace/statefulsets/store-gateway-zone-c/status

Each of these combinations emits a classic histogram with 17 series, for a total of 748 series (44 × 17).

It's not uncommon to run clusters with hundreds of ingesters and store-gateways, so this is not sustainable. It's not necessary either: we're most interested in understanding the performance of a particular kind of request, not the performance of requests for a single specific object.
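
To make the growth concrete, here is a rough sketch of how the series count scales with cluster size before any grouping. It assumes, based on the listing above, one DELETE combination per pod, one PUT combination per StatefulSet, and five fixed GET combinations. The function name `ungroupedSeries` and the 600-pod cluster are hypothetical, for illustration only.

```go
package main

import "fmt"

// seriesPerHistogram is the number of series per classic histogram
// quoted in the description above.
const seriesPerHistogram = 17

// fixedCombinations counts the method / path combinations in the listing
// above that do not name a specific pod or StatefulSet (the five GETs).
const fixedCombinations = 5

// ungroupedSeries estimates the series emitted without grouping, assuming
// one DELETE combination per pod and one PUT combination per StatefulSet.
// This is an illustrative model, not code from the rollout-operator.
func ungroupedSeries(pods, statefulSets int) int {
	combinations := pods + statefulSets + fixedCombinations
	return combinations * seriesPerHistogram
}

func main() {
	// The development cluster above: 2 + 9 + 21 = 32 pods in 7 StatefulSets.
	fmt.Println(ungroupedSeries(32, 7)) // 748

	// A hypothetical larger cluster: 600 pods in the same 7 StatefulSets.
	fmt.Println(ungroupedSeries(600, 7))
}
```

Under this model the count grows linearly with the number of pods, which is why per-object paths are a poor fit for a metric label.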

This PR reduces the cardinality of metrics emitted by grouping equivalent requests together. For example, all pod delete requests would be emitted with path="core/v1/pods object".

The Kubernetes API follows a fairly rigid pattern for URLs, documented here, so the changes in this PR use that pattern to parse the URL and format it for the metric label.
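
As a rough illustration of the approach (not the code in this PR), the sketch below parses a request path with a single regular expression and formats a grouped label. The pattern, the `groupPath` name, the `collection` wording, the way a subresource is appended, and the `unknown` fallback are all assumptions; only the `core/v1/pods object` form is taken from the description above.

```go
package main

import (
	"fmt"
	"regexp"
)

// pattern is a simplified, assumed model of Kubernetes API paths:
//   /api/{version}/[namespaces/{ns}/]{type}[/{name}[/{subresource}]]
//   /apis/{group}/{version}/[namespaces/{ns}/]{type}[/{name}[/{subresource}]]
// It is ambiguous for subresources of a namespace object itself.
var pattern = regexp.MustCompile(
	`^/(?:api|apis/(?P<group>[^/]+))/(?P<version>[^/]+)` +
		`(?:/namespaces/[^/]+)?` +
		`/(?P<type>[^/]+)` +
		`(?:/(?P<name>[^/]+)(?:/(?P<subresource>[^/]+))?)?$`)

// groupPath maps a request path (without query string) to a label value
// that does not contain the namespace or the object name.
func groupPath(path string) string {
	match := pattern.FindStringSubmatch(path)
	if match == nil {
		return "unknown" // hypothetical fallback for unrecognised paths
	}

	group := match[pattern.SubexpIndex("group")]
	if group == "" {
		group = "core" // paths under /api/ belong to the core group
	}
	version := match[pattern.SubexpIndex("version")]
	resourceType := match[pattern.SubexpIndex("type")]
	name := match[pattern.SubexpIndex("name")]
	subresource := match[pattern.SubexpIndex("subresource")]

	// A path with an object name targets one object; otherwise a collection.
	kind := "collection"
	if name != "" {
		kind = "object"
	}

	label := fmt.Sprintf("%s/%s/%s %s", group, version, resourceType, kind)
	if subresource != "" {
		label += " " + subresource
	}
	return label
}

func main() {
	fmt.Println(groupPath("/api/v1/namespaces/the-namespace/pods/compactor-0"))
	fmt.Println(groupPath("/api/v1/namespaces/the-namespace/pods"))
	fmt.Println(groupPath("/apis/apps/v1/namespaces/the-namespace/statefulsets/compactor/status"))
}
```

With this particular label format, the 44 method / path combinations listed above would collapse to 7, or 119 series at 17 series per histogram, and that number would no longer grow with the number of pods.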

…duration_seconds metric by grouping requests to the same endpoint
@charleskorn charleskorn force-pushed the charleskorn/reduce-cardinality branch from bb6037f to c6578ee Compare January 18, 2024 03:32
@charleskorn charleskorn marked this pull request as ready for review January 18, 2024 03:38
@charleskorn charleskorn requested a review from a team as a code owner January 18, 2024 03:38
version := match[pattern.SubexpIndex("version")]
resourceType := match[pattern.SubexpIndex("type")]
name := match[pattern.SubexpIndex("name")]
subresourceType := match[pattern.SubexpIndex("subresource")]
Contributor

We could consider each of these as their own label, but obviously that has other cardinality considerations.

Contributor Author

Overall cardinality would be the same (unless we end up creating series for the combinations that aren't used), but I think it's OK as it is - we can always change it later if we need to.

@charleskorn charleskorn merged commit 0030218 into main Jan 18, 2024
6 checks passed
@charleskorn charleskorn deleted the charleskorn/reduce-cardinality branch January 18, 2024 22:32