Explain the memory use of the default cert-manager installation
Signed-off-by: Richard Wall <[email protected]>
Showing 3 changed files with 81 additions and 0 deletions.
---
title: Deploying cert-manager on Large Clusters
description: |
    Learn how to optimize cert-manager for deployment on large clusters,
    with thousands of Certificate and Secret resources.
---

Learn how to optimize cert-manager for deployment on large clusters,
with thousands of Certificate and Secret resources.

## Overview

The defaults in the Helm chart or YAML manifests are intended for general use.
You will need to modify the configuration if your Kubernetes cluster has thousands of Certificate resources and TLS Secrets.

## Memory

### Recommendations

Here are some memory request (`resources.requests.memory`) recommendations for each of the cert-manager components in different scenarios:

| Scenario                   | controller (Mi) | cainjector (Mi) | webhook (Mi) |
|----------------------------|-----------------|-----------------|--------------|
| 2000 RSA 4096 Certificates | 350             | 150             | 50           |

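As a minimal sketch, the recommendations in the table could be expressed as Helm values like the following. It assumes the standard cert-manager Helm chart layout, where the top-level `resources` key applies to the controller and `cainjector.resources` and `webhook.resources` apply to the other components. The file name `values.yaml` is hypothetical, and setting each limit equal to its request is an assumption based on the best practice described in the note below.

```yaml
# values.yaml (hypothetical file name) for the cert-manager Helm chart.
# Memory figures come from the "2000 RSA 4096 Certificates" row of the table.
# Limits are set equal to requests; treat this as an assumption to validate
# against your own cluster's measurements.

# Top-level resources apply to the cert-manager controller.
resources:
  requests:
    memory: 350Mi
  limits:
    memory: 350Mi

cainjector:
  resources:
    requests:
      memory: 150Mi
    limits:
      memory: 150Mi

webhook:
  resources:
    requests:
      memory: 50Mi
    limits:
      memory: 50Mi
```

These values could then be passed to Helm with `-f values.yaml` when installing or upgrading the chart. Treat the figures as a starting point for the scenario in the table, not as a guarantee for other workloads.
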
> 📖️ Read [What Everyone Should Know About Kubernetes Memory Limits](https://home.robusta.dev/blog/kubernetes-memory-limit)
> to learn why it is best practice to set the memory limit equal to the memory request.

### Rationale

**When Certificate resources are the dominant use-case**,
such as when workloads need to mount the TLS Secret or when gateway-shim is used,
the memory consumption of the cert-manager controller will be roughly
proportional to the total size of the Secret resources that contain the TLS
key pairs.
Why? Because the cert-manager controller caches the entire content of these Secret resources in memory.
If large TLS keys are used (e.g. RSA 4096), the memory use will be higher than if smaller TLS keys are used (e.g. ECDSA).

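To illustrate the point about key size, here is a sketch of a Certificate that requests a smaller ECDSA key instead of RSA 4096. The resource names (`example-cert`, `example-cert-tls`, `example-issuer`) and the DNS name are hypothetical, and the example assumes that your issuer accepts ECDSA keys.

```yaml
# Hypothetical Certificate using an ECDSA P-256 private key.
# A smaller key produces a smaller TLS Secret for the controller to cache.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-cert            # hypothetical name
  namespace: default
spec:
  secretName: example-cert-tls  # hypothetical name of the TLS Secret
  dnsNames:
    - example.com               # placeholder DNS name
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: example-issuer        # hypothetical Issuer
    kind: Issuer
```

Whether the saving is significant depends on how many Certificates the cluster holds; the experiment described in this page only measured RSA 4096 keys.
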
The other Secrets in the cluster, such as those used for Helm chart configurations or for other workloads,
will not significantly increase the memory consumption, because cert-manager will only cache the metadata of these Secrets.

**When CertificateRequest resources are the dominant use-case**,
such as with csi-driver or with istio-csr,
the memory consumption of the cert-manager controller will be much lower,
because there will be fewer TLS Secrets and fewer resources to be cached.

### Evidence

This chart shows the memory consumption of the cert-manager controller (1.14)
during an experiment where 2000 RSA 4096 Certificates are created, signed and
then deleted.

<img src="/docs/devops-tips/large-clusters/default-memory-1.png" alt="Scatter chart showing cert-manager memory usage and cluster resource counts over time" />

The pattern of memory consumption can be explained as follows:

1. `0min`: `~50MiB`: There are 0 Certificates.
   There are 13 incidental Secret resources:
   Helm chart configuration Secrets, and
   other Secrets belonging to the metrics-server and Prometheus stack, which are also installed in the test cluster.
   All the cert-manager Deployments were restarted before the experiment, so the components have cached only these pre-existing resources.
1. `33min`: `~260MiB`: All 2000 Certificate resources have been reconciled.
   Every Certificate now has a corresponding CertificateRequest and TLS Secret.
   There are `~3600` Secret resources, which is `~1600` more than can be explained by the 2000 TLS Secrets.
   **Why?**
   Possibly because cert-manager creates a temporary Secret resource for each Certificate.
   The temporary Secret is where cert-manager stores the private key when it is first generated.
   After the TLS certificate has been signed, the temporary Secret is deleted.
1. `40min`: `~280MiB`: The remaining temporary Secret resources are being deleted.
   This causes a spike in memory. **Why?**
1. `42min`: `~225MiB`: The Go garbage collector eventually frees the memory which had been allocated for the recently deleted Secrets.
1. `46min`: `~300MiB`: The Certificates and Secrets are now being rapidly deleted.
   This causes another spike in memory. **Why?**
1. `48min`: `~280MiB`: All the Certificate, CertificateRequest and Secret resources have now been deleted,
   but the memory consumption remains at roughly the peak size.
   **Why?**
   The memory which had been allocated to cache the resources is not immediately freed.