diff --git a/content/docs/devops-tips/large-clusters.md b/content/docs/devops-tips/large-clusters.md
index b64c77d778..9094ef6a70 100644
--- a/content/docs/devops-tips/large-clusters.md
+++ b/content/docs/devops-tips/large-clusters.md
@@ -13,6 +13,46 @@ with thousands of Certificate and Secret resources.
 The defaults in the Helm chart or YAML manifests are intended for general use.
 You will need to modify the configuration if your Kubernetes cluster has thousands of Certificate resources and TLS Secrets.
 
+## CPU
+
+
+
+### Recommendations
+
+#### Disable client-side rate limiting for Kubernetes API requests
+
+By default, cert-manager throttles the rate of its requests to the Kubernetes API server.
+Historically, this was intended to prevent cert-manager from overwhelming the Kubernetes API server,
+but modern versions of Kubernetes implement [API Priority and Fairness](https://kubernetes.io/docs/concepts/cluster-administration/flow-control/),
+which obviates the need for client-side throttling.
+Disable the client-side rate limiter by setting both options to `-1`:
+
+```yaml
+config:
+  apiVersion: controller.config.cert-manager.io/v1alpha1
+  kind: ControllerConfiguration
+  kubernetesAPIQPS: -1
+  kubernetesAPIBurst: -1
+```
+
+> 📖 Read the [API documentation for ControllerConfiguration](https://cert-manager.io/docs/reference/api-docs/#controller.config.cert-manager.io%2fv1alpha1).
+>
+> 📖 Read [kubernetes#111880: Disable client-side rate-limiting when AP&F is enabled](https://github.com/kubernetes/kubernetes/issues/111880).
+>
+> 📖 Read the client-go source code to [understand why a negative QPS disables the rate limiter](https://github.com/kubernetes/kubernetes/blob/6813625b7cd706db5bc7388921be03071e1a492d/staging/src/k8s.io/client-go/rest/config.go#L351-L364).
+>
+> 🔗 Other projects that disable client-side rate limiting include [FluxCD](https://github.com/fluxcd/pkg/issues/269).
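+
+The following Go sketch is a simplified model of the defaulting logic that client-go applies when it builds a REST client,
+assuming cert-manager passes `kubernetesAPIQPS` and `kubernetesAPIBurst` through to client-go's `rest.Config` unchanged.
+Zero values fall back to client-go's defaults (`DefaultQPS` is 5 and `DefaultBurst` is 10),
+and a token-bucket rate limiter is only created when the resulting QPS is positive, so a negative QPS disables client-side rate limiting.
+The `limiterFor` function and `tokenBucket` type are hypothetical stand-ins for illustration, not part of the client-go API.
+
+```go
+package main
+
+import "fmt"
+
+// Defaults mirroring client-go's rest.DefaultQPS and rest.DefaultBurst.
+const (
+	defaultQPS   float32 = 5
+	defaultBurst int     = 10
+)
+
+// tokenBucket is a hypothetical stand-in for the limiter returned by
+// client-go's flowcontrol.NewTokenBucketRateLimiter; only its settings are modelled.
+type tokenBucket struct {
+	qps   float32
+	burst int
+}
+
+// limiterFor (hypothetical) sketches how client-go picks a rate limiter
+// when rest.Config.RateLimiter is nil.
+func limiterFor(qps float32, burst int) *tokenBucket {
+	// Zero means "unset", so fall back to the client-go defaults.
+	if qps == 0 {
+		qps = defaultQPS
+	}
+	if burst == 0 {
+		burst = defaultBurst
+	}
+	// Only a positive QPS produces a limiter.
+	if qps > 0 {
+		return &tokenBucket{qps: qps, burst: burst}
+	}
+	// A negative QPS leaves the client without client-side rate limiting.
+	return nil
+}
+
+func main() {
+	cases := []struct {
+		qps   float32
+		burst int
+	}{{0, 0}, {20, 50}, {-1, -1}}
+	for _, c := range cases {
+		if l := limiterFor(c.qps, c.burst); l != nil {
+			fmt.Printf("QPS=%v Burst=%v -> token bucket (%v QPS, burst %v)\n", c.qps, c.burst, l.qps, l.burst)
+		} else {
+			fmt.Printf("QPS=%v Burst=%v -> client-side rate limiting disabled\n", c.qps, c.burst)
+		}
+	}
+}
+```
+
+Running it prints one line per case: zero values fall back to the defaults, explicit positive values are kept,
+and `-1` yields no rate limiter at all.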
+
+### Rationale
+
+### Evidence
+
+Default:
+Scatter chart showing cert-manager CPU usage and cluster resource counts over time with default cert-manager configuration
+
+Client-side rate limiter disabled:
+Scatter chart showing cert-manager CPU usage and cluster resource counts over time with client-side rate limiter disabled
+
 ## Memory
 
 ### Recommendations
diff --git a/public/docs/devops-tips/large-clusters/default-cpu-1.png b/public/docs/devops-tips/large-clusters/default-cpu-1.png
new file mode 100644
index 0000000000..28c472353c
Binary files /dev/null and b/public/docs/devops-tips/large-clusters/default-cpu-1.png differ
diff --git a/public/docs/devops-tips/large-clusters/default-cpu-2.png b/public/docs/devops-tips/large-clusters/default-cpu-2.png
new file mode 100644
index 0000000000..a17b4e1344
Binary files /dev/null and b/public/docs/devops-tips/large-clusters/default-cpu-2.png differ