Also check the shorter version.
- Structure
- Wording
- Production readiness criteria
- Guidelines
The document consists of two big parts: criteria grouped into themes, and guidelines that solve some of the problems the criteria define. The document intends to help developers create a component that a platform engineer can operate using existing Kubernetes patterns.
Component - a single application. Platform engineer - a person responsible for deploying and operating Cloud Foundry.
Ability to provide an acceptable level of service in the face of faults.
Guidelines:
- Passing configuration to the application
- Pod service account
- Using certificates
- Pod port names
- Networking policy
- Istio RBAC
- Access outside the cluster
- Using secrets
- Service accounts
As a platform operator, I can upgrade the Kubernetes cluster with minimal application downtime.
As a platform operator, I can upgrade Cloud Foundry with minimal control plane downtime.
Guidelines:
- Health checks
- Rollback
- Work with signals
- Resilience to cluster upgrades
- Storing dependencies on the image
- Image referenced by sha256
- Number of containers
- Number of init containers
- Pod disruption budget
Guidelines:
- Non-root user
- Storing dependencies on the image
- Keeping base image up to date
- Pod service account
- Service account token
- Pod security configuration
- Pod security policy
- Networking policy
- Istio RBAC
- Service accounts
As an open-source contributor, I can submit a patch for the component (that passes tests).
As an open-source platform architect, I can consume the component.
Guidelines:
- Image metadata
- Base image
- Image location
- Packaging instructions
- Networking policy
- Istio RBAC
- Access outside the cluster
- Using secrets
Every component must expose an endpoint with health information, or crash if it is unhealthy.
Improves:
- operability.
- upgrades by allowing blue-green deployments.
Read more:
All logs must go to stdout/stderr
Improves:
Reasons:
- Kubernetes handles the stdout and stderr output of the pod. It saves the logs to a known disk location, and the operator can check the logs from the previous version of the pod for some time after a restart.
- One of the security recommendations for Kubernetes applications is to use ReadOnlyRootFilesystem and, as a result, not write anything to disk.
The component must be able to use up-to-date configuration.
Possible solutions:
- Read the config from config files. Ideally, use a conf.d-style config directory so that the config can be composed from multiple ConfigMaps and Secrets. The application should watch the files for changes and update its config when they change.
- The configs are immutable and the component only consumes new configuration through a new deployment with new ConfigMaps and Secrets.
- The configs are stored in config files, and when the config is updated, the liveness probe starts to fail, which makes Kubernetes restart the pod.
  - Problem: It is harder to achieve high availability here.
- The configs are stored in config files. When the config is updated, the pod gets restarted.
  - Problem: The pod requires access to the Kubernetes API.
- The hash of the config is stored in the pod specification, so an updated config results in a new pod specification (see the sketch below).
  - Problem: The platform engineer needs to know how to do a manual config update.
- The configuration is passed via command-line flags in the pod specification.
  - Problem: It is not possible to use Kubernetes Secrets.
Improves:
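A minimal sketch of the config-hash solution, assuming a hypothetical my-component Deployment whose templating tool writes a sha256 of the rendered config into a pod annotation; any change to the config then changes the pod template and triggers a rolling restart:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-component                  # hypothetical component name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-component
      annotations:
        # Placeholder: the templating tool (e.g. Helm) computes this
        # from the rendered ConfigMap contents on every change.
        checksum/config: "<sha256 of the rendered config>"
    spec:
      containers:
      - name: my-component
        image: example/my-component:1.0.0
        volumeMounts:
        - name: config
          mountPath: /etc/my-component/conf.d   # conf.d-style config directory
      volumes:
      - name: config
        configMap:
          name: my-component-config
```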
The application with version n-1 should be able to start with a database that has been migrated to version n.
Improves:
Reason:
It is very easy to implement blue-green deployment with Kubernetes objects and do a rollback. Being able to work with the previous version gives the platform engineer the ability to roll back using the kubectl rollout undo command.
See also:
Kubernetes documentation about deployments
The application must respect the SIGTERM signal and start failing its readiness probe
Improves:
See also:
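A sketch of the pod-spec side of this, assuming a hypothetical my-component image with an HTTP readiness endpoint on port 8080; the application itself must still catch SIGTERM and start failing the probe while it drains in-flight requests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  terminationGracePeriodSeconds: 30    # time between SIGTERM and SIGKILL
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    readinessProbe:
      httpGet:
        path: /ready                   # should fail once SIGTERM is received
        port: 8080
```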
The component is either expected to work during K8s control plane downtime or has a clear notice in its README that, to achieve high availability, the control plane of Kubernetes must have multiple replicas.
Improves:
All components should run with a non-root user unless it is completely impossible, including “FROM scratch” images. The user UID should not be 1337 due to Istio limitations.
Improves:
See also:
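A sketch of a conforming pod spec (image name and UID are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 2000     # any non-root UID except 1337 (reserved by Istio)
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```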
All component images should have labels in the metadata with the repo URL and the SHA of the commit, as recommended by OCI
Improves:
Reasons:
- This allows the operator to check if the component has a CVE.
- This also helps scanners that work off of artefact metadata to determine source code/provenance (e.g. OSL, security)
See also:
Eirini docker image as an example
The default base image should come from cloudfoundry/stacks. All components should allow changing the base image
Improves:
Reasons:
- The base layer for the images should be the same.
- Other Cloud Foundry Foundation members want to build their own base images.
The run image should not have packages required for compilation, only for running, i.e. don't ship the Go toolchain or the JDK in the final image; for Java, only the JRE should be shipped. It should be possible to get a list of the dependencies currently installed inside.
Improves:
Reasons:
- It takes more time to pull a new version of the image in case of scaling, upgrade or recovery.
- Images are stored on the worker node disk and might require bigger disks.
- Unneeded packages require upgrading the image more often.
- The operator needs to know whether the packages in the image are up to date.
See also:
Kubernetes best practices by Google Cloud
Images are continuously updated with the new version of the base layer.
Improves:
Images are stored in the CFF organisation on Docker Hub.
Improves:
The component has clear instructions on how to build its container image
Improves:
The image references must always include sha256.
Reasons:
Tags in Docker registries are mutable; this can cause two different versions of the application to run on the cluster after a node restart. A sha256 digest provides an immutable version. Both a tag and a digest can be used in the same reference.
Improves:
- Kubernetes cluster upgrades
- Availability by starting the same version of the component during the recovery event
See also:
Kubernetes Configuration Best Practices
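A sketch of such a reference (the digest value below is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  containers:
  - name: my-component
    # Tag plus digest in one reference: the tag is for humans, the
    # immutable sha256 digest is what the runtime actually resolves.
    image: example/my-component:1.0.0@sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b
```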
The component should have the labels suggested by Kubernetes, for example app.kubernetes.io/name and app.kubernetes.io/version. The app.kubernetes.io/part-of label is set to Cloud Foundry.
Improves:
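For example (component name and version are hypothetical; note that label values cannot contain spaces, so Cloud Foundry has to be encoded without one):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/part-of: cloudfoundry
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```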
The readiness probe for the main container must always be present.
Improves:
Reasons:
- The pod won't serve traffic until it is ready.
- The rolling upgrade will wait for the pod to come up before deleting the existing pod.
- If a pod disruption budget is set, node draining will proceed only if pods are ready.
See also:
Kubernetes documentation about probes
The liveness probe should only fail if the application is in an unrecoverable state and has to be restarted. Ideally, the liveness probe should not be set and the application should crash. If it is present, it should point to a different endpoint than the readiness probe.
Improves:
Reasons:
- A failing liveness probe causes a restart of the pod - this might take more time if the pod is scheduled on a different node.
- A crashing pod might increase pressure on the running pods.
See also:
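A sketch combining both probes, assuming hypothetical /ready and /healthz endpoints on port 8080; per the guideline above, the liveness probe points at a different endpoint and is only present because the application cannot simply crash:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    readinessProbe:          # must always be present on the main container
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 10
    livenessProbe:           # optional; fails only on unrecoverable states
      httpGet:
        path: /healthz       # different endpoint than the readiness probe
        port: 8080
      failureThreshold: 3
      periodSeconds: 10
```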
The pod must have as few containers as possible. Ideally, a single pod should have a single container. Most native Kubernetes deployments have up to three containers.
Improves:
Reasons:
- All the containers are scheduled on a single node. They require more resources and slow down the start of the pod.
- When one container is down, the pod won't be processing requests.
- When the configuration is changed for a single container, the whole pod has to restart.
A non-short-lived pod must have as few init containers as possible. Most native Kubernetes deployments have up to 2 init containers in their spec.
Improves:
Reasons:
- The pod executes init containers sequentially and this increases the time for the pod to start. Consider using Kubernetes jobs instead.
- Crashes in init containers do not show up in pod crash count.
- The logs from init containers are impossible to get from the pod after it has started.
Each container in a pod must always have configurable CPU & memory requests with sane defaults (required to run 50 applications / start 5 at the same time).
Improves:
Reasons:
- Kubernetes reserves the resources on the node and does not allow scheduling more applications than can fit. Overcommitting slows down the deployment.
See also:
Memory limits are optional, but if they are present, they must be at least 50% bigger than the requests. CPU limits must never be set.
Improves:
Reasons:
- CPU limits are broken now
- The pod will restart if it uses more memory than allowed
See also:
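A sketch of both guidelines together (all values are hypothetical defaults that the platform engineer can override):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 192Mi   # at least 50% bigger than the request
        # no cpu limit, per the guideline above
```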
Each component must have its own service account. It must never use the default service account.
Improves:
Reason:
This allows attaching a pod security policy to the pod.
If the pod does not need access to the Kubernetes API, the service account token is not mounted into it.
Improves:
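A sketch covering both of the service-account guidelines above, assuming the component needs no Kubernetes API access (names hypothetical):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-component                   # one service account per component
automountServiceAccountToken: false    # no Kubernetes API access needed
---
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  serviceAccountName: my-component     # never the `default` service account
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```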
The pod spec should satisfy the restricted pod security policy provided by Kubernetes:
- Pod should drop all capabilities.
- Pod should have a proper seccomp or apparmor annotation.
- Pod should set the readOnlyRootFilesystem=true property.
- Pod should set the securityContext.runAsNonRoot property.
Improves:
See also:
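A sketch of a pod spec that satisfies these points; on newer clusters the securityContext.seccompProfile field can be used instead of the annotation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault             # newer replacement for the annotation
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```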
If a pod requires TLS/SSL cert/keys for public consumption it must support utilising cert-manager.
Improves:
Reason:
Kubernetes secrets have a special format for certificates and the operators expect the components to use it.
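A sketch of a cert-manager Certificate that produces such a secret (issuer and DNS name are hypothetical):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-component-tls
spec:
  secretName: my-component-tls         # becomes a kubernetes.io/tls Secret
  dnsNames:
  - my-component.example.com
  issuerRef:
    name: my-issuer                    # hypothetical (Cluster)Issuer
    kind: ClusterIssuer
```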
Ports that are exposed by a pod must have a name, which should be the same as in the corresponding service.
Improves:
Reason:
Istio requires proper port names.
The specification allows setting affinity and anti-affinity rules.
Improves:
See also:
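A sketch of a soft anti-affinity rule that prefers spreading replicas across nodes (labels hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: my-component
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```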
All pods should be part of services.
Reason:
The component creates a service if it has to be accessed by other components. Service ports should have a name of the format <protocol>[-<suffix>], e.g. grpc-api or tcp-database. See more in the Istio documentation.
The service must have the same labels as the pod
Improves:
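A sketch tying the last few guidelines together: a service whose labels match the pod's, with a grpc-api port name mirrored on the container port (all names hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component   # same labels as the pod
spec:
  selector:
    app.kubernetes.io/name: my-component
  ports:
  - name: grpc-api          # <protocol>[-<suffix>] format
    port: 8080
    targetPort: grpc-api    # resolves to the container port of the same name
---
apiVersion: v1
kind: Pod
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    ports:
    - name: grpc-api        # matches the service port name
      containerPort: 8080
```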
If the process is expected to have no downtime, it has a PodDisruptionBudget.
Improves:
Reason:
Metadata from higher-level objects (e.g. a Deployment's replica count) is ignored when the node is drained. The pod disruption budget prevents too many pods from going down at once.
See also:
Kubernetes documentation about disruptions
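A sketch of a pod disruption budget (the API group is policy/v1 on Kubernetes 1.21+, policy/v1beta1 before that):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-component
spec:
  minAvailable: 1           # keep at least one replica up during drains
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
```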
A minimal pod security policy is provided. Ideally, it should be the same as (or stricter than) the Kubernetes-provided policy.
Improves:
Sample networking policy is provided
Improves:
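A sketch of such a sample policy, allowing ingress only from a hypothetical my-client workload:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-component
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-component
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: my-client
    ports:
    - port: 8080
```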
Sample Istio RBAC is provided
Improves:
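A sketch using Istio's AuthorizationPolicy, which supersedes the older ServiceRole/ServiceRoleBinding RBAC objects (namespace and caller identity are hypothetical):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: my-component
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/my-namespace/sa/my-client"]
    to:
    - operation:
        methods: ["GET"]
```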
If the component has to be accessed externally, it provides a K8s Ingress resource or a set of Istio VirtualService + Gateway resources, with free-form annotations and the ability to provide a load balancer.
Improves:
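A sketch of the Ingress variant, leaving annotations free-form for the platform engineer (host and backend are hypothetical; newer clusters use networking.k8s.io/v1 with a slightly different backend schema):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-component
  annotations: {}            # free-form, supplied by the platform engineer
spec:
  rules:
  - host: my-component.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: my-component
          servicePort: 8080
```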
If the component needs some secrets, it has an option to use an existing Kubernetes object with a predefined format. The Kubernetes object name can be provided by the platform engineer. The secret for certificates uses the known K8s format.
Improves:
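For certificates, the known format is the kubernetes.io/tls secret type; a sketch with placeholder values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-component-tls     # name supplied by the platform engineer
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded PEM certificate>
  tls.key: <base64-encoded PEM private key>
```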
Each component creates and attaches its own service account.
Improves:
#### Deployment
Each stateless component is deployed as a Deployment.
The number of replicas is not specified in the template unless it can only be deployed as a single copy.
Improves:
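A sketch of a Deployment template without a hard-coded replica count (names hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-component
spec:
  # no `replicas` field: the platform engineer decides how many to run
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-component
    spec:
      containers:
      - name: my-component
        image: example/my-component:1.0.0
```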
If the component has a soft dependency (it can work without it) on another component, the dependent part can be skipped, e.g. Eirini deploys with the rootfs patcher, but the rootfs patcher can be skipped in the deployment.
The address of the dependent component can always be specified by the platform engineer and has a sane default.
Improves:
Reason:
Integration with KubeCF and CF-4-K8s requires using different DNS addresses.
Each non-alpha property that the platform engineer can specify is documented in the README.
Improves:
Each component is expected to support all CNCF-supported versions of Kubernetes by using the correct API specification.
Improves:
Reason:
APIs get deprecated and have to be fixed in advance.
The component must support both Docker and containerd as container runtimes.
Improves:
Reason:
Docker is widely used; containerd implements the CRI (Container Runtime Interface) and is used in several public cloud offerings.