Also check the shorter version.
- Structure
- Wording
- Production readiness criteria
- Guidelines
The document consists of two big parts: criteria grouped into themes, and guidelines that solve some of the problems the criteria define. The document intends to help developers create a component that a platform engineer can operate using existing Kubernetes patterns.
Component - a single application. Platform engineer - a person responsible for deploying and operating Cloud Foundry.
Ability to provide an acceptable level of service in the face of faults.
Guidelines:
- Passing configuration to the application
- Pod service account
- Using certificates
- Pod port names
- Networking policy
- Istio RBAC
- Access outside the cluster
- Using secrets
- Service accounts
As a platform operator, I can upgrade the Kubernetes cluster with minimal application downtime.
As a platform operator, I can upgrade Cloud Foundry with minimal control plane downtime.
Guidelines:
- Health checks
- Rollback
- Work with signals
- Resilience to cluster upgrades
- Storing dependencies on the image
- Image referenced by sha256
- Number of containers
- Number of init containers
- Pod disruption budget
Guidelines:
- Non-root user
- Storing dependencies on the image
- Keeping base image up to date
- Pod service account
- Service account token
- Pod security configuration
- Pod security policy
- Networking policy
- Istio RBAC
- Service accounts
As an open-source contributor, I can submit a patch for the component (that passes tests).
As an open-source platform architect, I can consume the component.
Guidelines:
- Image metadata
- Base image
- Image location
- Packaging instructions
- Networking policy
- Istio RBAC
- Access outside the cluster
- Using secrets
Every component must expose an endpoint with health information, or crash if it is unhealthy.
Improves:
- operability.
- upgrades by allowing blue-green deployments.
Read more:
All logs must go to stdout/stderr
Improves:
Reasons:
- Kubernetes handles the stdout and stderr output of the pod. It saves the logs to a known disk location, and the operator can check the logs from the previous version of the pod for some time after a restart.
- One of the security recommendations for Kubernetes applications is to use ReadOnlyRootFilesystem and, as a result, not write anything to disk.
The component must be able to use up-to-date configuration.
Possible solutions:
- Read the config from config files. Ideally, use a conf.d-style config directory so that the config can be composed from multiple ConfigMaps and Secrets. The application should watch the files for changes and update its config when they change.
- The configs are immutable and the component only consumes new configuration through a new deployment with new ConfigMaps and Secrets.
- The configs are stored in config files, and when the config is updated, the liveness probe starts to fail, which makes Kubernetes restart the pod.
  - Problem: It is harder to achieve high availability here.
- The configs are stored in config files. When the config is updated, the pod gets restarted.
  - Problem: The pod requires access to the Kubernetes API.
- The hash of the config is stored in the pod specification, so an updated config results in a new pod specification (see the sketch below).
  - Problem: The platform engineer needs to know how to do a manual config update.
- The configuration is passed via command-line flags in the pod specification.
  - Problem: It is not possible to use Kubernetes Secrets.
Improves:
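A minimal sketch of the config-hash solution, assuming a hypothetical my-component Deployment whose templating tool writes a sha256 of the rendered config into a pod annotation; any change to the config then changes the pod template and triggers a rolling restart:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-component                  # hypothetical component name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-component
      annotations:
        # Placeholder: the templating tool (e.g. Helm) computes this
        # from the rendered ConfigMap contents on every change.
        checksum/config: "<sha256 of the rendered config>"
    spec:
      containers:
      - name: my-component
        image: example/my-component:1.0.0
        volumeMounts:
        - name: config
          mountPath: /etc/my-component/conf.d   # conf.d-style config directory
      volumes:
      - name: config
        configMap:
          name: my-component-config
```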
The application with version n-1 should be able to start with a database that has been migrated to version n.
Improves:
Reason:
It is very easy to implement blue-green deployment with Kubernetes objects and do a rollback. Being able to work with the previous version gives the platform engineer the ability to roll back using the kubectl rollout undo command.
See also:
Kubernetes documentation about deployments
The application must respect the SIGTERM signal and start failing its readiness probe
Improves:
See also:
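A sketch of the pod-spec side of this, assuming a hypothetical my-component image with an HTTP readiness endpoint on port 8080; the application itself must still catch SIGTERM and start failing the probe while it drains in-flight requests:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  terminationGracePeriodSeconds: 30    # time between SIGTERM and SIGKILL
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    readinessProbe:
      httpGet:
        path: /ready                   # should fail once SIGTERM is received
        port: 8080
```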
The component is either expected to work during K8s control plane downtime or has a clear notice in its README that, to achieve high availability, the control plane of Kubernetes must have multiple replicas.
Improves:
All components should run with a non-root user unless it is completely impossible, including “FROM scratch” images. The user UID should not be 1337 due to Istio limitations.
Improves:
See also:
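A sketch of a conforming pod spec (image name and UID are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 2000     # any non-root UID except 1337 (reserved by Istio)
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```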
All component images should have labels in the metadata with the repo URL and the SHA of the commit, as recommended by OCI
Improves:
Reasons:
- This allows the operator to check if the component has a CVE.
- This also helps scanners that work off of artefact metadata to determine source code/provenance (e.g. OSL, security)
See also:
Eirini docker image as an example
The default base image should come from cloudfoundry/stacks. All components should allow changing the base image
Improves:
Reasons:
- The base layer for the images should be the same.
- Other Cloud Foundry Foundation members want to build their own base images.
The run image should not have packages required for compilation, only for running, i.e. don't ship the Go toolchain or the JDK in the final image; for Java, only the JRE should be shipped. It should be possible to get a list of the dependencies currently installed inside.
Improves:
Reasons:
- It takes more time to pull a new version of the image in case of scaling, upgrade or recovery.
- Images are stored on the worker node disk and might require bigger disks.
- Unneeded packages require upgrading the image more often.
- The operator needs to know whether the packages in the image are up to date.
See also:
Kubernetes best practices by Google Cloud
Images are continuously updated with the new version of the base layer.
Improves:
Images are stored in the CFF organisation on Docker Hub.
Improves:
The component has clear instructions on how to build its container image
Improves:
The image references must always include sha256.
Reasons:
Tags in Docker registries are mutable; this can cause two different versions of the application to run on the cluster after a node restart. A sha256 digest provides an immutable version. Both a tag and a digest can be used in the same reference.
Improves:
- Kubernetes cluster upgrades
- Availability by starting the same version of the component during the recovery event
See also:
Kubernetes Configuration Best Practices
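A sketch of such a reference (the digest value below is a placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  containers:
  - name: my-component
    # Tag plus digest in one reference: the tag is for humans, the
    # immutable sha256 digest is what the runtime actually resolves.
    image: example/my-component:1.0.0@sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b
```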
The component should have the labels suggested by Kubernetes, for example app.kubernetes.io/name and app.kubernetes.io/version. The app.kubernetes.io/part-of label is set to Cloud Foundry.
Improves:
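For example (component name and version are hypothetical; note that label values cannot contain spaces, so Cloud Foundry has to be encoded without one):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component
    app.kubernetes.io/version: "1.0.0"
    app.kubernetes.io/part-of: cloudfoundry
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```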
The readiness probe for the main container must always be present.
Improves:
Reasons:
- The pod won't serve traffic until it is ready.
- The rolling upgrade will wait for the pod to come up before deleting the existing pod.
- If a pod disruption budget is set, node draining will proceed only if pods are ready.
See also:
Kubernetes documentation about probes
The liveness probe should only fail if the application is in an unrecoverable state and has to be restarted. Ideally, the liveness probe should not be set and the application should crash. If it is present, it should point to a different endpoint than the readiness probe.
Improves:
Reasons:
- A failing liveness probe causes a restart of the pod - this might take more time if the pod is scheduled on a different node.
- A crashing pod might increase pressure on the running pods.
See also:
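A sketch combining both probes, assuming hypothetical /ready and /healthz endpoints on port 8080; per the guideline above, the liveness probe points at a different endpoint and is only present because the application cannot simply crash:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    readinessProbe:          # must always be present on the main container
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 10
    livenessProbe:           # optional; fails only on unrecoverable states
      httpGet:
        path: /healthz       # different endpoint than the readiness probe
        port: 8080
      failureThreshold: 3
      periodSeconds: 10
```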
The pod must have as few containers as possible. Ideally, a single pod should have a single container. Most native Kubernetes deployments have up to three containers.
Improves:
Reasons:
- All the containers are scheduled on a single node. They require more resources and slow down the start of the pod.
- When one container is down, the pod won't be processing requests.
- When the configuration is changed for a single container, the whole pod has to restart.
A non-short-lived pod must have as few init containers as possible. Most native Kubernetes deployments have up to 2 init containers in their spec.
Improves:
Reasons:
- The pod executes init containers sequentially and this increases the time for the pod to start. Consider using Kubernetes jobs instead.
- Crashes in init containers do not show up in pod crash count.
- The logs from init containers are impossible to get from the pod after it has started.
Each container in a pod must always have configurable CPU & memory requests with sane defaults (required to run 50 applications / start 5 at the same time).
Improves:
Reasons:
- Kubernetes reserves the resources on the node and does not allow scheduling more applications than can fit. Overcommitting slows down the deployment.
See also:
Memory limits are optional, but if they are present, they must be at least 50% bigger than the requests. CPU limits must never be set.
Improves:
Reasons:
- CPU limits are broken now
- The pod will restart if it uses more memory than allowed
See also:
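A sketch of both guidelines together (all values are hypothetical defaults that the platform engineer can override):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        memory: 192Mi   # at least 50% bigger than the request
        # no cpu limit, per the guideline above
```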
Each component must have its own service account. It must never use the default service account.
Improves:
Reason:
This allows attaching a pod security policy to the pod.
If the pod does not need access to the Kubernetes API, the service account token is not mounted into it.
Improves:
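A sketch covering both of the service-account guidelines above, assuming the component needs no Kubernetes API access (names hypothetical):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-component                   # one service account per component
automountServiceAccountToken: false    # no Kubernetes API access needed
---
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  serviceAccountName: my-component     # never the `default` service account
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```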
The pod spec should satisfy the restricted pod security policy provided by Kubernetes:
- Pod should drop all capabilities.
- Pod should have a proper seccomp or apparmor annotation.
- Pod should set the readOnlyRootFilesystem=true property.
- Pod should set the securityContext.runAsNonRoot property.
Improves:
See also:
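A sketch of a pod spec that satisfies these points; on newer clusters the securityContext.seccompProfile field can be used instead of the annotation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault             # newer replacement for the annotation
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    securityContext:
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
```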
If a pod requires TLS/SSL cert/keys for public consumption it must support utilising cert-manager.
Improves:
Reason:
Kubernetes secrets have a special format for certificates and the operators expect the components to use it.
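A sketch of a cert-manager Certificate that produces such a secret (issuer and DNS name are hypothetical):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-component-tls
spec:
  secretName: my-component-tls         # becomes a kubernetes.io/tls Secret
  dnsNames:
  - my-component.example.com
  issuerRef:
    name: my-issuer                    # hypothetical (Cluster)Issuer
    kind: ClusterIssuer
```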
Ports that are exposed by a pod must have a name, which should be the same as in the corresponding service.
Improves:
Reason:
Istio requires proper port names.
The specification allows setting affinity and anti-affinity rules.
Improves:
See also:
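A sketch of a soft anti-affinity rule that prefers spreading replicas across nodes (labels hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: my-component
  containers:
  - name: my-component
    image: example/my-component:1.0.0
```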
All pods should be part of services.
Reason:
The component creates a service if it has to be accessed by other components. Service ports should have a name of the format <protocol>[-<suffix>], e.g. grpc-api or tcp-database. See more in the Istio documentation.
The service must have the same labels as the pod
Improves:
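A sketch tying the last few guidelines together: a service whose labels match the pod's, with a grpc-api port name mirrored on the container port (all names hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component   # same labels as the pod
spec:
  selector:
    app.kubernetes.io/name: my-component
  ports:
  - name: grpc-api          # <protocol>[-<suffix>] format
    port: 8080
    targetPort: grpc-api    # resolves to the container port of the same name
---
apiVersion: v1
kind: Pod
metadata:
  name: my-component
  labels:
    app.kubernetes.io/name: my-component
spec:
  containers:
  - name: my-component
    image: example/my-component:1.0.0
    ports:
    - name: grpc-api        # matches the service port name
      containerPort: 8080
```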
If the process is expected to have no downtime, it has a PodDisruptionBudget.
Improves:
Reason:
Metadata from higher-level objects (e.g. a Deployment's replica count) is ignored when the node is drained. The pod disruption budget prevents too many pods from going down at once.
See also:
Kubernetes documentation about disruptions
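A sketch of a pod disruption budget (the API group is policy/v1 on Kubernetes 1.21+, policy/v1beta1 before that):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-component
spec:
  minAvailable: 1           # keep at least one replica up during drains
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
```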
A minimal pod security policy is provided. Ideally, it should be the same as (or stricter than) the Kubernetes-provided policy.
Improves:
Sample networking policy is provided
Improves:
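A sketch of such a sample policy, allowing ingress only from a hypothetical my-client workload:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: my-component
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-component
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/name: my-client
    ports:
    - port: 8080
```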
Sample Istio RBAC is provided
Improves:
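A sketch using Istio's AuthorizationPolicy, which supersedes the older ServiceRole/ServiceRoleBinding RBAC objects (namespace and caller identity are hypothetical):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: my-component
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/my-namespace/sa/my-client"]
    to:
    - operation:
        methods: ["GET"]
```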
If the component has to be accessed externally, it provides a K8s Ingress resource or a set of Istio VirtualService + Gateway resources, with free-form annotations and the ability to provide a load balancer.
Improves:
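A sketch of the Ingress variant, leaving annotations free-form for the platform engineer (host and backend are hypothetical; newer clusters use networking.k8s.io/v1 with a slightly different backend schema):

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: my-component
  annotations: {}            # free-form, supplied by the platform engineer
spec:
  rules:
  - host: my-component.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: my-component
          servicePort: 8080
```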
If the component needs some secrets, it has an option to use an existing Kubernetes object with a predefined format. The Kubernetes object name can be provided by the platform engineer. The secret for certificates uses the known K8s format.
Improves:
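For certificates, the known format is the kubernetes.io/tls secret type; a sketch with placeholder values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-component-tls     # name supplied by the platform engineer
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded PEM certificate>
  tls.key: <base64-encoded PEM private key>
```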
Each component creates and attaches its own service account.
Improves:
#### Deployment
Each stateless component is deployed as a Deployment.
The number of replicas is not specified in the template unless it can only be deployed as a single copy.
Improves:
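A sketch of a Deployment template without a hard-coded replica count (names hypothetical):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-component
spec:
  # no `replicas` field: the platform engineer decides how many to run
  selector:
    matchLabels:
      app.kubernetes.io/name: my-component
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-component
    spec:
      containers:
      - name: my-component
        image: example/my-component:1.0.0
```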
If the component has a soft dependency (it can work without it) on another component, the dependent part can be skipped, e.g. Eirini deploys with the rootfs patcher, but the rootfs patcher can be skipped in the deployment.
The address of the dependent component can always be specified by the platform engineer and has a sane default.
Improves:
Reason:
Integration with KubeCF and CF-4-K8s requires using different DNS addresses.
Each non-alpha property that the platform engineer can specify is documented in the README.
Improves:
Each component is expected to support all CNCF-supported versions of Kubernetes by using the correct API specification.
Improves:
Reason:
APIs get deprecated and have to be fixed in advance.
The component must support both Docker and containerd as container runtimes.
Improves:
Reason:
Docker is widely used; containerd implements the CRI (Container Runtime Interface) and is used in several public cloud offerings.