Kubernetes-idiomatic Cloud Foundry components guidelines

See also the shorter version.

Structure

The document consists of two main parts: criteria grouped into themes, and guidelines that solve some of the problems the criteria define. It is intended to help developers create components that a platform engineer can operate using existing Kubernetes patterns.

Wording

  • Component: a single application.
  • Platform engineer: a person responsible for deploying and operating Cloud Foundry.

Production readiness criteria

Resilience

Ability to provide an acceptable level of service in the face of faults.

Availability

Guidelines:

Failure Recovery

Guidelines:

Isolation

Guidelines:

Operability

Resource Planning

Guidelines:

Health Monitoring

Guidelines:

Logging

Guidelines:

Diagnostics Tooling

Guidelines:

Customisation

Guidelines:

Upgrades

As a platform operator, I can upgrade the Kubernetes cluster with minimal application downtime.

As a platform operator, I can upgrade Cloud Foundry with minimal control plane downtime.

Guidelines:

Security

Guidelines:

Open Source

As an open-source contributor, I can submit a patch for the component (one that passes the tests).

As an open-source platform architect, I can consume the component.

Guidelines:

Guidelines

Code

Health endpoint

Every component must either expose an endpoint with health information or crash when it is unhealthy.
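A minimal sketch of such an endpoint, using only the Python standard library. The /healthz path, the port, and the is_healthy callback are assumptions made for illustration; the guideline itself does not prescribe them.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_health_handler(is_healthy):
    """Build a handler that reports health on /healthz (hypothetical path)."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/healthz":
                self.send_error(404)
                return
            healthy = is_healthy()
            body = b"ok\n" if healthy else b"unhealthy\n"
            # 200 when healthy, 503 otherwise, so an HTTP probe can tell them apart.
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # silence per-request access logs in this sketch

    return HealthHandler


def start_health_server(is_healthy, host="0.0.0.0", port=8080):
    """Serve the health endpoint on a background thread and return the server."""
    server = HTTPServer((host, port), make_health_handler(is_healthy))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The callback keeps the health logic inside the component: it can check a database connection, a queue, or anything else that defines "healthy" for that component.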

Improves:

Read more:

Log location

All logs must go to stdout/stderr.

Improves:

Reasons:

  • Kubernetes handles the stdout and stderr output of the pod. It saves the logs to a known disk location, and for some time after a restart the operator can still check the logs from the previous instance of the pod.
  • One of the security recommendations for Kubernetes applications is to use ReadOnlyRootFilesystem, which means the application should not write anything to disk.
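As an illustration, a component written in Python could route all of its logging to a stream instead of a file. This is a sketch; the log format and the configure_logging name are assumptions, not part of the guideline.

```python
import logging
import sys


def configure_logging(level=logging.INFO, stream=None):
    """Send all log records to a stream (stderr by default), never to a file."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any handlers configured earlier
    root.setLevel(level)
    return root


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("component").info("started")
```

Because nothing is written to the container filesystem, the same code keeps working when the root filesystem is read-only.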

Passing configuration to the application

The component must be able to use up-to-date configuration.

Possible solutions:

  • Read the config from config files, ideally from a conf.d-style config directory so that the config can be composed from multiple ConfigMaps and Secrets. The application should watch the files for changes and update its config when they change.

  • The configs are immutable, and the component consumes new configuration only through the deployment of new ConfigMaps and Secrets.

  • The configs are stored in config files. When the config is updated, the liveness probe starts to fail, which makes Kubernetes restart the pod.

    • Problem: It is harder to achieve high availability here.
  • The configs are stored in config files. When the config is updated, the pod gets restarted.

    • Problem: The pod requires access to the Kubernetes API.
  • The hash of the config is stored in the pod specification, so the pods are replaced when the hash is updated.

    • Problem: The platform engineer needs to know how to do the manual config update.
  • The configuration is passed via command-line flags in the pod specification.

    • Problem: It is not possible to use Kubernetes Secrets.
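The first option (a conf.d-style directory that is re-read on change) can be sketched as follows. The JSON file format, the function names, and the polling approach are assumptions chosen to keep the example within the standard library; a real component might use YAML and a file-system watcher instead.

```python
import hashlib
import json
from pathlib import Path


def load_config(conf_dir):
    """Merge every *.json file in conf_dir in name order; later files win."""
    merged = {}
    for path in sorted(Path(conf_dir).glob("*.json")):
        merged.update(json.loads(path.read_text()))
    return merged


def config_fingerprint(conf_dir):
    """Hash file names and contents so that any change yields a new value."""
    digest = hashlib.sha256()
    for path in sorted(Path(conf_dir).glob("*.json")):
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


class ReloadingConfig:
    """Holds the current config and reloads it when the files change."""

    def __init__(self, conf_dir):
        self.conf_dir = conf_dir
        self.fingerprint = config_fingerprint(conf_dir)
        self.values = load_config(conf_dir)

    def refresh(self):
        """Call periodically; returns True if a new config was loaded."""
        current = config_fingerprint(self.conf_dir)
        if current == self.fingerprint:
            return False
        self.values = load_config(self.conf_dir)
        self.fingerprint = current
        return True
```

The same fingerprint function could also produce the config hash mentioned in the list above, for example when a deployment tool writes it into the pod specification.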

Improves:

Ability to rollback

Version n-1 of the application should be able to start against a database that has been migrated to version n.

Improves:

Reason:

It is very easy to implement a blue-green deployment with Kubernetes Deployment objects and to roll it back. If the previous version can still run, the platform engineer can roll back using the kubectl rollout undo command.
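One common way to keep version n-1 working is to make migrations additive only. The sketch below uses SQLite and a hypothetical apps table purely for illustration; the table, the column names, and the function names are assumptions.

```python
import sqlite3


def create_schema_v1(db):
    """Schema as version n-1 of the application expects it."""
    db.execute("CREATE TABLE apps (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def migrate_to_v2(db):
    """Version n migration: additive only, and the new column has a default."""
    db.execute("ALTER TABLE apps ADD COLUMN stack TEXT NOT NULL DEFAULT 'default'")


def v1_insert_app(db, name):
    """Version n-1 write path: names its columns, so extra columns are harmless."""
    db.execute("INSERT INTO apps (name) VALUES (?)", (name,))


def v1_list_apps(db):
    """Version n-1 read path: selects explicit columns instead of SELECT *."""
    return [row[0] for row in db.execute("SELECT name FROM apps ORDER BY id")]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    create_schema_v1(db)
    v1_insert_app(db, "before-migration")
    migrate_to_v2(db)
    v1_insert_app(db, "after-migration")  # old code still works after the migration
    print(v1_list_apps(db))
```

Dropping or renaming a column in the same release would break this property, so destructive changes are usually postponed to a later version.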

See also:

Kubernetes documentation about deployments

Work with signals

The application must respect the SIGTERM signal: once it receives it, the application must start failing its readiness probe (reporting NotReady).
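A minimal sketch of this behaviour in Python. The function names are hypothetical; is_ready is meant to back whatever readiness endpoint the component exposes.

```python
import signal
import threading

# Set once SIGTERM has been received; never cleared.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Mark the process as draining; in-flight work is allowed to finish."""
    shutting_down.set()


def is_ready():
    """Readiness check: report NotReady as soon as SIGTERM has been received."""
    return not shutting_down.is_set()


def install_signal_handler():
    """Register the handler; must be called from the main thread at startup."""
    signal.signal(signal.SIGTERM, handle_sigterm)
```

After the flag is set, the component would typically finish the requests it is already handling and then exit before the pod's termination grace period runs out.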

Improves:

See also:

Resilience to cluster upgrades

The component is either expected to work during K8s control plane downtime or has a clear notice in its README stating that, to achieve high availability, the Kubernetes control plane must have multiple replicas.

Improves:

Packaging

Non-root user

All components, including those built on “FROM scratch” images, should run as a non-root user unless that is completely impossible. The user UID should not be 1337 due to Istio limitations (Istio uses that UID for its sidecar proxy).

Improves:

See also:

Pod security policy

Image metadata

All component images should have metadata labels with the repository URL and the commit SHA, as recommended by OCI.

Improves:

Reasons:

  • This allows the operator to check if the component has a CVE.
  • This also helps scanners that work off of artefact metadata to determine source code/provenance (e.g. OSL, security)
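As a sketch, a build script could generate the label arguments for docker build like this. The two label keys are the pre-defined OCI annotation keys for source and revision; the function name and the example values are assumptions.

```python
def oci_label_args(repo_url, commit_sha):
    """Return docker build arguments that set the OCI source and revision labels."""
    labels = {
        "org.opencontainers.image.source": repo_url,
        "org.opencontainers.image.revision": commit_sha,
    }
    args = []
    for key, value in labels.items():
        args += ["--label", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical repository URL and commit SHA, for illustration only.
    print(oci_label_args("https://example.com/org/component", "0123abc"))
```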

See also:

Eirini docker image as an example

Base image

The default base image should come from cloudfoundry/stacks. It should be possible to change the base image of every component.

Improves:

Reasons:

  • The base layer for the images should be the same.
  • Other Cloud Foundry Foundation members want to build their own base images.

Dependencies

The run image should contain only the packages required for running, not those required for compilation. For example, do not ship the Go toolchain or a JDK in the final image; for Java, only the JRE should be shipped. It should be possible to get a list of the dependencies currently installed in the image.

Improves:

Reasons:

  • A bigger image takes more time to pull when scaling, upgrading or recovering.
  • Images are stored on the worker node disk and might require bigger disks.
  • Unneeded packages mean the image has to be upgraded more often.
  • The operator needs to know whether the packages in the image are up to date.

See also:

Kubernetes best practices by Google Cloud

Keeping base image up to date

Images are continuously updated with the new version of the base layer.

Improves:

Image location

Images are stored in the CFF organisation on Docker Hub.

Improves:

Packaging instructions

The component has clear instructions on how to build its container image

Improves:

Pod specification

Image referenced by sha256

The image references must always include sha256.

Reasons:

Tags in Docker registries are mutable, which can cause two different versions of the application to run on the cluster, for example after a node restart. A sha256 digest provides an immutable reference. A tag and a digest can be used at the same time.
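A check like the following could be run in CI against rendered manifests to catch unpinned images. It is a sketch: the function name and the registry host in the example are hypothetical.

```python
import re

# An image reference is pinned when it ends with @sha256:<64 hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_pinned(image_ref):
    """True if the image reference ends with a sha256 digest."""
    return DIGEST_RE.search(image_ref) is not None


if __name__ == "__main__":
    digest = "a" * 64  # placeholder digest, not a real image
    print(is_pinned(f"registry.example.com/team/app:1.2.3@sha256:{digest}"))
    print(is_pinned("registry.example.com/team/app:1.2.3"))
```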

Improves:

See also:

Kubernetes Configuration Best Practices

Pod labels

The component should have the labels suggested by Kubernetes, for example, app.kubernetes.io/name and app.kubernetes.io/version. The app.kubernetes.io/part-of label is set to Cloud Foundry.

Improves:

Readiness probe

The readiness probe for the main container must always be present.

Improves:

Reasons:

  • The pod won't serve traffic until it is ready.
  • The rolling upgrade will wait for the pod to come up before deleting the existing pod.
  • If a pod disruption budget is set, the node draining will proceed only if pods are ready.

See also:

Kubernetes documentation about probes

Liveness probe

The liveness probe should only fail if the application is in an unrecoverable state and has to be restarted. Ideally, the liveness probe should not be set and the application should crash. If it is present, it should point to a different endpoint than the readiness probe.

Improves:

Reasons:

  • A failing liveness probe causes a restart of the pod, which might take more time if the pod is scheduled on a different node.
  • Crashing of the pod might increase pressure on the running pods.
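The probe guidelines can be sketched as a container definition built in Python and printed as JSON, which kubectl also accepts as a manifest format. The /ready and /alive paths, the port name, and the timing values are assumptions; the field names (readinessProbe, livenessProbe, httpGet, periodSeconds, failureThreshold) are the standard Kubernetes ones.

```python
import json


def container_with_probes(name, image, port, liveness_path=None):
    """Container spec with a mandatory readiness probe and an optional liveness probe."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"name": "http", "containerPort": port}],
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": "http"},
            "periodSeconds": 5,
        },
    }
    if liveness_path is not None:
        if liveness_path == "/ready":
            raise ValueError("liveness probe must not reuse the readiness endpoint")
        container["livenessProbe"] = {
            "httpGet": {"path": liveness_path, "port": "http"},
            "periodSeconds": 10,
            "failureThreshold": 6,
        }
    return container


if __name__ == "__main__":
    # Hypothetical component name and image reference.
    spec = container_with_probes("api", "registry.example.com/api:1.0", 8080, "/alive")
    print(json.dumps(spec, indent=2))
```

Leaving liveness_path unset matches the preferred setup, in which the application crashes on its own when it reaches an unrecoverable state.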

See also:

Number of containers

The pod must have as few containers as possible. Ideally, a single pod should have a single container. Most native Kubernetes deployments have up to three containers.

Improves:

Reasons:

  • All the containers are scheduled on a single node. They require more resources and slow down the start of the pod.
  • When one container is down, the pod won't be processing requests.
  • When the configuration is changed for a single container, the whole pod has to restart.

Number of init containers

A long-running pod must have as few init containers as possible. Most native Kubernetes deployments have up to 2 init containers in their spec.

Improves:

Reasons:

  • The pod executes init containers sequentially and this increases the time for the pod to start. Consider using Kubernetes jobs instead.
  • Crashes in init containers do not show up in pod crash count.
  • The logs from init containers cannot be retrieved from the pod after the pod has started.

Pod requests

Each container in a pod must always have configurable CPU and memory requests with sane defaults (enough to run 50 applications and start 5 at the same time).

Improves:

Reasons:

  • Kubernetes reserves the requested resources on the node and does not schedule more applications than the node can fit. Overcommitting slows down the deployment.

See also:

Managing Compute Resources

Pod limits

Memory limits are optional, but if they are present, they must be at least 50% bigger than the requests. CPU limits must never be set.
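The request and limit rules lend themselves to an automated check. The sketch below validates the resources section of a container; the function names are hypothetical, and the quantity parser deliberately handles only plain bytes and the Ki, Mi and Gi suffixes.

```python
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity):
    """Parse a small subset of Kubernetes memory quantities into bytes."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def check_resources(resources):
    """Return a list of violations of the request and limit guidelines."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for key in ("cpu", "memory"):
        if key not in requests:
            problems.append(f"missing {key} request")
    if "cpu" in limits:
        problems.append("cpu limit must not be set")
    if "memory" in limits and "memory" in requests:
        # limit >= 1.5 * request, written with integers to avoid rounding.
        if parse_memory(limits["memory"]) * 2 < parse_memory(requests["memory"]) * 3:
            problems.append("memory limit must be at least 50% bigger than the request")
    return problems
```

For example, a 256Mi request with a 384Mi limit passes, while the same request with a 300Mi limit is reported.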

Improves:

Reasons:

See also:

Pod service account

Each component must have its own service account. It must never use the default service account.

Improves:

Reason:

This allows attaching pod security policy to the pod.

Service account token

If the pod does not need access to the Kubernetes API, the service account token is not mounted into it.

Improves:

Pod security configuration

The pod spec should satisfy the restricted pod security policy provided by Kubernetes:

  • Pod should drop all capabilities.
  • Pod should have a proper seccomp or AppArmor annotation.
  • Pod should set the readOnlyRootFilesystem property to true.
  • Pod should set the securityContext.runAsNonRoot property.
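Part of this list can be checked mechanically. The sketch below inspects a container's securityContext for three of the four points; the seccomp or AppArmor setting is left out because how it is expressed depends on the Kubernetes version. The function name is hypothetical.

```python
def check_security_context(container):
    """Return violations of the capability, root filesystem and non-root rules."""
    sc = container.get("securityContext", {})
    problems = []
    if sc.get("runAsNonRoot") is not True:
        problems.append("runAsNonRoot must be true")
    if sc.get("readOnlyRootFilesystem") is not True:
        problems.append("readOnlyRootFilesystem must be true")
    if "ALL" not in sc.get("capabilities", {}).get("drop", []):
        problems.append("all capabilities must be dropped")
    return problems


if __name__ == "__main__":
    container = {
        "name": "api",  # hypothetical container name
        "securityContext": {
            "runAsNonRoot": True,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
    }
    print(check_security_context(container))
```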

Improves:

See also:

Using certificates

If a pod requires TLS/SSL certificates and keys for public consumption, it must support using cert-manager.

Improves:

Reason:

Kubernetes secrets have a special format for certificates and the operators expect the components to use it.

Pod port names

Ports that are exposed by the pod must have a name, which should be the same as in the corresponding service.

Improves:

Reason:

Istio requires proper port names.

Affinity

The specification allows setting affinity and anti-affinity rules.

Improves:

See also:

Kubernetes documentation

Service specification

Using services

All pods should be part of services.

Reason:

Service port names

The component creates a service if it has to be accessed by other components. Service ports should have a name of the format <protocol>[-<suffix>], e.g. grpc-api or tcp-database. See more in the Istio documentation.
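The naming format can be validated with a short regular expression. This is a sketch: the protocol list below is an assumed subset and should be checked against the Istio documentation for the version in use.

```python
import re

# Assumed protocol prefixes; the authoritative list is in the Istio documentation.
PROTOCOLS = ("grpc", "http", "http2", "https", "mongo", "mysql", "redis", "tcp", "tls", "udp")

# <protocol> optionally followed by -<suffix>, where the suffix is lowercase
# alphanumerics and dashes and must end with an alphanumeric character.
PORT_NAME_RE = re.compile(r"^(%s)(-[a-z0-9-]*[a-z0-9])?$" % "|".join(PROTOCOLS))


def is_valid_port_name(name):
    """True if name follows the <protocol>[-<suffix>] convention."""
    return PORT_NAME_RE.match(name) is not None


if __name__ == "__main__":
    for candidate in ("grpc-api", "tcp-database", "api", "http-"):
        print(candidate, is_valid_port_name(candidate))
```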

Reason:

Service labels

The service must have the same labels as the pod.

Improves:

Other Kubernetes objects

Pod Disruption Budget

If the process is expected to have no downtime, it has a PodDisruptionBudget.

Improves:

Reason:

Settings on higher-level objects (for example, a Deployment) are ignored when the node is drained. The pod disruption budget prevents too many pods from going down at once.
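A PodDisruptionBudget manifest can be sketched as a Python dictionary and printed as JSON for kubectl. The name, labels and minAvailable value are assumptions. The example uses the policy/v1 API, which is available from Kubernetes 1.21; older clusters use policy/v1beta1.

```python
import json


def pod_disruption_budget(name, match_labels, min_available=1):
    """Build a PodDisruptionBudget that keeps min_available matching pods running."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": dict(match_labels)},
        },
    }


if __name__ == "__main__":
    # Hypothetical component name and label value.
    pdb = pod_disruption_budget("api-pdb", {"app.kubernetes.io/name": "api"})
    print(json.dumps(pdb, indent=2))
```

The selector has to match the pod labels, which is one more reason to keep labels consistent across the pod, the service and the budget.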

See also:

Kubernetes documentation about disruptions

Pod Security Policy

A minimal pod security policy is provided. Ideally, it should be the same as (or stricter than) the policy provided by Kubernetes.

Improves:

Networking policy

Sample networking policy is provided

Improves:

Istio RBAC

Sample Istio RBAC is provided

Improves:

Access outside the cluster

If the component has to be accessed externally, it provides a K8s Ingress resource, or a set of Istio VirtualService + Gateway resources, with free-form annotations and the ability to provide a load balancer.

Improves:

Using secrets

If the component needs secrets, it has an option to use an existing Kubernetes object with a predefined format. The name of that Kubernetes object can be provided by the platform engineer. The secret for certificates uses the known K8s format.

Improves:

Service accounts

Each component creates and attaches its own service account.

Improves:

Deployment

Each stateless component is deployed as a deployment.

Replicas count

The number of replicas is not specified in the template unless it can only be deployed as a single copy.

Improves:

Work with other components

Optional parts

If the component has a soft dependency on another component (it can work without it), the dependent part can be skipped. For example, Eirini deploys with rootfs patcher, but rootfs patcher can be skipped in the deployment.

Using custom DNS addresses

The address of a dependent component can always be specified by the platform engineer and has a sane default.

Improves:

Reason:

Integration with KubeCF and CF-4-K8s requires using different DNS addresses.
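In code, this usually comes down to reading the address from configuration and falling back to a default. The sketch below uses an environment variable; the variable name and the default address are hypothetical.

```python
import os

# Hypothetical default: an in-cluster DNS name for the dependency.
DEFAULT_DEPENDENCY_ADDRESS = "dependency.cf-system.svc.cluster.local:8443"


def dependency_address(environ=None):
    """Return the address set by the platform engineer, or the default."""
    env = os.environ if environ is None else environ
    # An empty value is treated the same as an unset one.
    return env.get("DEPENDENCY_ADDRESS") or DEFAULT_DEPENDENCY_ADDRESS


if __name__ == "__main__":
    print(dependency_address())
```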

Documentation

Each non-alpha property that the platform engineer can specify is documented in the README.

Improves:

Kubernetes versions support

Each component is expected to support all Kubernetes versions supported by the CNCF by using the correct API specification.

Improves:

Reason:

APIs get deprecated, and their usages have to be fixed in advance.

Container runtime support

The component must support both Docker and containerd as container runtimes.

Improves:

Reason:

Docker is widely used; containerd implements CRI (Container Runtime Interface) and is used in several public cloud offerings.