In this extra chapter we'll look at Cilium, a project that will allow us to simplify our Kubernetes deployment, especially its networking part.
Cilium's distinguishing feature is its use of eBPF, a Linux kernel technology that allows injecting programs into various predefined hooks within the kernel. It is especially useful for manipulating network traffic in an efficient, low-level and safe way. As a result, Cilium can completely replace `iptables`-based solutions.
Table of Contents
- Benefits from using Cilium
- Removing stuff to be replaced with Cilium
- Installing Cilium
- Cleaning up
- Summary
- What's next?
## Benefits from using Cilium

Installing Cilium will result in the following changes in our Kubernetes deployment:

- The default CNI plugins and their `iptables` usage will be removed (curious what those rules look like? see the sketch right after this list)
- `kube-proxy` will be removed
- Routes to pod CIDRs will no longer be necessary, thanks to Cilium's overlay network using tunnelling
- The `br_netfilter` kernel module will no longer be needed, thanks to not relying on `iptables` anymore
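
As a quick look at what we are about to replace: `kube-proxy` programs `Service` load balancing into the `KUBE-SERVICES` chain of the NAT table. Here's a read-only sketch to dump it on any node (the chain name is a standard `kube-proxy` convention):

```
# Show the first rules of kube-proxy's Service-translation chain.
sudo iptables -t nat -L KUBE-SERVICES -n | head -20
```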
## Removing stuff to be replaced with Cilium

Since we are going to reconfigure networking in our deployment from scratch, it would be best to uninstall everything from the cluster, so that there are no pods left. Alternatively, you can let them run and restart them after Cilium is fully set up.
First, let's get rid of `kube-proxy`. At minimum, run this on all control and worker nodes:

```
sudo systemctl stop kube-proxy
sudo systemctl disable kube-proxy
```
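
Note that stopping `kube-proxy` does not remove the `iptables` rules it has already installed - they linger until flushed or until the node reboots. A quick way to check whether any are still present:

```
# Count leftover kube-proxy rules; 0 means the tables are already clean.
# (grep exits non-zero when it finds nothing, hence the `|| true`.)
sudo iptables-save | grep -c 'KUBE-' || true
```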
Then, get rid of the old CNI plugin configurations:

```
sudo rm /etc/cni/net.d/*.conf
```

and restart `kubelet`:

```
sudo systemctl restart kubelet
```
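
If you're curious what `kubelet` makes of this, its logs should now contain complaints about the missing CNI configuration (the exact wording differs between versions):

```
# Scan kubelet's recent log lines for CNI-related errors.
sudo journalctl -u kubelet -n 100 | grep -i cni
```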
After CNI is gone and `kubelet` is restarted, it will detect that something is wrong and taint the nodes as `not-ready`:
```
$ kubectl get nodes -o wide
NAME       STATUS     ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
control0   NotReady   <none>   13m   v1.28.3   192.168.1.11   <none>        Ubuntu 22.04.3 LTS   5.15.0-83-generic   containerd://1.7.7
control1   NotReady   <none>   13m   v1.28.3   192.168.1.12   <none>        Ubuntu 22.04.3 LTS   5.15.0-83-generic   containerd://1.7.7
control2   NotReady   <none>   13m   v1.28.3   192.168.1.13   <none>        Ubuntu 22.04.3 LTS   5.15.0-83-generic   containerd://1.7.7
worker0    NotReady   <none>   13m   v1.28.3   192.168.1.14   <none>        Ubuntu 22.04.3 LTS   5.15.0-83-generic   containerd://1.7.7
worker1    NotReady   <none>   13m   v1.28.3   192.168.1.15   <none>        Ubuntu 22.04.3 LTS   5.15.0-83-generic   containerd://1.7.7
worker2    NotReady   <none>   13m   v1.28.3   192.168.1.16   <none>        Ubuntu 22.04.3 LTS   5.15.0-83-generic   containerd://1.7.7
```
This is a good starting point for installing Cilium. Alternatively, you can leave old CNI config files in place and simply let them be replaced by Cilium (which will copy them into backup files).
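
For the curious: if you go the alternative route, the CNI config directory after installation might look something like the listing below. Both file names here are hypothetical - the exact names depend on the Cilium version (recent versions write an `05-cilium.conflist` and append a `.cilium_bak` suffix to foreign configs):

```
# Hypothetical directory listing after Cilium takes over:
$ ls /etc/cni/net.d/
05-cilium.conflist  10-bridge.conf.cilium_bak
```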
## Installing Cilium

Let's install Cilium. Here's what it consists of:

- `cilium-operator` - This is an operator, i.e. a service that watches various Kubernetes resources and reacts accordingly when they change.
- `cilium-agent` - This is a `DaemonSet`, i.e. a special type of Kubernetes workload that runs a single pod on every node. Daemon sets are often used by "infrastructural" services to configure and/or monitor nodes. In the case of Cilium, `agent` pods are very privileged ones - they run directly in the host network and have access to the nodes' filesystems. This way they can configure worker nodes (i.e. set up Cilium's CNI plugin, inject eBPF programs). Apart from running as almost un-containerized processes, `agent` pods also have tolerations, which make them run unconditionally on all nodes, including the ones tainted as `not-ready` and `control-plane` (you can inspect these tolerations yourself - see the sketch after this list). This stripping of isolation seems quite "hacky", as Kubernetes is designed to run properly containerized (i.e. isolated) workloads, and `cilium-agent` is anything but that. Creating a `DaemonSet` like this one is just a convenient way of automating low-level configuration of Kubernetes nodes.
- eBPF programs - they effectively implement the CNI as well as replace `kube-proxy`. They are installed by `cilium-agent` pods.
- CNI plugin - a CNI plugin for `kubelet` to interact with. This is likely just a thin layer over the eBPF programs, which do all the heavy lifting. It is installed and configured by `cilium-agent` pods.
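
Once Cilium is deployed (below), you can look at those tolerations yourself. A minimal sketch, assuming the Helm chart's default object names (a `DaemonSet` called `cilium` in `kube-system`):

```
# Print the tolerations of the cilium-agent DaemonSet.
# Expect entries with `operator: Exists`, which tolerate every taint.
kubectl -n kube-system get daemonset cilium \
  -o jsonpath='{.spec.template.spec.tolerations}'
```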
Using `cilium-agent` to configure node networking simplifies Cilium installation, as it removes the need to SSH into nodes and configure the CNI manually.
Now let's deploy Cilium using its Helm chart:

```
helm repo add cilium https://helm.cilium.io/
helm install -n kube-system cilium cilium/cilium \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=kubernetes \
  --set k8sServicePort=6443 \
  --set cgroup.automount.enabled=false \
  --set cgroup.hostRoot=/sys/fs/cgroup
```
Use `kubectl get pod -n kube-system --watch` to see Cilium starting up. Wait until all of the `cilium-*` and `cilium-operator-*` pods are up and running.
```
$ kubectl get pods -n kube-system -o wide | grep cilium
cilium-4hnmx                     1/1   Running   0   12m     192.168.1.13   control2   <none>   <none>
cilium-6s7sg                     1/1   Running   0   12m     192.168.1.11   control0   <none>   <none>
cilium-84csz                     1/1   Running   0   12m     192.168.1.15   worker1    <none>   <none>
cilium-gnrnq                     1/1   Running   0   12m     192.168.1.14   worker0    <none>   <none>
cilium-operator-b78cfddc-ht829   1/1   Running   0   12m     192.168.1.13   control2   <none>   <none>
cilium-operator-b78cfddc-vz7gv   1/1   Running   0   4m31s   192.168.1.11   control0   <none>   <none>
cilium-rk84w                     1/1   Running   0   12m     192.168.1.12   control1   <none>   <none>
cilium-wl49h                     1/1   Running   0   12m     192.168.1.16   worker2    <none>   <none>
```
As you can see, the IPs of these pods are equal to their nodes' IPs, which indicates that they are running with the `hostNetwork: true` setting.
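
To confirm that the agents have really taken over `kube-proxy`'s job, you can ask one of them directly. A sketch using the `cilium` CLI that ships inside the agent pods (the exact output format varies between Cilium versions):

```
# Run `cilium status` in one of the agent pods and look for a line
# like "KubeProxyReplacement: True".
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i kubeproxy
```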
## Cleaning up

After Cilium is installed, we must restart all the pods in the cluster that were started with the previous CNI setup. The easiest way to do it is simply to delete them and let them be respawned by their corresponding controllers (e.g. a `Deployment` or `StatefulSet`). Restarted pods should get new IP addresses that follow the new CIDR scheme managed by Cilium.
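
One blunt but effective way to do this, assuming (as in our cluster) that every pod is managed by a controller, is to delete all pods everywhere:

```
# Delete every pod in every namespace; their controllers will recreate
# them, and the replacements will get Cilium-managed IP addresses.
# (This also briefly recreates the Cilium pods themselves, which is fine.)
kubectl delete pods --all --all-namespaces
```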
Cilium implements the Kubernetes overlay network with IP encapsulation between nodes, and with eBPF programs instead of `iptables`. This means that some previously used tricks are no longer necessary. Namely:

- We can delete the routes to pod subnets from the host machine (see the sketch right after this list)
- We can disable the `br_netfilter` module on all nodes
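
A minimal cleanup sketch, assuming the routes were added manually with `ip route add` in an earlier chapter and that `br_netfilter` was loaded on boot via a file in `/etc/modules-load.d/` - the pod CIDR and the file name below are hypothetical, so adjust them to your setup:

```
# Remove the manually added route to a pod subnet (hypothetical CIDR).
sudo ip route del 10.200.0.0/24

# Unload br_netfilter now (may fail harmlessly if still in use)...
sudo modprobe -r br_netfilter
# ...and stop loading it on boot (hypothetical file name).
sudo rm /etc/modules-load.d/br_netfilter.conf
```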
## Summary

In this chapter, we have replaced the CNI plugins and `kube-proxy` with the Cilium project, allowing for simpler and more lightweight networking in our cluster.
## What's next?

This is the end of the guide. We have successfully reached a point where our local Kubernetes deployment mimics a real-world production deployment as much as possible, allowing it to run typical workloads, including stateful and externally-exposed services. We have focused primarily on things that are normally available in a proper cloud environment, but must be more or less simulated when running K8S on a local machine.
There's a ton of other "infrastructural" components that a real-world Kubernetes cluster typically runs, including a monitoring subsystem (e.g. Prometheus), a logging stack (e.g. ELK or Loki), a service mesh, and many, many others.