On limited Docker hosts, such as CoreOS or Atomic, one cannot easily install the utilities that are necessary to mount various filesystems such as Gluster. This repository contains a proof of concept showing how to make it possible anyway.

We cannot install e.g. GlusterFS tools on the host where Kubernetes runs, so the only other option is to have a container with these tools. And we must make sure that a Gluster volume mounted inside the container is visible to the host as well. Huamin Chen created a little container that makes this possible using very dirty tricks. This repository builds on his work and just adds some tunables to the container.
So, we have a container that can mount GlusterFS in a way the host can see. Now we must make Kubernetes use this container. Traditionally, Kubernetes just calls `mount -t glusterfs <what> <where>`. We need it to run `docker exec <mount container> mount -t glusterfs <what> <where>` instead, and only for GlusterFS; all other filesystems should use plain `mount` (we do have `mount.nfs` on Atomic).
- The mounter container is in the `container` directory.
- The Kubernetes changes are in my Kubernetes branch.
The mounter container has `mount.glusterfs` installed and a "magic" `mymount.so` registered in `/etc/ld.so.preload`. It works this way:
1. Kubernetes starts the mounter container (see below) as privileged. We need these privileges in step 4 to do `nsenter()` and eventually `mount()`.
2. Whenever Kubernetes needs to mount GlusterFS, it calls `docker exec <mounter container ID> mount -t glusterfs <what> <where>`.
3. Inside the container, `/bin/mount` finds `mount.glusterfs` and prepares everything for the mount as usual.
4. When `/bin/mount` calls the `mount()` syscall, `mymount.so`, registered in `/etc/ld.so.preload`, catches it and enters the host mount namespace. Then it calls the `mount()` syscall from there (a sketch of this trick follows the list). As a result, GlusterFS is mounted into the host mount namespace. `mount.glusterfs` starts a fuse daemon that processes GlusterFS traffic; this daemon runs inside the mounter container.
5. Whenever Kubernetes decides to tear down the volume, it just unmounts the appropriate directory in the host namespace (a standard `/bin/umount <what>` call). The fuse daemon inside the mounter container then dies automatically.
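To illustrate the namespace trick, here is a minimal sketch of such a preload library. This is not the actual `mymount.so` source, just an outline of the approach: it assumes the host's `/proc` is bind-mounted into the container at a location passed via the `HOSTPROCPATH` environment variable (matching the pod definition below) and uses the `setns()` syscall, which is what the `nsenter` tool wraps.

```c
/* mymount.c -- a minimal sketch of the /etc/ld.so.preload trick, not the
 * real mymount.so. Build with: gcc -shared -fPIC -o mymount.so mymount.c -ldl */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <limits.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <unistd.h>

typedef int (*mount_fn)(const char *, const char *, const char *,
                        unsigned long, const void *);

int mount(const char *source, const char *target, const char *fstype,
          unsigned long flags, const void *data)
{
    /* Look up the real mount() so we can call it after switching namespaces. */
    mount_fn real_mount = (mount_fn)dlsym(RTLD_NEXT, "mount");
    if (!real_mount)
        return -1;

    /* The host's /proc is expected at $HOSTPROCPATH/proc; PID 1 there is the
     * host's init, so its mnt namespace is the host mount namespace. */
    const char *hostproc = getenv("HOSTPROCPATH");
    char ns_path[PATH_MAX];
    snprintf(ns_path, sizeof(ns_path), "%s/proc/1/ns/mnt",
             hostproc ? hostproc : "/host");

    int fd = open(ns_path, O_RDONLY);
    if (fd < 0) {
        perror("open host mount namespace");
        return -1;
    }

    /* Enter the host mount namespace; this is why the container
     * must be privileged. */
    if (setns(fd, CLONE_NEWNS) < 0) {
        perror("setns");
        close(fd);
        return -1;
    }
    close(fd);

    /* This mount() now happens in the host namespace, so the host sees it. */
    return real_mount(source, target, fstype, flags, data);
}
```

Keep in mind that `/etc/ld.so.preload` applies to every dynamically linked binary in the container, so the real library has to be more careful about when it actually switches namespaces; the sketch ignores such details.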
This is of course a very hackish solution; it would be better to patch Docker so it does not create a new mount namespace for this mounter container, and throw away our `ld.so.preload` trick.
The daemon responsible for mounting volumes is `kubelet`. It runs on every node, mounts volumes, starts pods, and is fully controlled by the Kubernetes API server.

We need `kubelet` to start a mounter container instance on every node. There are many ways to do that; this proof of concept uses static pods. On every node in the Kubernetes cluster we need to:
1. Compile Kubernetes from my branch.

2. Create e.g. a `/etc/kubelet.d` directory for static pods. The directory name does not matter, as long as the same name is used in subsequent steps.

   ```
   $ mkdir /etc/kubelet.d
   ```
3. Create a static pod with our mounter container there. The mounter container needs access to the host's `/var/lib/kubelet` (that's where we will mount stuff) and `/proc` (to find `/proc/1/ns`, necessary for the `nsenter()` call).

   ```
   cat <<EOF >/etc/kubelet.d/mounter.yaml
   kind: Pod
   apiVersion: v1beta3
   metadata:
     name: mounter
     labels:
       name: glusterfs-mounter
   spec:
     containers:
       - name: mounter
         image: jsafrane/glusterfs-mounter
         env:
           - name: "HOSTPROCPATH"
             value: "/host"
         privileged: true
         volumeMounts:
           - name: var
             mountPath: /var/lib/kubelet/
           - name: proc
             mountPath: /host/proc
     volumes:
       - name: var
         hostPath:
           path: /var/lib/kubelet
       - name: proc
         hostPath:
           path: /proc
   EOF
   ```
4. Configure the `kubelet` daemon. On an Atomic host, edit `/etc/kubernetes/kubelet` and add the `--config=...` option to set the directory with static pods, plus `--volume-mounter=container --mount-container=...` to instruct `kubelet` to use the container named 'mounter' in a pod with label 'name=glusterfs-mounter' to mount GlusterFS (and plain `/bin/mount` for everything else).

   ```
   $ vi /etc/kubernetes/kubelet
   ...
   KUBELET_ARGS="--config=/etc/kubelet.d --volume-mounter=container --mount-container=glusterfs:name=glusterfs-mounter:mounter --cluster_dns=10.254.0.10 --cluster_domain=kube.local"
   ```
5. Restart `kubelet` and wait until it downloads the mounter container image. It should start within a minute or so and will be visible in `docker ps` output on the node and as a mirror pod in `kubectl get pods` output.

   ```
   $ service kubelet restart
   $ docker ps
   CONTAINER ID        IMAGE                                COMMAND
   d3a0318e9f56        jsafrane/glusterfs-mounter:latest    "/bin/sh -c /sleep.s"
   ...
   ```
6. Everything is set up now; you can start creating pods with GlusterFS volumes, and Kubernetes should use the mounter container to mount them. An example pod is sketched below.
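For illustration, a pod with a GlusterFS volume might look like the following. This example is not part of this repository: the `glusterfs` volume definition follows the upstream Kubernetes GlusterFS example of that era, and the endpoints object `glusterfs-cluster` and Gluster volume `kube_vol` are assumptions you would replace with your own.

```
kind: Pod
apiVersion: v1beta3
metadata:
  name: glusterfs-test
spec:
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: glusterfsvol
          mountPath: /mnt/glusterfs
  volumes:
    - name: glusterfsvol
      glusterfs:
        # Endpoints object listing the Gluster servers (an assumption,
        # create it first) and the name of the Gluster volume to mount.
        endpoints: glusterfs-cluster
        path: kube_vol
        readOnly: true
```

With the setup above, `kubelet` should mount this volume through the mounter container instead of calling plain `/bin/mount`.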
This code is just a proof of concept. Some steps may be necessary to make it production ready:
- Don't use the `ld.so.preload` trick to mount volumes from inside the container to the host mount namespace. Perhaps a new `docker run` option? Still, the container needs to be privileged to allow mounting...
- How to start the mounter pods on all nodes?
  - We can either use static pods, i.e. distribute configuration files to all nodes during node setup (Puppet, Ansible, ...),
  - or develop a replication controller that will start a pod on all nodes. There are several proposals at kubernetes/kubernetes#1518.
- Think hard about security. Anyone can label a pod with 'name=glusterfs-mounter'. Can that be used to steal data?
- Combine it with NsenterMounter when `kubelet` itself runs inside a container. Now `kubelet` does a plain `/bin/umount` to destroy volumes; we may need `nsenter /bin/umount` instead.