A binary and packages to perform health checks on pods and services running on Kubernetes and to remedy any failures.
CoreDNS is the cluster DNS server for Kubernetes and is critical for the overall health of the Kubernetes cluster. It is therefore important to monitor the health of CoreDNS itself and to restart or repair any CoreDNS pods that are not behaving correctly.
While CoreDNS exposes a health check itself in the form of Kubernetes' `livenessProbe`:
- The health check is not UDP (DNS) based. There have been cases where the health port is accessible over TCP while CoreDNS itself isn't reachable over UDP. Because of this protocol difference, CoreDNS can be unhealthy from the cluster's standpoint while the control plane cannot see it.
- The existing health check is also launched locally (the `kubelet` runs the `livenessProbe`), and the situation could be different for pods accessing CoreDNS remotely.
This project idea aims to get around the limitations of Kubernetes' health check by building an application that:
- Checks CoreDNS health externally through UDP (DNS), from a remote Golang application.
- Remedies CoreDNS pods when the response from the cluster and pod IPs is unsatisfactory, which involves restarting them, increasing memory limits, changing network configuration, and more. This is done by interacting with the Kubernetes API through the Golang application.
Thus, making the state of CoreDNS available externally and reliably helps ensure that dependent services run as expected.
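As a rough illustration of the UDP check, the sketch below sends a DNS question over UDP using the `github.com/miekg/dns` library (a milestone further down mentions `miekg/exdns/q`, which builds on the same library). The target address and query name are placeholders; in the real application the pod and service IPs come from the Kubernetes API.

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/miekg/dns"
)

// queryCoreDNS sends an A query for name to a CoreDNS endpoint (ip:port)
// over UDP and returns an error if no usable answer comes back.
func queryCoreDNS(endpoint, name string) error {
	c := &dns.Client{Net: "udp", Timeout: 2 * time.Second}

	m := new(dns.Msg)
	m.SetQuestion(dns.Fqdn(name), dns.TypeA)

	r, rtt, err := c.Exchange(m, endpoint)
	if err != nil {
		return fmt.Errorf("exchange with %s failed: %w", endpoint, err)
	}
	if r.Rcode != dns.RcodeSuccess {
		return fmt.Errorf("got rcode %s from %s", dns.RcodeToString[r.Rcode], endpoint)
	}
	fmt.Printf("%s answered in %v with %d records\n", endpoint, rtt, len(r.Answer))
	return nil
}

func main() {
	// Placeholder service IP; the application discovers the real pod and
	// service IPs through the Kubernetes API.
	if err := queryCoreDNS("10.96.0.10:53", "kubernetes.default.svc.cluster.local"); err != nil {
		log.Println("CoreDNS check failed:", err)
	}
}
```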
The binary is designed to access all network namespaces and run commands in them. As such, it needs to be deployed on every node, just like a CNI plugin.
This can be achieved through a `DaemonSet`.
Inside a node, there exist different pods bound to their respective network namespaces. The binary is deployed on the host network and is thus in the root network namespace.
Firstly, the binary queries the CoreDNS pods from the host network namespace and checks the response.
- If the response received is unsatisfactory, the pods are restarted, or the memory limit is increased if restarting doesn't help (a sketch of the restart step follows this list).
- If the response is correct, the binary proceeds to list all network namespaces on the host and starts a session in each, one by one.
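As a minimal sketch of the restart remedy, the snippet below deletes the CoreDNS pods with client-go so that their Deployment recreates them. It assumes the checker runs in-cluster and that the pods carry the default `k8s-app=kube-dns` label; it is illustrative only, not the project's actual code.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// restartCoreDNSPods deletes the CoreDNS pods; their Deployment then
// recreates them, which amounts to a restart.
func restartCoreDNSPods(clientset kubernetes.Interface) error {
	ctx := context.Background()
	pods, err := clientset.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: "k8s-app=kube-dns", // label used by the default CoreDNS deployment
	})
	if err != nil {
		return err
	}
	for _, p := range pods.Items {
		if err := clientset.CoreV1().Pods("kube-system").Delete(ctx, p.Name, metav1.DeleteOptions{}); err != nil {
			return err
		}
		log.Printf("deleted pod %s to trigger a restart", p.Name)
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes the checker runs inside the cluster
	if err != nil {
		log.Fatal(err)
	}
	if err := restartCoreDNSPods(kubernetes.NewForConfigOrDie(cfg)); err != nil {
		log.Fatal(err)
	}
}
```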
To list and enter these namespaces, we need the PIDs of the containers so that their network namespaces can be located in the host's `/proc` directory. One way to get a PID is to use the following command:
pid="$(docker inspect -f '{{.State.Pid}}' "container_id")"
This, however, requires that the container be able to run Docker commands on the host. The way this application approaches the problem is (a Go sketch of the PID lookup follows this list):
- Mounting the Docker daemon's socket from the host into the container (`/var/run/docker.sock`).
- Installing the Docker CLI as part of the Dockerfile. This CLI is necessary to communicate with the daemon.
- Mounting the host's `/proc` directory on the container. This way, we have access to the network namespaces corresponding to the different PIDs.
We can use the path to the network namespace to obtain a `NetNS` object from the `ns` package. This object has a `Do()` function that can be used to execute functions inside other namespaces. Thus, the binary now queries the CoreDNS pods from every namespace to check DNS availability.
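A minimal sketch of this pattern is shown below, assuming the `ns` package in question is `github.com/containernetworking/plugins/pkg/ns` and that the host's `/proc` is mounted at `/hostProc` as described earlier; the PID in the path is a placeholder.

```go
package main

import (
	"fmt"
	"log"

	"github.com/containernetworking/plugins/pkg/ns"
)

func main() {
	// Placeholder path built from a container PID found under the mounted
	// host /proc (exposed at /hostProc).
	nsPath := "/hostProc/4242/ns/net"

	netns, err := ns.GetNS(nsPath)
	if err != nil {
		log.Fatalf("opening %s: %v", nsPath, err)
	}
	defer netns.Close()

	// Do() switches into the target network namespace, runs the callback,
	// and switches back afterwards.
	err = netns.Do(func(_ ns.NetNS) error {
		fmt.Println("querying CoreDNS from", nsPath)
		// the UDP DNS query would be issued from inside this namespace here
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```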
The following is a screenshot from a trial run to test this functionality. We can see that the two pod IPs and one service IP are queried. The logs show the namespace in which the operations are performed.
If the service is unavailable from any namespace, then a series of checks is performed:
- The `/etc/resolv.conf` file is inspected to check whether the nameserver is the correct one (corresponding to the DNS service); a sketch of this check follows.
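A simple form of that `resolv.conf` inspection might look like the following sketch; the expected nameserver IP is a placeholder for the DNS service's ClusterIP.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// nameserverMatches reports whether resolv.conf lists the expected
// cluster DNS service IP as a nameserver.
func nameserverMatches(path, expectedIP string) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 2 && fields[0] == "nameserver" && fields[1] == expectedIP {
			return true, nil
		}
	}
	return false, scanner.Err()
}

func main() {
	// 10.96.0.10 is a placeholder for the kube-dns service ClusterIP.
	ok, err := nameserverMatches("/etc/resolv.conf", "10.96.0.10")
	if err != nil {
		panic(err)
	}
	fmt.Println("nameserver points at the DNS service:", ok)
}
```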
Other measures are proposed but not yet implemented. These include:
- Checking if the route table has an entry for a `DefaultGateway` and if it is the one corresponding to the bridge.
- Checking whether virtual interfaces (veth) exist and whether they are connected to the bridge.
- Checking whether `iptables` rules are restricting connections.
An exporter library is implemented at `pkg/exporter` that takes values from the application and registers them with Prometheus using the Golang client.
The exporter will help determine the number of times the remedies were required, how often the pods failed, and the primary reasons for the failures (ascertained by the type of remedy that fixed them), among other things.
Name | Description |
---|---|
remedy | This group has metrics related to the remedial measures taken when the pods fail, such as restarting pods or increasing memory allocation. |
dns | This group has metrics that deal with the DNS queries made by the application and their responses. |
Remedy group has the following available metrics:
Name | Metric Type | Exposed Information |
---|---|---|
oom_count | Counter | Counts the number of OOM errors encountered |
restart_count | Counter | Counts the number of restarts performed on the pods |
total_failures | Counter | Counts the total number of failures of the pods under check |
DNS group has the following available metrics:
Name | Metric Type | Exposed Information |
---|---|---|
dns_query_count | Counter | Counts the number of DNS queries made |
dns_query_response_time | Histogram | The time it takes to finish DNS queries |
The metrics are exposed on port `9890` of the pod's IP address. The endpoint is `/metrics`.
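For illustration, a stripped-down exporter along these lines could register a counter and a histogram with the Prometheus Go client and serve them on port 9890; the metric names mirror the tables above, but the real implementation lives in `pkg/exporter`.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	restartCount = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "restart_count",
		Help: "Number of restarts performed on the pods",
	})
	dnsQueryResponseTime = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "dns_query_response_time",
		Help: "Time taken to finish DNS queries",
	})
)

func main() {
	prometheus.MustRegister(restartCount, dnsQueryResponseTime)

	// The application would update these as events happen, for example:
	restartCount.Inc()
	dnsQueryResponseTime.Observe(0.012) // seconds

	// Serve the registered metrics on port 9890 at /metrics.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9890", nil))
}
```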
A sample call could be done in the following manner:
curl 10.244.0.13:9890/metrics
A sample run with an orchestrated OOM error:
This shows the number of restarts that fixed the pods and the total number of failures.
More visuals will be added as tests proceed.
The application can be deployed either inside a Kubernetes cluster or outside it. Inside a cluster, it can be deployed as a `DaemonSet` so that it runs on all nodes, or it can be deployed on a single node.
The driving principle behind this binary is that it should function gracefully under all conditions.
When the deployment is done as a pod in a cluster, no flags need to be used.
When deploying externally, the `kubeconfig` file path has to be provided so that authentication with the api-server can be done.
Additionally, two flags can be set as needed (a sketch of how they are consumed follows this list):
- `allowPods`: a `boolean` value that states whether pod creation inside the cluster is allowed.
- `udpPort`: if the CoreDNS pods are using a port other than 53, specify it here.
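The sketch below shows how these flags and the kubeconfig path could be wired to client-go. The flag names follow the documentation above; the defaults and the assumption that `-path` holds the kubeconfig path are mine.

```go
package main

import (
	"flag"
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	path := flag.String("path", "", "path to the kubeconfig file (only needed outside the cluster)")
	allowPods := flag.Bool("allowPods", false, "whether pod creation inside the cluster is allowed")
	udpPort := flag.Int("udpPort", 53, "UDP port the CoreDNS pods listen on")
	flag.Parse()

	var (
		cfg *rest.Config
		err error
	)
	if *path == "" {
		// Deployed as a pod: use the mounted service account credentials.
		cfg, err = rest.InClusterConfig()
	} else {
		// Deployed externally: authenticate with the api-server via kubeconfig.
		cfg, err = clientcmd.BuildConfigFromFlags("", *path)
	}
	if err != nil {
		log.Fatal(err)
	}

	clientset := kubernetes.NewForConfigOrDie(cfg)
	_ = clientset // the health checks and remedies would use this client
	log.Printf("allowPods=%v udpPort=%d", *allowPods, *udpPort)
}
```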
The `Dockerfile` is present in the root directory, and an image is also pushed to Docker Hub at `wjayesh/health`.
To deploy the application on Docker, use the following command:
docker run wjayesh/health:latest -path=PATH -allowPods=BOOL -udpPort=PORT
The image from `wjayesh/health` can be used to create Pods. A basic `YAML` description is provided below.
apiVersion: v1
kind: Pod
metadata:
  name: health-check
  labels:
    target: coredns-deployment
spec:
  hostNetwork: true                   # to have access to all netns
  dnsPolicy: ClusterFirstWithHostNet  # needs to be set explicitly
  containers:
  - image: wjayesh/health:latest
    name: health-check-container
    args: ["-path=PATH", "-allowPods=BOOL", "-udpPort=PORT"]  # fill these values or skip line if deployed
    volumeMounts:                                             # on the same cluster as the pods under check
    - mountPath: /hostProc
      name: proc
    - mountPath: /var/run/docker.sock
      name: docker-sock
      readOnly: false
    - mountPath: /var/lib/docker
      name: docker-directory
      readOnly: false
    securityContext:
      privileged: true  # to run docker command for finding PIDs of containers whose ns to enter
  volumes:
  - name: proc              # the net ns will be located at /proc/pid/ns/net
    hostPath:
      path: /proc
  - name: docker-sock       # using the docker daemon of the host
    hostPath:
      path: "/var/run/docker.sock"
      type: File
  - name: docker-directory  # contains all images and other info
    hostPath:
      path: "/var/lib/docker"
  restartPolicy: OnFailure
This definition can be used in a `DaemonSet`. An example:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: coredns-hc
  namespace: kube-system
  labels:
    k8s-app: health-check
spec:
  selector:
    matchLabels:
      target: coredns-deployment
  template:
    metadata:
      labels:
        target: coredns-deployment
    spec:
      tolerations:
      # this toleration is to have the daemonset runnable on master nodes
      # remove it if your masters can't run pods
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet  # needs to be set explicitly
      containers:
      - image: wjayesh/health:latest
        name: health-check-container
        args: ["-path=PATH", "-allowPods=BOOL", "-udpPort=PORT"]
        volumeMounts:
        - mountPath: /hostProc
          name: proc
        - mountPath: /var/run/docker.sock
          name: docker-sock
          readOnly: false
        - mountPath: /var/lib/docker
          name: docker-directory
          readOnly: false
        securityContext:
          privileged: true  # to run docker command for finding PIDs of containers whose ns to enter
      volumes:
      - name: proc              # the net ns will be located at /proc/pid/ns/net
        hostPath:
          path: /proc
      - name: docker-sock       # using the docker daemon of the host
        hostPath:
          path: "/var/run/docker.sock"
          type: File
      - name: docker-directory  # contains all images and other info
        hostPath:
          path: "/var/lib/docker"
- The DNS policy needs to be set to `ClusterFirstWithHostNet` explicitly. Reference: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
- To allow the application to access the namespaces (specifically, to first get the PIDs used to locate the network namespaces), it has to be run as a privileged pod. If creation of the pod fails in your cluster, check whether you have a `PodSecurityPolicy` that allows privileged pods.
- Keep in mind that you cannot use environment variables like `"$(PORT)"` as identifiers inside the args field. This is because there is no shell being run in the container, so your variables won't resolve to their values.
- Make sure your service account has a role that can access the services, pods and deployment resources of your cluster. An example ClusterRole with some privileges is shown below.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  namespace: kube-system
  name: health-manager
rules:
- apiGroups: [""]  # "" indicates the core API group
  resources: ["services", "pods"]
  verbs: ["get", "watch", "list", "create", "update", "patch", "delete"]
- apiGroups: ["extensions", "apps"]
  resources: ["deployments"]
  verbs: ["get", "watch", "list", "create", "update", "patch", "delete"]
This cluster role can be bound to your default service account in the default namespace as follows:
kubectl create clusterrolebinding health-role-pod \
--clusterrole=health-manager \
--serviceaccount=default:default
Here I will list the milestones achieved in sync with the tasks done on the project board.
- Connection to the api-server established on an AKS cluster.
- Service and Pod IPs successfully retrieved.
- Restarting CoreDNS pods through the binary. The logs show the pods being deleted. The condition of invalid output has been hardcoded in order to force a restart, for testing purposes.
- Functionality of `dig` replicated using `miekg/exdns/q` inside the health-check app. The first two IPs belong to the CoreDNS pods; the third is the `ClusterIP` for the `kube-dns` service. I have selected the `kubernetes.default` service to test the DNS response.
- Doubling the memory allocation for the CoreDNS pods. At first, the memory limit is 170Mi; later, it is doubled to 340Mi. This was triggered when multiple subsequent restarts failed to make the pods healthy (a sketch of such a patch follows).
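A rough sketch of how such a memory bump could be applied with client-go is shown below, using a strategic merge patch on the `coredns` deployment; this is illustrative only and not the project's actual remediation code.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// bumpCoreDNSMemory patches the coredns deployment's memory limit,
// e.g. from 170Mi to 340Mi.
func bumpCoreDNSMemory(clientset kubernetes.Interface, newLimit string) error {
	patch := fmt.Sprintf(
		`{"spec":{"template":{"spec":{"containers":[{"name":"coredns","resources":{"limits":{"memory":%q}}}]}}}}`,
		newLimit)

	_, err := clientset.AppsV1().Deployments("kube-system").Patch(
		context.Background(), "coredns", types.StrategicMergePatchType,
		[]byte(patch), metav1.PatchOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	if err := bumpCoreDNSMemory(kubernetes.NewForConfigOrDie(cfg), "340Mi"); err != nil {
		log.Fatal(err)
	}
}
```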