Mission: Health Check

Table of Contents

Description
User Problem
Concepts and Architectural Patterns
Prerequisites
Use Case
Acceptance Criteria
Integration Requirements
Tags
Notes
Approval

ID	Short Name
`104`	`health-check`

Description

The purpose of this use case is to demonstrate how Kubernetes health checks determine whether a container is still alive (liveness) and ready to serve traffic (readiness) on the application’s HTTP endpoints.

To demonstrate this behavior, we will configure a /health HTTP endpoint that Kubernetes probes with HTTP requests. If the container is still alive, the /health endpoint is able to reply, the management platform receives a 200 return code, and no further action is required.

However, if the HTTP endpoint does not return a response (JVM no longer running, thread blocked, etc.), the platform will kill the pod and create a new container to restart the application.

Because the pod will be down for a certain period of time, we will be able to show that the endpoint exposing the service is no longer available; in this case, an HTTP 503 response is returned. The user gets this return code from the Kubernetes proxy because the management platform has detected that the endpoint used to check whether the container is ready to serve traffic cannot reply. As a consequence, the IP address and port of the server exposing the service are removed from the Kubernetes proxy.
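
The probe behavior described above can be pictured as a small state tracker. The sketch below is an illustrative model, not Kubernetes code: the class `ProbeTracker` and its method names are hypothetical. It only assumes the rule that a probe counts as failed after `failureThreshold` consecutive failures and as healthy again after `successThreshold` consecutive successes.

```java
/** Illustrative model of how consecutive probe results flip a container's state. */
public class ProbeTracker {
    private final int failureThreshold;
    private final int successThreshold;
    private int consecutiveFailures = 0;
    private int consecutiveSuccesses = 0;
    // Assumption: start in the healthy state, as for a container
    // that has already passed its first probe.
    private boolean healthy = true;

    public ProbeTracker(int failureThreshold, int successThreshold) {
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
    }

    /** Records one probe result (true = HTTP 200 received in time) and returns the current state. */
    public boolean record(boolean probeSucceeded) {
        if (probeSucceeded) {
            consecutiveSuccesses++;
            consecutiveFailures = 0;
            if (consecutiveSuccesses >= successThreshold) {
                healthy = true;
            }
        } else {
            consecutiveFailures++;
            consecutiveSuccesses = 0;
            if (consecutiveFailures >= failureThreshold) {
                healthy = false;
            }
        }
        return healthy;
    }

    public static void main(String[] args) {
        // Same thresholds as a probe with failureThreshold 3 and successThreshold 1.
        ProbeTracker readiness = new ProbeTracker(3, 1);
        boolean[] results = {true, false, false, false, true};
        for (boolean result : results) {
            System.out.println("probe ok=" + result + " -> healthy=" + readiness.record(result));
        }
    }
}
```

In this model a single failed probe does not change the state; only an unbroken run of failures does, which is why a killed pod is not removed from the proxy instantly.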

User Problem

When an application is deployed on top of OpenShift/Kubernetes, it is important to determine whether each container is available and able to serve incoming requests. By implementing the health-check pattern, it becomes possible to monitor the health of the container and whether it is able to serve traffic.

Concepts and Architectural Patterns

Health Check using Liveness (= process is alive, JVM started) and Readiness (= ready to serve traffic) probes
Fail-over
Resilience

Prerequisites

The runtime (Spring Boot, Swarm, Vert.x) provides the code or the jar file containing the /health endpoint.

Use Case

Success scenario

The use case starts when the application has been deployed into OpenShift. The user can access the application through a web page provided by the application, which proposes the following scenario:

1. Click the greeting service button to call /api/greeting.
2. Verify that a JSON response message is received:
```
{"content": "Hello, World!"}
```
3. Click the /api/killme button and wait until a response timeout message is displayed.
4. Click the greeting service button again.
5. Verify that you now get an HTTP 503 response, which means that the service has been removed by Kubernetes because the pod has been killed and the readiness probe cannot reply.
6. Wait long enough for Kubernetes to detect that the pod has been killed and to recreate a new one. This delay depends on the probe parameters “periodSeconds” and “failureThreshold”.
7. Click the /api/greeting button again.
8. Verify that the expected JSON response message is received: {"content": "Hello, World!"}

Alternate scenario

1. Open a Unix/Windows terminal.
2. Retrieve the URL of the route exposing the /api/greeting service from the OpenShift web console, or by using the OpenShift oc client and the command:
```
oc get route/${artifactId}
```
3. Call the greeting service using the curl client with the command:
```
curl http://<HOST_PORT_ADDRESS>/api/greeting
```
4. Verify that a JSON response message is received:
```
{"content": "Hello, World!"}
```
5. Issue another curl request to call the HTTP endpoint responsible for killing the server (or for making the server's response time longer than the probe timeout):
```
curl http://<HOST_PORT_ADDRESS>/api/killme
```
6. Call the REST endpoint exposing the greeting service again and verify that you now get an HTTP 503 response, which means that the service has been removed by Kubernetes because the pod has been killed and the readiness probe cannot reply.
7. Wait long enough for Kubernetes to detect that the pod has been killed and to recreate a new one. This delay depends on the probe parameters “periodSeconds” and “failureThreshold”.
8. Call the greeting service using the curl client and the following request:
```
curl http://<HOST_PORT_ADDRESS>/api/greeting
```
9. Verify that the expected JSON response message is received:
```
{"content": "Hello, World!"}
```
10. Call the /health endpoint to get an HTTP 200 response together with the status of the health endpoint, {"status":"UP"}:
```
curl http://<HOST_PORT_ADDRESS>/health
```

Note: Steps 1 to 10 do not show visually what happens behind the scenes when Kubernetes checks whether the pod is ready and alive, removes the endpoint from the Kubernetes API gateway, and recreates it.

A more dynamic approach could be developed that combines the step-by-step instructions described previously with a video like this one: https://www.dropbox.com/s/j5747pwkzfj5o7m/kube-liveness-readiness.mov?dl=0
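
The curl-based checks in the alternate scenario could also be scripted. The following is a minimal sketch using the JDK's `java.net.http.HttpClient` (Java 11 or later); the class name `GreetingCheck`, the `interpret` helper, its messages, and the default base URL are hypothetical and only illustrate how the 200 and 503 responses from the scenario might be told apart.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class GreetingCheck {
    /** Maps an HTTP status code to the situation described in the scenario. */
    static String interpret(int status) {
        if (status == 200) {
            return "service available";
        }
        if (status == 503) {
            return "pod removed from the proxy, waiting for restart";
        }
        return "unexpected status " + status;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical default; pass the real route URL as the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/api/greeting"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + ": " + interpret(response.statusCode()));
        } catch (IOException e) {
            // Covers connection failures and request timeouts.
            System.out.println("request failed: " + e.getMessage());
        }
    }
}
```

Running it before and after calling /api/killme should, under these assumptions, print the "service available" and "pod removed" messages respectively.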

Acceptance Criteria

During nominal operation, a curl or HTTP request issued against the service $protocol://$hostname:$port/api/greeting returns {"content": "Hello, World!"}.
If the pod is killed, the same request gets an HTTP 503 (Service Unavailable) response for a period of x seconds.
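
How long it takes Kubernetes to notice the failure can be roughly estimated from the probe parameters. The sketch below is a back-of-the-envelope calculation, not an official formula: the class `ProbeTiming` and its method are hypothetical. It assumes the failure occurs just after a successful probe and that each failed probe runs into its full timeout. It does not include the time the new container needs to start and become ready.

```java
/** Rough estimate of how long a probe needs to detect a failed container. */
public class ProbeTiming {
    /**
     * Assumes the container fails right after a successful probe, so
     * failureThreshold further probes (one every periodSeconds) must fail,
     * and the last one only fails once its timeout has elapsed.
     */
    static int detectionSeconds(int periodSeconds, int failureThreshold, int timeoutSeconds) {
        return periodSeconds * failureThreshold + timeoutSeconds;
    }

    public static void main(String[] args) {
        // Values from the probe definition in the Notes section:
        // periodSeconds 10, failureThreshold 3, timeoutSeconds 1.
        System.out.println(detectionSeconds(10, 3, 1) + " seconds");
    }
}
```

With the values from the Notes section, this gives an estimate of about 31 seconds before the probe is considered failed.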

Vert.x-specific Acceptance Criteria

Swarm-specific Acceptance Criteria

Swarm uses its internal feature to “suspend” the server as a means to simulate a non-responsive service.

Boot-specific Acceptance Criteria

Integration Requirements

Notes

The use case will consist of:

Develop an HTTP application which exposes 3 endpoints: /api/greeting, /health and /api/killme. The greeting endpoint returns a JSON Hello World message, while the killme endpoint is used to stop the server.
Create a deployment.yaml file under the directory src/main/fabric8. It contains the definition of the readiness and liveness probes. Both probes target the /health endpoint on port 8080. The initial delay, period and thresholds are defined as follows:

```
...
livenessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 180
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
...
readinessProbe:
  failureThreshold: 3
  httpGet:
    path: /health
    port: 8080
    scheme: HTTP
  initialDelaySeconds: 10
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1
```
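
The three endpoints could look roughly like the sketch below. It uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in for the actual runtimes (Spring Boot, Swarm, Vert.x), so it is not the mission's implementation. The class name `HealthCheckApp`, the helper methods, and the body returned by /api/killme are assumptions made for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class HealthCheckApp {
    static final String GREETING_JSON = "{\"content\": \"Hello, World!\"}";
    static final String HEALTH_JSON = "{\"status\":\"UP\"}";

    /** Writes a JSON body with the given status code and closes the exchange. */
    static void reply(HttpExchange exchange, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "application/json");
        exchange.sendResponseHeaders(status, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }

    /** Starts a server exposing /api/greeting, /health and /api/killme. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/api/greeting", exchange -> reply(exchange, 200, GREETING_JSON));
        server.createContext("/health", exchange -> reply(exchange, 200, HEALTH_JSON));
        server.createContext("/api/killme", exchange -> {
            // Hypothetical response body; the source does not specify one.
            reply(exchange, 200, "{\"content\": \"Stopping\"}");
            // Stop from a separate thread so the handler thread is not blocked by stop().
            new Thread(() -> server.stop(0)).start();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Port 8080 matches the port targeted by both probes.
        start(8080);
    }
}
```

Once /api/killme has been called, /health no longer answers, so both probes start failing and the sequence described in the use case can be observed.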

Approval

PM	Charles Moulliard	☑
DevExp	John Clingan	☑
Vert.x	Clement Escoffier (Pending tuning of the right times to make user experience acceptable)	☑
WildFly Swarm	Heiko Braun (Pending my comment on the /health protocol)	☑
Spring Boot	Charles Moulliard	☑
QE	Ladislav Thon	☑
Docs	Zach Rhoads	☑
DevExp	Andrew Lee Rubinger	☑
Architect	Scott Stark	☑
