This repository has been archived by the owner on May 29, 2024. It is now read-only.

Mission: Health Check

Stefan Sitani edited this page Apr 20, 2017 · 5 revisions


ID: 104

Short Name: health-check

Description

The purpose of this use case is to demonstrate how Kubernetes health checks work to determine whether a container is still alive (= liveness) and ready to serve traffic (= readiness) for the application’s HTTP endpoints.

To demonstrate this behavior, we will configure a /health HTTP endpoint that Kubernetes calls periodically with HTTP requests. If the container is still alive and the /health endpoint is able to reply, the management platform receives an HTTP 200 status code and no further action is required.

But if the HTTP endpoint doesn’t return a response (the JVM is no longer running, a thread is blocked, etc.), the platform kills the container and restarts it, which restarts the application.
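To make the liveness decision concrete, here is a minimal Java sketch of what a single HTTP probe attempt amounts to, assuming the standard Kubernetes httpGet rule: any status code from 200 to 399 counts as success, while an error code, a timeout or a refused connection counts as failure. The class name HttpProbe is hypothetical; in the cluster the kubelet performs this check itself, so the application only has to answer on /health.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical model of one Kubernetes-style httpGet probe attempt.
final class HttpProbe {
    // Plain HTTP/1.1; the client does not follow redirects by default,
    // so a 3xx response is returned as-is and counts as success below.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(1))
            .build();

    private HttpProbe() {}

    /** Returns true if GET healthUri answers with a 200-399 code within the timeout. */
    static boolean check(URI healthUri, Duration timeout) {
        HttpRequest request = HttpRequest.newBuilder(healthUri)
                .timeout(timeout) // plays the role of timeoutSeconds
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
            int code = response.statusCode();
            return code >= 200 && code < 400;
        } catch (IOException e) {
            // Timeout (HttpTimeoutException) or connection refused: no usable reply.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A single failed attempt does not restart anything by itself; the probe's failureThreshold decides how many consecutive failures are tolerated.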

As the pod will be down for a certain period of time, we will be able to show that the endpoint exposing the service is no longer available; in this case, an HTTP 503 response is returned. The user receives this status code from the Kubernetes proxy: the management platform has detected that the endpoint used to check whether the container is ready to serve traffic cannot reply, so the IP address and port of the server exposing the service are removed from the Kubernetes proxy.

User Problem

When an application is deployed on top of OpenShift/Kubernetes, it is important to determine whether each container is available and able to serve incoming requests. By implementing the health-check pattern, it becomes possible to monitor the health of each container and whether it is able to serve traffic.

Concepts and Architectural Patterns

  • Health Check using Liveness (= process is alive, JVM started) and Readiness (= ready to serve traffic) probes

  • Fail-over

  • Resilience

Prerequisites

The runtime (Spring Boot, WildFly Swarm, Vert.x) provides the code or the JAR file containing the /health endpoint.
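For illustration, the following hypothetical HealthCheckApp mimics the three endpoints the runtimes are expected to provide, using only the JDK’s built-in com.sun.net.httpserver server rather than Spring Boot, WildFly Swarm or Vert.x. As a simplifying assumption, /api/killme does not exit the JVM; it flips a flag so that /health and /api/greeting answer 503, which is closer in spirit to Swarm’s “suspend” approach. In OpenShift, the 503 the user sees after a real kill comes from the proxy rather than from the application.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the booster application (JDK HTTP server, no framework).
final class HealthCheckApp {
    // Becomes false once /api/killme has been called; simulates an unresponsive server.
    private final AtomicBoolean alive = new AtomicBoolean(true);
    private final HttpServer server;

    HealthCheckApp(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/api/greeting", ex -> {
            if (alive.get()) {
                send(ex, 200, "{\"content\":\"Hello, World!\"}");
            } else {
                send(ex, 503, "");
            }
        });
        server.createContext("/health", ex -> {
            if (alive.get()) {
                send(ex, 200, "{\"status\":\"UP\"}");
            } else {
                send(ex, 503, "{\"status\":\"DOWN\"}");
            }
        });
        server.createContext("/api/killme", ex -> {
            alive.set(false); // from now on /health and /api/greeting answer 503
            send(ex, 200, "");
        });
    }

    void start() { server.start(); }

    void stop() { server.stop(0); }

    int port() { return server.getAddress().getPort(); }

    private static void send(HttpExchange ex, int status, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 0) {
            ex.getResponseHeaders().set("Content-Type", "application/json");
        }
        // A length of -1 tells the JDK server that there is no response body.
        ex.sendResponseHeaders(status, bytes.length > 0 ? bytes.length : -1);
        try (OutputStream out = ex.getResponseBody()) {
            out.write(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 8080 matches the port used by the probes in deployment.yaml.
        HealthCheckApp app = new HealthCheckApp(8080);
        app.start();
        System.out.println("Listening on port " + app.port());
    }
}
```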

Use Case

Success scenario

1. The use case starts when the application has been deployed into OpenShift.
2. The user accesses the application using a web page provided by the application, where the following scenario is proposed:
3. Click the “greeting service” button to call /api/greeting.
4. Verify that a JSON response message is received: {"content": "Hello, World!"}
5. Click the /api/killme button and wait until a response timeout message is displayed.
6. Click the greeting service button again.
7. Verify that you now get an HTTP 503 response, which means that the service has been removed by Kubernetes because the pod is killed and the readiness probe can’t reply.
8. Wait long enough for Kubernetes to detect that the pod is killed and to recreate a new one. This delay depends on the probe parameter “periodSeconds”.
9. Click the /api/greeting button again.
10. Verify that a JSON response message is received as expected: {"content": "Hello, World!"}
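Waiting for the pod to come back can be automated instead of re-clicking by hand. The following sketch polls a URL at a fixed interval until it answers HTTP 200 or a deadline passes; the class WaitForRecovery, its method and the 2-second request timeout are hypothetical choices, not part of the booster.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: poll a URL until it answers HTTP 200 or a deadline passes.
final class WaitForRecovery {
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(2))
            .build();

    private WaitForRecovery() {}

    /** Returns the number of attempts until HTTP 200, or -1 if maxWait elapses first. */
    static int pollUntilOk(URI uri, Duration interval, Duration maxWait) throws InterruptedException {
        Instant deadline = Instant.now().plus(maxWait);
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        for (int attempt = 1; ; attempt++) {
            try {
                int code = CLIENT.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
                System.out.printf("attempt %d -> HTTP %d%n", attempt, code);
                if (code == 200) {
                    return attempt;
                }
            } catch (IOException e) {
                // While the pod restarts, requests may also time out or be refused.
                System.out.printf("attempt %d -> %s%n", attempt, e.getClass().getSimpleName());
            }
            if (Instant.now().plus(interval).isAfter(deadline)) {
                return -1;
            }
            Thread.sleep(interval.toMillis());
        }
    }
}
```

Pointed at the /api/greeting route of the deployed application, the printed log makes the 503 window and its length visible, which the manual steps do not.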

Alternate scenario

2a. Open a Unix/Windows terminal.
3. Retrieve the URL of the route exposing the /api/greeting service, using either the OpenShift console or the OpenShift client with the command oc get route/${artifactId}
4. Call the greeting service with curl: curl http://<HOST_PORT_ADDRESS>/api/greeting
5. Verify that a JSON response message is received: {"content": "Hello, World!"}
6. Issue another curl request to call the HTTP endpoint responsible for killing the server (or for making the server’s response time longer than the probe expects): curl http://<HOST_PORT_ADDRESS>/api/killme
7. Call the REST endpoint exposing the greeting service to verify that you now get an HTTP 503 response, which means that the service has been removed by Kubernetes because the pod is killed and the readiness probe can’t reply.
8. Wait long enough for Kubernetes to detect that the pod is killed and to recreate a new one. This delay depends on the probe parameter “periodSeconds”.
9. Call the greeting service with curl again: curl http://<HOST_PORT_ADDRESS>/api/greeting and verify that a JSON response message is received as expected: {"content": "Hello, World!"}

3a. Call the /health endpoint to get an HTTP 200 response together with the health status {"status":"UP"}: curl http://<HOST_PORT_ADDRESS>/health
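Kubernetes httpGet probes only look at the status code; the {"status":"UP"} body is informational. If a script should check both, a minimal sketch could look like the following. HealthStatus is a hypothetical name, and the regular expression assumes a flat JSON body; a real client would use a JSON library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for a /health body such as {"status":"UP"}.
final class HealthStatus {
    // Matches "status" : "VALUE" with optional whitespace; VALUE like UP, DOWN, OUT_OF_SERVICE.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    private HealthStatus() {}

    static Optional<String> parse(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True only if the HTTP code is 200 and the body reports UP. */
    static boolean isUp(int httpStatus, String body) {
        return httpStatus == 200 && parse(body).filter("UP"::equals).isPresent();
    }
}
```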

Steps 1 to 10 do not visually show what happens behind the scenes when Kubernetes checks whether the pod is ready/alive, removes the endpoint from the Kubernetes proxy, and recreates it.

A more dynamic approach could be developed to include a video like this one, with the step-by-step instructions described previously.

Acceptance Criteria

During nominal operation, a curl or HTTP request issued against the service $protocol://$hostname:$port/api/greeting returns { "content": "Hello, World!"}. If the pod is killed (and for a period of x seconds), the same request receives an HTTP 503 (Service Unavailable) response.
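The acceptance criterion can be expressed as a check over the sequence of status codes observed while repeatedly calling /api/greeting: nominal 200 responses, then a window of 503 responses while the pod is down, then 200 again. The AcceptanceCheck class below is a hypothetical sketch; it is deliberately strict and treats any other status code as a violation of the pattern.

```java
import java.util.List;

// Hypothetical check of the acceptance criterion over observed HTTP status codes.
final class AcceptanceCheck {
    private AcceptanceCheck() {}

    /**
     * True if the codes have the shape 200...200, 503...503, 200...200:
     * nominal service, an outage window while the pod is down, then recovery.
     */
    static boolean matchesOutagePattern(List<Integer> codes) {
        int i = 0;
        int before = countRun(codes, i, 200);
        i += before;
        int outage = countRun(codes, i, 503);
        i += outage;
        int after = countRun(codes, i, 200);
        i += after;
        return i == codes.size() && before > 0 && outage > 0 && after > 0;
    }

    // Length of the run of `code` starting at index `from`.
    private static int countRun(List<Integer> codes, int from, int code) {
        int n = 0;
        while (from + n < codes.size() && codes.get(from + n) == code) {
            n++;
        }
        return n;
    }
}
```

Combined with a fixed polling interval, the length of the 503 run gives an estimate of the x seconds mentioned above.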

Vert.x-specific Acceptance Criteria

Swarm-specific Acceptance Criteria

WildFly Swarm uses its internal feature to “suspend” the server as a means of simulating a non-responsive service.

Boot-specific Acceptance Criteria

Integration Requirements

Tags

Health Check, Readiness, Liveness

Notes

The use case will consist of:

  • Develop an HTTP application that exposes 3 endpoints: /api/greeting, /health and /api/killme. The greeting endpoint returns a JSON Hello World message, while the killme endpoint is used to stop the server.

  • Create a deployment.yaml file under the directory src/main/fabric8. It will contain the definition of the readiness and liveness probes, both of which use the /health endpoint on port 8080. The initial delay, period and thresholds are defined as follows:

    …
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 180
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
    …
    readinessProbe:
      failureThreshold: 3
      httpGet:
        path: /health
        port: 8080
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
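With these values the reaction time can be estimated: a probe has to fail failureThreshold times in a row, with one attempt every periodSeconds, so the readiness probe takes on the order of 3 × 10 s = 30 s to remove the pod from the service, and the liveness probe a similar 30 s to trigger a restart, but only after its 180 s initialDelaySeconds has elapsed. The sketch below models that threshold logic. ProbeTracker is a hypothetical class (the real bookkeeping is done by the kubelet), and it assumes the documented defaults that a readiness probe starts in the failed state and a liveness probe in the passing state.

```java
// Hypothetical model of how a probe's thresholds turn single results into a state.
final class ProbeTracker {
    private final int failureThreshold;
    private final int successThreshold;
    private int consecutiveFailures;
    private int consecutiveSuccesses;
    private boolean passing;

    ProbeTracker(int failureThreshold, int successThreshold, boolean initiallyPassing) {
        if (failureThreshold < 1 || successThreshold < 1) {
            throw new IllegalArgumentException("thresholds must be >= 1");
        }
        this.failureThreshold = failureThreshold;
        this.successThreshold = successThreshold;
        this.passing = initiallyPassing;
    }

    /** Records one probe result and returns whether the probe is now considered passing. */
    boolean record(boolean success) {
        if (success) {
            consecutiveFailures = 0;
            consecutiveSuccesses++;
            if (consecutiveSuccesses >= successThreshold) {
                passing = true;
            }
        } else {
            consecutiveSuccesses = 0;
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                // Readiness: endpoint removed from the service; liveness: container restarted.
                passing = false;
            }
        }
        return passing;
    }

    /** Rough upper bound, in seconds, on the time to react to a persistent failure. */
    static int approxDetectionSeconds(int periodSeconds, int failureThreshold) {
        return periodSeconds * failureThreshold;
    }
}
```

For the readiness probe above, new ProbeTracker(3, 1, false) becomes ready after one success and stays ready through two failures; only the third consecutive failure marks it not ready.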

Approval

PM: Name
DevExp: Name
Vert.x: Name
WildFly Swarm: Name
Spring Boot: Name
QE: Name
Docs: Name
Architect: Name