How to auto-restart failed apps
Goal
This guide shows how to add probes to a Kubernetes Deployment so that an unhealthy application is automatically restarted and an unready one is excluded from receiving requests.
Prerequisites
This guide assumes that you already have an application and a deployment manifest for it.
Info
Also see the upstream Kubernetes documentation about probes
Generally speaking, Kubernetes supports three different kinds of probes: liveness, readiness, and startup probes. All of them are run by Kubernetes at regular (configurable) intervals to check the health and readiness of a container.
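All three probe types share the same timing and threshold fields. The following sketch shows these knobs on a liveness probe; the field names come from the Kubernetes API, while the values are only illustrative:

```yaml
# Timing fields shared by livenessProbe, readinessProbe and startupProbe.
# The values here are illustrative, not recommendations.
livenessProbe:
  httpGet:                 # exactly one handler is required: httpGet, tcpSocket or exec
    path: /healthz
    port: 8080
  initialDelaySeconds: 5   # wait this long before the first probe
  periodSeconds: 10        # how often to run the probe
  timeoutSeconds: 1        # how long a single probe attempt may take
  failureThreshold: 3      # consecutive failures before the probe counts as failed
  successThreshold: 1      # consecutive successes to recover (must be 1 for liveness)
```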
Restart unhealthy apps
The first probe we'll consider is the liveness probe. Its job is to check whether the container is still alive; when the check fails, Kubernetes restarts the container.
Adding a liveness probe is simple enough.
In the following example an HTTP probe is added to check the /healthz
route.
Any status code greater than or equal to 200 and less than 400 indicates success.
Any other code indicates failure.
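A minimal sketch of how this could look in a Deployment manifest, assuming the app listens on port 8080 and serves /healthz; the name my-app and the image are placeholders for your own values:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                  # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: registry.example.com/my-app:1.0.0  # placeholder image
          ports:
            - containerPort: 8080
          livenessProbe:
            httpGet:
              path: /healthz    # 2xx/3xx responses count as success
              port: 8080
            periodSeconds: 10
            failureThreshold: 3 # restart after three consecutive failures
```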
What makes a liveness check
Liveness checks should verify only the bare minimum needed to tell that a container is still running. They should never include checks of required dependent services: if a shared dependency goes down, restarting every dependent container does nothing to fix it and only adds churn. They should also be configured with more generous failure thresholds than readiness probes, since a restart is far more disruptive than temporarily withholding traffic, as the sketch below illustrates.
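To make the threshold advice concrete, here is a sketch comparing the two probes on the same container; the routes and values are illustrative:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 6   # tolerate ~60s of failures before restarting
readinessProbe:
  httpGet:
    path: /readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 2   # stop routing traffic after ~10s
```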
Starve unready apps
Another useful probe is the readiness probe. As with liveness probes, Kubernetes runs it at regular intervals, in this case to check whether the container is ready to accept traffic. If it is not, the pod is removed from the endpoints of any Services that route to it, and no new requests are delivered to it.
In the following example an HTTP probe is configured to check the /readiness
route.
Any status code greater than or equal to 200 and less than 400 indicates success.
Any other code indicates failure.
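A minimal sketch of the corresponding container snippet, again assuming the app listens on port 8080; only the probe fields differ from the liveness example above:

```yaml
containers:
  - name: app
    image: registry.example.com/my-app:1.0.0  # placeholder image
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /readiness    # 2xx/3xx responses count as success
        port: 8080
      periodSeconds: 5
      failureThreshold: 2   # removed from Service endpoints after two failures
      successThreshold: 1   # added back after one success
```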
What makes a readiness check
Readiness checks should determine whether the app could serve a request right now. Even so, they should still not check shared dependencies. Instead, they should check internal state (e.g. that the request queue is not too long), and they may check the availability of exclusive dependencies.