How to auto-restart failed apps

Goal

This guide shows how to add probes to Kubernetes deployments so that the application is automatically restarted when it fails, or excluded from receiving requests when it is not ready to serve them.

Prerequisites

This guide assumes that you already have an application and a deployment manifest for it.

Info

See also the upstream Kubernetes documentation about probes.

Generally speaking, Kubernetes supports three kinds of probes: liveness, readiness and startup probes. The kubelet runs each of them at regular, configurable intervals to check the health and readiness of a container. This guide focuses on liveness and readiness probes.
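As a rough orientation, the sketch below declares all three probe kinds (startupProbe, livenessProbe, readinessProbe) on a single container. The Pod name, the image and the /readiness route are hypothetical; the app is assumed to serve /healthz and /readiness on port 8080. While a startup probe is configured, the liveness and readiness probes do not start until it has succeeded.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-all-probes              # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/my-app:1.0       # hypothetical image
    ports:
    - containerPort: 8080
    # Startup probe: gives a slow-starting app up to 30 * 10s = 300s
    # before liveness and readiness probes begin.
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
    # Liveness probe: the container is restarted if this keeps failing.
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
    # Readiness probe: the Pod receives no Service traffic while this fails.
    readinessProbe:
      httpGet:
        path: /readiness                # hypothetical route
        port: 8080
      periodSeconds: 5
```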

Restart unhealthy apps

The first probe we'll consider is the liveness probe. It checks whether the container is still alive; if the probe keeps failing (by default, three consecutive times), Kubernetes kills the container and restarts it according to the Pod's restart policy.

Adding a liveness probe is simple enough. In the following example an HTTP probe is added to check the /healthz route. Any status code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.

apiVersion: v1
kind: Pod
metadata:
  name: example-liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
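Besides httpGet, a probe can also run a command in the container (exec) or try to open a TCP connection (tcpSocket). The two fragments below are alternatives to the httpGet block in the example above, separated as two YAML documents; the file /tmp/healthy is a hypothetical marker that the app is assumed to keep up to date.

```yaml
# Alternative 1: exec probe. Succeeds if the command exits with status 0.
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy        # hypothetical file maintained by the app
---
# Alternative 2: TCP probe. Succeeds if a TCP connection to the port can be opened.
livenessProbe:
  tcpSocket:
    port: 8080
```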

What makes a liveness check

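Liveness checks should test only the bare minimum: that the process in the container is running and able to respond. They should never check required dependent services: if a database is down, restarting the app will not bring it back, and every replica would fail the check and be restarted at the same time. They should also be configured with more generous failure thresholds than readiness probes, because a restart is far more disruptive than briefly taking a container out of rotation.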
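One way to express "more generous failure thresholds" is through the timing fields of each probe. In this container fragment the numbers are illustrative only, not recommendations, and the /readiness route is a hypothetical endpoint assumed to exist in the app.

```yaml
# Fragment of a container spec; only the probe fields are shown.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6     # restart only after roughly 60s of consecutive failures
readinessProbe:
  httpGet:
    path: /readiness      # hypothetical route
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 2     # stop routing traffic after roughly 10s of failures
```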

Starve unready apps

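Another useful probe is the readiness probe. Like the liveness probe, it is run regularly by Kubernetes, but it checks whether the container is ready to accept traffic. If it fails, the Pod's address is removed from the endpoints of every Service that selects it, so no new requests are routed to it. Unlike a failing liveness probe, a failing readiness probe does not restart the container.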

In the following example, an HTTP probe is configured to check the /readiness route. As with the liveness probe, any status code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.

apiVersion: v1
kind: Pod
metadata:
  name: example-readiness-http
spec:
  containers:
  - name: readiness
    image: k8s.gcr.io/liveness
    args:
    - /server
    readinessProbe:
      httpGet:
        path: /readiness
        port: 8080
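Since the goal is to add probes to deployments, the probes belong in the Pod template of the Deployment. The sketch below pairs a Deployment with a Service so that only Pods passing their readiness probe receive traffic. All names, the image, and the /readiness route are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                     # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app                # must match both selectors
    spec:
      containers:
      - name: app
        image: example.com/my-app:1.0   # hypothetical image
        ports:
        - containerPort: 8080
        livenessProbe:                  # restarts the container when it fails
          httpGet:
            path: /healthz
            port: 8080
        readinessProbe:                 # gates Service traffic
          httpGet:
            path: /readiness            # hypothetical route
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example-app    # Pods failing readiness are dropped from this Service's endpoints
  ports:
  - port: 80
    targetPort: 8080
```

After applying it (for example with `kubectl apply -f`), `kubectl get endpoints example-app` should list only the addresses of Pods whose readiness probe currently succeeds.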

What makes a readiness check

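Readiness checks should test whether the app can serve a request right now. They should, however, still not check shared dependencies¹²: if a dependency shared by all replicas fails, every replica becomes unready at once and the Service is left with no endpoints. Instead, they should check internal state (e.g. that the request queue is not too long) and may check the availability of exclusive dependencies.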

Last update: 2022-10-04