Complete Guide to Docker Healthchecks and Restart Policies
Your Jellyfin container is running. Docker says it’s healthy. But the web UI returns a blank page and nobody can stream anything. Docker’s default “running” status only tells you the process hasn’t crashed — it says nothing about whether the service actually works.
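Docker healthchecks fix this. They let you define what “healthy” actually means for each container. Combined with restart policies, which bring crashed containers back up, they give you a stack that recovers from crashes on its own and surfaces the quieter failures before you get woken up at 3 AM. One caveat up front: outside Swarm mode, Docker only reports an unhealthy container; it does not restart it.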
This guide covers everything: how healthchecks work, how to write good ones, restart policy options, and how to wire them together in Docker Compose for a bulletproof self-hosted stack.
Why Default Docker Status Isn’t Enough
By default, Docker tracks one thing: is the main process (PID 1) running? If yes, the container is “running.” If not, it exited.
This misses a huge category of failures:
- Deadlocked processes — the app is running but frozen, not accepting connections
- Database connection loss — the app started but can’t reach its database
- OOM degradation — the container is alive but thrashing on memory, effectively unusable
- Config errors — the service started but loaded bad config and returns 500s on every request
- Port conflicts — the process is running but not bound to the expected port
Healthchecks let you probe the actual service behavior, not just process existence.
How Docker Healthchecks Work
A healthcheck is a command that Docker runs inside the container at regular intervals. Based on the exit code, Docker marks the container as one of three states:
| State | Meaning |
|---|---|
| starting | Container just started, still within the start period |
| healthy | Healthcheck command returned exit code 0 |
| unhealthy | Healthcheck failed `retries` times in a row |
The key parameters:
- test: the command to run
- interval: how often to run the check (default: 30s)
- timeout: how long to wait for the check to complete (default: 30s)
- retries: how many consecutive failures before marking unhealthy (default: 3)
- start_period: grace period after container start where failures don’t count (default: 0s)
- start_interval: check interval during the start period (Docker 25+, default: 5s)
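To see how retries and start_period interact, here is a small simulation. It is a simplified model rather than Docker's own code: the function name is made up for illustration, checks are assumed to start at fixed multiples of interval, and start_interval and timeout are ignored.

```python
def simulate_health(results, retries=3, start_period=0.0, interval=30.0):
    """Return the health status after each check in `results`.

    `results` holds one boolean per healthcheck run (True = exit code 0).
    Simplifying assumption: check number n starts n * interval seconds
    after the container starts; start_interval and timeout are ignored.
    """
    status = "starting"
    failing_streak = 0
    timeline = []
    for n, passed in enumerate(results, start=1):
        elapsed = n * interval
        if passed:
            # Any success resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        elif status == "starting" and elapsed < start_period:
            # Failures inside the grace period are not counted.
            pass
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        timeline.append(status)
    return timeline


checks = [False, False, True, False, False, False]
print(simulate_health(checks, retries=3, start_period=40, interval=30))
# ['starting', 'starting', 'healthy', 'healthy', 'healthy', 'unhealthy']
```

The first failure (at 30s) falls inside the 40s grace period and is ignored. The second one counts, but the success that follows resets the streak, so it takes three more consecutive failures to reach unhealthy.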
Writing Healthchecks in Docker Compose
Here’s the basic syntax in a docker-compose.yml:
services:
  myapp:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
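These numbers also decide how long a broken service can go unnoticed. A rough upper bound, assuming every failing check runs into its timeout and that the next check starts one interval after the previous one finishes (the function name is just for illustration):

```python
def worst_case_detection_seconds(interval, timeout, retries):
    """Upper bound on the time from a service breaking to 'unhealthy'.

    Assumes each failing check runs for the full timeout and that the next
    check starts `interval` seconds after the previous one finishes.
    """
    return retries * (interval + timeout)


# Values from the Compose example above: 30s interval, 10s timeout, 3 retries
print(worst_case_detection_seconds(interval=30, timeout=10, retries=3))  # 120
```

If two minutes is too slow for a given service, shortening interval or timeout brings the bound down at the cost of more frequent probes.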
The test Field Formats
You have three options for defining the test command:
# Option 1: CMD — runs the command directly (preferred)
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
# Option 2: CMD-SHELL — runs through /bin/sh (supports pipes, redirects)
test: ["CMD-SHELL", "curl -f http://localhost:8080 || exit 1"]
# Option 3: Shell string shorthand (equivalent to CMD-SHELL)
test: curl -f http://localhost:8080 || exit 1
Use CMD when the command is simple. Use CMD-SHELL when you need shell features like ||, &&, pipes, or variable expansion.
The curl Problem
Many minimal Docker images don’t include curl. You’ll get a healthcheck failure not because the service is down, but because the check tool doesn’t exist. Alternatives:
# wget (available in Alpine-based images)
test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:8080/health"]
# Using /dev/tcp in bash (no HTTP client needed, but the image must include bash)
test: ["CMD-SHELL", "bash -c '</dev/tcp/localhost/8080' || exit 1"]
# For PostgreSQL
test: ["CMD-SHELL", "pg_isready -U postgres"]
# For Redis
test: ["CMD-SHELL", "redis-cli ping | grep -q PONG"]
# For MariaDB/MySQL
test: ["CMD-SHELL", "mariadb-admin ping -h localhost -u root --password=$$MYSQL_ROOT_PASSWORD || exit 1"]
The double $$ in Compose files escapes the $ sign so the variable is expanded inside the container, not by Compose.
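If the image ships a Python interpreter but neither curl nor wget, the standard library can do the probing. This is a minimal sketch under that assumption; the function name, the script path, and the URL in the note below are placeholders, not part of any particular image.

```python
import urllib.request


def probe(url, timeout=5.0):
    """Return 0 if `url` answers with a 2xx status, otherwise 1.

    Docker treats exit code 0 as healthy and 1 as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except Exception:
        # Connection refused, timeouts, HTTP 4xx/5xx (raised as HTTPError)
        # and malformed responses all count as "unhealthy".
        return 1
```

Saved as a hypothetical `/healthcheck.py` that ends with `raise SystemExit(probe("http://localhost:8080/health"))`, it could be wired in with `test: ["CMD", "python3", "/healthcheck.py"]`.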
Healthcheck Examples for Popular Self-Hosted Apps
Here are battle-tested healthchecks for services you’re probably running:
Databases
# PostgreSQL
postgres:
  image: postgres:16
  environment:
    POSTGRES_PASSWORD: secretpassword
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
# MariaDB
mariadb:
  image: mariadb:11
  environment:
    MYSQL_ROOT_PASSWORD: secretpassword
  healthcheck:
    test: ["CMD-SHELL", "mariadb-admin ping -h localhost -u root --password=$$MYSQL_ROOT_PASSWORD"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s
# Redis
redis:
  image: redis:7-alpine
  healthcheck:
    test: ["CMD", "redis-cli", "ping"]
    interval: 10s
    timeout: 5s
    retries: 3
Web Applications
# Jellyfin
jellyfin:
  image: jellyfin/jellyfin:latest
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8096/health"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 60s
# Nextcloud
nextcloud:
  image: nextcloud:latest
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost/status.php"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 120s
# Vaultwarden
vaultwarden:
  image: vaultwarden/server:latest
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:80/alive"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 20s
Reverse Proxies
# Nginx Proxy Manager
npm:
  image: jc21/nginx-proxy-manager:latest
  healthcheck:
    test: ["CMD-SHELL", "curl -f http://localhost:81/api/ || exit 1"]
    interval: 30s
    timeout: 10s
    retries: 3
    start_period: 30s
# Caddy
caddy:
  image: caddy:2-alpine
  healthcheck:
    test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:2019/config/"]
    interval: 30s
    timeout: 10s
    retries: 3
Docker Restart Policies Explained
Restart policies tell Docker what to do when a container stops. There are four options:
no (default)
restart: "no"
Docker does nothing when the container exits. You have to start it manually. This is the default — which means if you haven’t set a restart policy, your containers won’t survive a server reboot.
always
restart: always
Docker restarts the container no matter what — whether it exited cleanly (code 0), crashed (non-zero), or the Docker daemon restarted (like after a reboot). The only way to stop it is docker stop, and even then it restarts when the daemon starts again.
Best for: Critical services that must always run (reverse proxy, databases, DNS).
unless-stopped
restart: unless-stopped
Same as always, except Docker won’t restart it after a daemon restart if you manually stopped it with docker stop. This is the sweet spot for most self-hosted services.
Best for: Most services. Survives reboots, but respects manual stops.
on-failure
restart: on-failure:5
Only restarts if the container exits with a non-zero exit code. The optional number limits restart attempts. After 5 failures, Docker gives up.
Best for: Batch jobs, migration scripts, or containers that should run once and exit cleanly.
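The four policies can be summarised as one decision function. This is an illustrative model of the rules described above, not Docker's implementation. The function and parameter names are invented for the sketch, and the behaviour of on-failure after a daemon restart follows Docker's documentation rather than anything stated in this guide, so treat it as an assumption.

```python
def should_restart(policy, event, exit_code=0, manually_stopped=False,
                   restart_count=0, max_retries=None):
    """Decide whether Docker would start the container again.

    policy: "no", "always", "unless-stopped" or "on-failure"
    event:  "exit" (the main process ended) or "daemon_start"
            (the Docker daemon came back up, e.g. after a reboot)
    """
    if policy == "no":
        return False
    if event == "daemon_start":
        if policy == "always":
            return True  # even after a manual docker stop
        if policy == "unless-stopped":
            return not manually_stopped
        # on-failure: assumed not restarted, per Docker's documentation
        return False
    # event == "exit"
    if manually_stopped:
        return False  # a manual stop is respected until the daemon restarts
    if policy == "on-failure":
        under_limit = max_retries is None or restart_count < max_retries
        return exit_code != 0 and under_limit
    return True  # always / unless-stopped restart on any exit code


print(should_restart("always", "daemon_start", manually_stopped=True))          # True
print(should_restart("unless-stopped", "daemon_start", manually_stopped=True))  # False
```

The two printed cases are the only place where always and unless-stopped differ.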
Which Restart Policy Should You Use?
For self-hosting, the answer is almost always unless-stopped. Here’s the decision tree:
- Critical infrastructure (reverse proxy, DNS, database) → always
- Normal services (Jellyfin, Nextcloud, Gitea) → unless-stopped
- One-shot tasks (database migrations, backup scripts) → on-failure or no
- Development/testing → no
Combining Healthchecks with depends_on
The real power comes from wiring healthchecks into service dependencies. Without healthchecks, depends_on only waits for the container to start — not for the service to be ready:
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secretpassword
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    restart: unless-stopped
  app:
    image: myapp:latest
    depends_on:
      postgres:
        condition: service_healthy
    restart: unless-stopped
The condition: service_healthy directive tells Compose to wait until PostgreSQL’s healthcheck passes before starting the app. Without this, your app would start immediately, try to connect to a database that’s still initializing, and crash.
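Compose does this waiting for you. If you start containers from your own deploy script instead, a similar gate can be approximated by polling docker inspect. The sketch below is not how Compose implements it; the function names and default timings are assumptions, and the status lookup is injectable so the loop can be exercised without Docker.

```python
import subprocess
import time


def docker_health(container):
    """Ask the Docker CLI for a container's health status string."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(container, get_status=docker_health,
                       timeout=120.0, poll=2.0):
    """Poll until the container reports 'healthy'; return True on success.

    Returns False if it turns 'unhealthy' or the timeout expires first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(container)
        if status == "healthy":
            return True
        if status == "unhealthy":
            return False
        time.sleep(poll)
    return False
```

A script would call `wait_until_healthy("postgres")` and only start the app when it returns True.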
Common depends_on Conditions
depends_on:
  db:
    condition: service_healthy # Wait for healthcheck to pass
  cache:
    condition: service_started # Just wait for container start (default)
  migrations:
    condition: service_completed_successfully # Wait for exit code 0
The service_completed_successfully condition is useful for one-shot init containers like database migrations that need to finish before the app starts.
A Complete Self-Healing Stack Example
Here’s a real-world Compose file that ties everything together — a web app with PostgreSQL, Redis, and proper health/dependency management:
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: appdb
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: ${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD}
    environment:
      # Needed so $$REDIS_PASSWORD in the healthcheck resolves inside the container
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD-SHELL", "redis-cli -a $$REDIS_PASSWORD ping | grep -q PONG"]
      interval: 10s
      timeout: 5s
      retries: 3
    restart: unless-stopped
  app:
    image: myapp:latest
    environment:
      DATABASE_URL: postgres://appuser:${DB_PASSWORD}@postgres:5432/appdb
      REDIS_URL: redis://:${REDIS_PASSWORD}@redis:6379
    ports:
      - "8080:8080"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    restart: unless-stopped
volumes:
  postgres_data:
  redis_data:
If PostgreSQL crashes, Docker restarts it. If the app can’t reach the database and its healthcheck starts failing, Docker marks it unhealthy. Combined with a monitoring tool like Uptime Kuma or Beszel, you’ll get alerted before users even notice.
Monitoring Healthcheck Status
Command Line
# Check health status of all containers
docker ps --format "table {{.Names}}\t{{.Status}}"
# Detailed health info for a specific container
docker inspect --format='{{json .State.Health}}' container_name | jq
# Watch healthcheck logs
docker inspect --format='{{range .State.Health.Log}}{{.Output}}{{end}}' container_name
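The JSON from the inspect command is verbose. A few lines of Python can boil it down to the fields you usually care about. The sketch assumes the `Status`, `FailingStreak` and `Log` keys found in `.State.Health`; the function name is illustrative.

```python
import json


def summarize_health(inspect_json):
    """Condense the output of docker inspect --format='{{json .State.Health}}'."""
    health = json.loads(inspect_json)
    if not health:
        # Containers without a healthcheck print "null".
        return "no healthcheck configured"
    log = health.get("Log") or []
    last = log[-1] if log else {}
    return (
        f"status={health.get('Status')} "
        f"failing_streak={health.get('FailingStreak')} "
        f"last_exit_code={last.get('ExitCode')} "
        f"last_output={last.get('Output', '').strip()!r}"
    )
```

Calling it with `sys.stdin.read()` lets you pipe the inspect output straight into the script.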
Docker Events
You can subscribe to health status change events:
# Stream health events in real-time
docker events --filter event=health_status
This is useful for integrating with alerting systems — pipe the output to a script that sends notifications on health_status: unhealthy.
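Such a script might look like the sketch below. It assumes the events are emitted as one JSON object per line via `docker events --filter event=health_status --format '{{json .}}'`, and it uses print as a stand-in for whatever notification channel you have. The function names are illustrative.

```python
import json


def parse_health_event(line):
    """Return (container_name, health) for a health_status event, else None."""
    event = json.loads(line)
    action = event.get("Action", "")
    if not action.startswith("health_status"):
        return None
    # The action looks like "health_status: unhealthy".
    health = action.partition(":")[2].strip()
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
    return name, health


def watch(lines, notify=print):
    """Call `notify` once for every event that reports an unhealthy container."""
    for line in lines:
        if not line.strip():
            continue
        parsed = parse_health_event(line)
        if parsed and parsed[1] == "unhealthy":
            notify(f"ALERT: {parsed[0]} is unhealthy")
```

Passing `sys.stdin` as `lines` turns this into the receiving end of the pipe. Swapping `notify` for a function that posts to a webhook or sends mail is the only change needed for real alerts.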
Troubleshooting Common Issues
Healthcheck Passes but Service Is Broken
Your healthcheck might be too shallow. Checking if port 8080 is open doesn’t mean the app is functional. If the service has a /health or /api/status endpoint, use that instead of a simple TCP check.
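If you control the app, a deeper endpoint is not much code. The following is a sketch using only the Python standard library: /health returns 200 when every registered check passes and 503 otherwise, which makes `curl -f` fail. The helper names and the TCP reachability check are placeholders; a real app would more likely reuse its existing database connection.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer


def tcp_reachable(host, port, timeout=1.0):
    """Cheap dependency check: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def make_health_handler(checks):
    """Build a handler whose /health returns 200 only if every check passes.

    `checks` maps a label to a zero-argument callable returning True/False.
    """
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/health":
                self.send_error(404)
                return
            failed = [name for name, check in checks.items() if not check()]
            body = ("ok" if not failed else "failing: " + ", ".join(failed)).encode()
            self.send_response(200 if not failed else 503)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep probe traffic out of the logs

    return HealthHandler


def serve(port, checks):
    """Serve the health endpoint until interrupted."""
    HTTPServer(("0.0.0.0", port), make_health_handler(checks)).serve_forever()
```

For example, `serve(8080, {"postgres": lambda: tcp_reachable("postgres", 5432)})` reports unhealthy as soon as the database host stops accepting connections.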
Container Stuck in “starting” State
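A container stays in “starting” until its first check passes or until enough counted failures mark it unhealthy, so lingering there usually means the service is still initializing or the check itself keeps failing during the grace period. Some apps (Nextcloud, large Java apps) need 60–120 seconds to initialize; set start_period to cover that, and remember that the first check only runs one interval after the container starts. Check your container’s startup logs with docker logs to see what it is waiting on.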
Healthcheck Can’t Find curl/wget
The image doesn’t include those tools. Options:
- Use a built-in check (pg_isready, redis-cli ping)
- Use the bash TCP trick: bash -c '</dev/tcp/localhost/PORT'
- Build a custom image that adds the tool
- Use wget in Alpine-based images (it’s included by default)
Restart Loop (Container Keeps Crashing)
If a container fails, gets restarted, and immediately fails again, Docker applies exponential backoff, waiting longer between each restart attempt (starting at 100ms, doubling each time, capping at 1 minute). Check docker logs to find the root cause instead of waiting for it to eventually stay up.
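The backoff schedule is easy to tabulate. In this sketch the cap is a parameter; the 60-second default follows the maximum delay given in Docker's documentation, and the function name is illustrative.

```python
def restart_delays_ms(attempts, initial_ms=100, max_ms=60_000):
    """Delay before each consecutive restart: doubling from initial_ms, capped at max_ms."""
    delays = []
    delay = initial_ms
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_ms)
    return delays


print(restart_delays_ms(12))
# [100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 51200, 60000, 60000]
```

A crash-looping container hits the cap after about ten restarts, which is why a loop that looked frantic at first settles into one attempt per minute.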
depends_on Not Waiting for Healthy
Make sure you’re using the condition syntax:
# This does NOT wait for health
depends_on:
  - postgres
# This DOES wait for health
depends_on:
  postgres:
    condition: service_healthy
The shorthand list syntax only waits for the container to start, not for it to become healthy.
Best Practices
- Always set a start_period, especially for apps with slow initialization. This prevents false unhealthy alerts during startup.
- Keep healthchecks lightweight. A simple HTTP request or CLI ping is enough. Don’t run database queries or complex scripts that add load.
- Use unless-stopped as your default restart policy. It survives reboots but respects manual stops.
- Check what tools the image provides. Before writing a curl healthcheck, make sure curl exists in the image.
- Set appropriate intervals: 10s for databases, 30s for web apps. Too frequent and you add unnecessary load; too infrequent and you miss problems.
- Use depends_on: condition: service_healthy for any service that relies on a database or cache. It eliminates race conditions during startup.
- Monitor unhealthy events. Healthchecks alone don’t alert you. Pair them with Uptime Kuma, Beszel, or a simple Docker events watcher.
Wrapping Up
Docker healthchecks and restart policies are the foundation of a reliable self-hosted setup. Healthchecks tell you when something is actually broken (not just “running”), restart policies handle recovery automatically, and depends_on conditions prevent startup race conditions.
Add healthchecks to every service in your stack. Start with databases, since they’re the most common dependency and the easiest to check. Then add them to web services using their built-in health endpoints. Combined with unless-stopped restart policies and a monitor watching for unhealthy containers, you’ll have a stack that recovers from crashes without intervention and flags the failures it can’t fix on its own.
Your future self, not getting paged at 3 AM because Postgres silently wedged itself, will thank you.