Health Check

Quick Summary — TL;DR

A health check is a lightweight HTTP request to a known endpoint (e.g. /health) that verifies a service is running and responsive.
Shallow checks confirm the process is alive; deep checks also verify dependencies like databases and external APIs.
Run health checks every 1-5 minutes on a cron schedule, and alert after consecutive failures to catch outages before users do.

A health check is a lightweight HTTP request sent to an endpoint to verify that a service is running and responsive. If the endpoint returns a success status code (typically 200), the service is healthy. If it returns an error or doesn't respond within the timeout window, something is wrong.
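As a minimal sketch of that rule, the Python function below (standard library only) treats any 2xx response received within the timeout as healthy and everything else as unhealthy. The name check_health, the 5-second default timeout, and the example URL are assumptions made for illustration.

```python
import urllib.request


def check_health(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib raises HTTPError for 4xx/5xx responses and URLError for
        # connection problems; both are OSError subclasses, as are timeouts.
        return False
```

A call such as check_health("https://example.com/health") would then return True or False, which a monitor can act on.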

Why health checks matter

Services fail. Deployments break things. Databases run out of connections. SSL certificates expire. Without health checks, you find out when users complain or, worse, when revenue drops. With health checks, you find out within a few check intervals, typically minutes rather than hours.

Types of health checks

Shallow (liveness)

The simplest check: "is the process running and accepting HTTP requests?" Returns 200 if the web server responds. Doesn't verify that the application is actually working — just that it's alive.

Deep (readiness)

Verifies that the application and its dependencies are working: the database is reachable, Redis is connected, external APIs respond. Returns 200 only if everything checks out. More useful, but slower and more fragile, because a brief failure in any one dependency marks the whole service as unhealthy.

Dependency-specific

Individual checks for each dependency: one for the database, one for Redis, one for the payment API. Gives you pinpoint visibility into which component failed.
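The three types above can be sketched as one small routing function. This is an illustration under assumptions, not a standard: the paths /health/live, /health/ready, and /health/<name>, the 503 status for failures, and the names handle and run_check are all hypothetical, and each dependency probe is simply a function that returns True or False.

```python
from typing import Callable, Dict, Tuple

Check = Callable[[], bool]
PREFIX = "/health/"  # hypothetical path layout


def run_check(check: Check) -> bool:
    """Run one dependency probe, treating any exception as a failure."""
    try:
        return bool(check())
    except Exception:
        return False


def handle(path: str, checks: Dict[str, Check]) -> Tuple[int, dict]:
    """Map a health path to an HTTP status code and a response body."""
    if path == "/health/live":
        # Shallow: being able to answer at all shows the process is alive.
        return 200, {"status": "ok"}
    if path == "/health/ready":
        # Deep: every dependency must pass for a 200.
        results = {name: run_check(check) for name, check in checks.items()}
        healthy = all(results.values())
        return (200 if healthy else 503), {
            "status": "ok" if healthy else "fail",
            "checks": results,
        }
    name = path[len(PREFIX):] if path.startswith(PREFIX) else ""
    if name in checks:
        # Dependency-specific: run only the named probe.
        ok = run_check(checks[name])
        return (200 if ok else 503), {"status": "ok" if ok else "fail", "check": name}
    return 404, {"status": "unknown path"}
```

With probes registered as, say, {"database": ping_db, "redis": ping_redis} (both hypothetical functions), a request to /health/redis reports on Redis alone, while /health/ready reports on everything.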

Implementing a health check endpoint

A good health check endpoint:

Lives at a known path: /health, /healthz, or /api/health
Returns fast: under 1 second, ideally under 200 ms
Returns structured data: status, uptime, dependency states, version
Doesn't require authentication: load balancers and monitors need unauthenticated access
Doesn't have side effects: health checks run frequently, so every call must be safe to repeat without changing application state
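A minimal sketch of an endpoint with those properties, using only Python's standard library. The port 8080, the VERSION value, and the field names in the payload are assumptions; a real service would also add its dependency states to the body.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

START_TIME = time.monotonic()
VERSION = "1.0.0"  # hypothetical version string


def build_payload() -> dict:
    """Structured body with status, uptime, and version. No I/O, so it is fast."""
    return {
        "status": "ok",
        "uptime_seconds": round(time.monotonic() - START_TIME, 3),
        "version": VERSION,
    }


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # No authentication and no state changes: the handler only reads.
        if self.path != "/health":
            self.send_error(404)
            return
        body = json.dumps(build_payload()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever, serving GET /health on the given port."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Calling serve() starts the server; a GET request to /health on that port then returns the JSON payload with a 200 status.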

Scheduled health checks

Running health checks on a cron schedule — every 1 to 5 minutes — gives you continuous uptime monitoring. When a check fails:

Log the failure with timestamp, status code, and response time
Retry after a short delay (transient failures are common)
If failures persist past a threshold, send an alert
When the service recovers, send a recovery notification
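The steps above can be sketched as a small state machine plus a wrapper that retries once. This is one possible interpretation, not a description of any particular scheduler: the names FailureTracker and run_scheduled_check, the threshold of 3, and the 10-second retry delay are assumptions. The log line carries only a timestamp here; status code and response time would come from whatever HTTP client performs the check.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class FailureTracker:
    """Turn a stream of check results into alert and recovery events."""

    threshold: int = 3  # consecutive failures before alerting (assumed value)
    consecutive_failures: int = 0
    alerting: bool = False

    def record(self, ok: bool) -> Optional[str]:
        """Record one result; return "alert", "recovery", or None."""
        if ok:
            recovered = self.alerting
            self.consecutive_failures = 0
            self.alerting = False
            return "recovery" if recovered else None
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.alerting:
            self.alerting = True  # alert once, not on every later failure
            return "alert"
        return None


def run_scheduled_check(
    check: Callable[[], bool],
    tracker: FailureTracker,
    retry_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> Optional[str]:
    """One scheduled run: check, log and retry once on failure, then record."""
    ok = check()
    if not ok:
        stamp = datetime.now(timezone.utc).isoformat()
        log(f"{stamp} health check failed, retrying in {retry_delay}s")
        sleep(retry_delay)
        ok = check()  # a transient failure often clears on the retry
    return tracker.record(ok)
```

A scheduler would call run_scheduled_check once per interval and send a notification whenever the return value is "alert" or "recovery".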

This is exactly what Recuro's cron scheduler does: hit your endpoint on a schedule, log every execution, and alert on consecutive failures.

Health checks vs uptime monitoring

Health checks are usually internal: your own infrastructure checking itself, as when a load balancer routes traffic away from unhealthy instances. Uptime monitoring is external: a third party checks your service from outside your network. Both are important because they catch different failure modes. An external check, for example, also exercises DNS, TLS, and the network path that users take.

FAQ

What is a health check?

A health check is an HTTP request to a known endpoint (like /health) that verifies a service is running and responsive. It's used by load balancers, monitoring systems, and scheduled jobs to detect outages.

How often should I run health checks?

Every 1 to 5 minutes is standard. More frequent checks catch outages faster but generate more traffic. Check critical services every minute and less critical ones every 5 minutes.
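To see how the interval interacts with a consecutive-failure threshold, here is a rough worst-case estimate. It assumes checks start on a fixed schedule, the outage begins just after a successful check, and no extra retries happen between scheduled runs; the function name and the example numbers are illustrative.

```python
def worst_case_detection_seconds(
    interval_s: float, failure_threshold: int, timeout_s: float
) -> float:
    """Upper bound on the delay between an outage starting and the alert firing.

    The first failing check starts up to one interval after the outage begins,
    each further required failure adds one interval, and the final check may
    take the full timeout before it is counted as failed.
    """
    return interval_s * failure_threshold + timeout_s


# 1-minute checks, alert after 3 consecutive failures, 10-second timeout
print(worst_case_detection_seconds(60, 3, 10))   # 190
# 5-minute checks with the same threshold and timeout
print(worst_case_detection_seconds(300, 3, 10))  # 910
```

Under these assumptions, moving from 5-minute to 1-minute checks cuts the worst-case alert delay from about 15 minutes to about 3.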

Should health checks be authenticated?

Generally no — load balancers and external monitors need unauthenticated access. If you're concerned about exposing internal state, restrict access by IP or put the health endpoint on an internal port.
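Restricting access by IP can be sketched with Python's ipaddress module. The function name is_allowed and the CIDR ranges in the usage note are assumptions; in practice this kind of rule often lives in a firewall, reverse proxy, or load balancer rather than in application code.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # malformed address: deny by default
    return any(
        address in ipaddress.ip_network(cidr) for cidr in allowed_networks
    )
```

For example, with allowed ranges ["10.0.0.0/8", "192.168.1.0/24"], a request from 10.1.2.3 would be served and one from 203.0.113.9 would be rejected before the health handler runs.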

Health checks run on a cron schedule and return an HTTP status code to indicate service state. They're closely related to heartbeat monitoring, where the service itself reports in on a schedule. Failed checks should trigger alerts after a consecutive failure threshold, similar to how background jobs use retry policies. Use timeouts to detect unresponsive endpoints quickly.