Circuit Breaker Pattern — Protecting Services from Cascading Failures

Quick Summary — TL;DR

A circuit breaker stops your app from repeatedly calling a failing service, preventing cascading failures across your system.
Three states: closed (normal), open (all requests blocked), half-open (testing if the service recovered).
Complements retry logic — retries handle individual failures, circuit breakers handle sustained outages by failing fast.

A circuit breaker is a design pattern that prevents your application from repeatedly calling a failing service. Like an electrical circuit breaker that trips to prevent a fire, a software circuit breaker "trips" after detecting too many failures — temporarily blocking requests to the failing service so it has time to recover.

Why circuit breakers exist

When a downstream service goes down, continuing to send requests creates two problems:

Wasted resources — every request ties up a connection, a thread, and time while waiting for a response that will never come (or will be an error)
Cascading failures — your service slows down waiting for the broken dependency, which slows down services that depend on you, which slows down their dependents — until the entire system is degraded

A circuit breaker stops this cascade by failing fast. Instead of waiting for a timeout, the circuit breaker returns an error immediately, freeing resources for requests that can actually succeed.

The three states

Closed (normal operation)

Requests pass through normally. The circuit breaker monitors failures. If the failure count reaches a threshold within a time window (e.g., 5 failures in 30 seconds), the breaker trips open.
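One way to count failures inside a time window is to keep a queue of failure timestamps and discard the ones that have aged out. The sketch below is illustrative only: the class name FailureWindow and its parameters are made up for this example, and it assumes the breaker should trip as soon as the count reaches the threshold.

```python
from collections import deque


class FailureWindow:
    """Counts failures that happened within the last `window_seconds` (hypothetical helper)."""

    def __init__(self, threshold: int = 5, window_seconds: float = 30.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of recent failures, oldest first

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now`; return True if the breaker should trip."""
        self._failures.append(now)
        # Drop failures that are older than the monitoring window.
        while self._failures and now - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) >= self.threshold
```

Passing the current time in as an argument (instead of reading the clock inside the method) is a choice made here to keep the example easy to test; a real breaker would more likely read a monotonic clock itself.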

Open (failing fast)

All requests are immediately rejected without calling the downstream service. This gives the failing service time to recover. After a configured timeout (e.g., 60 seconds), the breaker moves to half-open.

Half-open (testing recovery)

A limited number of test requests are allowed through. If they succeed, the breaker closes and normal operation resumes. If they fail, the breaker opens again for another timeout period.
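The three states can be sketched as a small state machine. This is a minimal illustration under simplifying assumptions, not a production implementation: it counts consecutive failures rather than failures in a time window, it lets a single test request through in half-open, and it is not thread-safe. The names CircuitBreaker, CircuitOpenError, and the constructor parameters are hypothetical.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half-open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the downstream service."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, open_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self._clock = clock  # injectable so the timeout can be tested without sleeping
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.open_seconds:
                raise CircuitOpenError("circuit is open; failing fast")
            # Open duration has elapsed: let this call through as a test request.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self._failures += 1
        # A failed test request reopens at once; otherwise trip at the threshold.
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()

    def _on_success(self):
        # Simplification: any success resets the count and closes the breaker.
        self._failures = 0
        self.state = State.CLOSED
```

You would wrap each downstream call, for example breaker.call(fetch_profile, user_id) where fetch_profile is your own function, and treat CircuitOpenError as the signal to serve a fallback or return an error right away.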

Circuit breaker vs retry

Retries with backoff handle transient failures — a single request that failed and might work on the next attempt. A circuit breaker handles sustained failures — when the service is down and retrying is pointless.

They work together: retries handle individual failures, and when retries consistently fail, the circuit breaker trips to stop all attempts until recovery.
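A rough sketch of how the two can be layered: retries with exponential backoff run inside the breaker, and only an operation that exhausts its retries counts as one failure toward tripping. Everything named here (TinyBreaker, call_with_retry, the default values) is an assumption made for illustration, and the breaker is deliberately stripped down.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class TinyBreaker:
    """Deliberately minimal: trips after `threshold` consecutive failed operations."""

    def __init__(self, threshold=3, open_seconds=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.open_seconds = open_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the breaker is closed

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        # After the open duration, let calls through again (a crude half-open).
        return self._clock() - self._opened_at >= self.open_seconds

    def record(self, success: bool) -> None:
        if success:
            self._failures, self._opened_at = 0, None
        else:
            self._failures += 1
            if self._failures >= self.threshold:
                self._opened_at = self._clock()


def call_with_retry(breaker, func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry transient failures with exponential backoff, behind the breaker."""
    if not breaker.allow():
        raise CircuitOpenError("circuit is open; skipping retries")
    for attempt in range(attempts):
        try:
            result = func()
        except Exception:
            if attempt == attempts - 1:
                breaker.record(success=False)  # retries exhausted: one failure
                raise
            sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
        else:
            breaker.record(success=True)
            return result
```

Counting one failure per exhausted operation, rather than one per attempt, is a design choice in this sketch; some setups count every failed attempt instead, which makes the breaker trip sooner.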

Configuration parameters

Parameter	What it controls	Typical value
Failure threshold	How many failures before tripping	5 – 10
Monitoring window	Time window for counting failures	30 – 60 seconds
Open duration	How long to stay open before half-open	30 – 120 seconds
Half-open requests	How many test requests in half-open	1 – 3
Success threshold	How many successes to close again	2 – 5
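As an illustration, the parameters in the table could be grouped into a single settings object. The class and field names below are hypothetical, and the defaults are simply picked from the typical ranges above; they are not recommendations for any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerConfig:
    failure_threshold: int = 5         # failures before tripping
    monitoring_window_s: float = 30.0  # time window for counting failures
    open_duration_s: float = 60.0      # how long to stay open before half-open
    half_open_requests: int = 3        # test requests allowed in half-open
    success_threshold: int = 2         # successes needed to close again

    def __post_init__(self):
        # Reject zero or negative settings early instead of misbehaving at runtime.
        for name, value in vars(self).items():
            if value <= 0:
                raise ValueError(f"{name} must be positive, got {value}")
```

A stricter config for a dependency you trust less might look like BreakerConfig(failure_threshold=3, open_duration_s=120.0).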

When to use a circuit breaker

Calling external APIs — third-party services you don't control
Microservice communication — internal services that may deploy independently
Database connections — when a slow or unavailable database would otherwise exhaust your connection pool
Any call that can time out — and where failing fast is better than waiting

FAQ

What is the circuit breaker pattern?

The circuit breaker pattern monitors calls to a service and stops making requests when failures exceed a threshold. It fails fast instead of waiting for timeouts, preventing cascading failures across your system.

How is a circuit breaker different from a retry?

A retry policy handles individual request failures by trying again. A circuit breaker handles systemic failures by stopping all requests. Retries are per-request; circuit breakers are per-service.

Do I need a circuit breaker for background jobs?

If your background jobs call external services, yes. Without a circuit breaker, a down dependency can cause job retries to pile up, consume all your workers (concurrency exhaustion), and block other jobs from processing.

Circuit breakers complement retry policies and exponential backoff — retries handle transient failures, circuit breakers handle sustained outages. They prevent timeout-induced slowdowns, guard against thundering herd scenarios, and act as a form of backpressure when an API is struggling. They also protect against rate limit exhaustion by stopping requests before they're sent.
