Exponential backoff is a retry strategy where the delay between failed attempts doubles each time. Instead of retrying immediately (which hammers an already-struggling server), you wait 1 second, then 2, then 4, then 8 — giving the failing system time to recover.
Imagine an API goes down for 30 seconds. If your system retries every second, it fires 30 requests into a server that's already overloaded. Multiply that by every client doing the same thing and you get a thundering herd — a flood of retry traffic that prevents the server from recovering, even after the original problem is fixed.
Exponential backoff spreads retries over time. Instead of 30 retries in 30 seconds, you get 5 retries over several minutes. The server gets breathing room to recover.
The standard formula is: delay = base × 2^(attempt - 1)
With a 1-second base delay:
| Attempt | Delay | Total elapsed |
|---|---|---|
| 1 | 1 second | 1s |
| 2 | 2 seconds | 3s |
| 3 | 4 seconds | 7s |
| 4 | 8 seconds | 15s |
| 5 | 16 seconds | 31s |
| 6 | 32 seconds | ~1 min |
| 7 | 64 seconds | ~2 min |
| 8 | 128 seconds | ~4 min |
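The table above can be reproduced directly from the formula. A minimal Python sketch (the function name is illustrative):

```python
def backoff_delay(attempt, base=1.0):
    """Exponential backoff delay in seconds for a 1-indexed attempt."""
    return base * 2 ** (attempt - 1)

# Reproduce the table: per-attempt delays and total elapsed time
delays = [backoff_delay(n) for n in range(1, 9)]
print(delays)       # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
print(sum(delays))  # 255.0 -- about four minutes across eight attempts
```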
Delays grow fast. Eight attempts cover about four minutes. Ten attempts reach roughly 17 minutes. This is by design — if a service hasn't recovered in 30 seconds, it probably needs minutes, not more requests.
Pure exponential backoff has a problem: if 100 clients all start retrying at the same time, they'll all retry at the same intervals — 1s, 2s, 4s — creating synchronized traffic spikes. Jitter fixes this by adding random variation to each delay.
The most common approach is full jitter: instead of waiting exactly 4 seconds on attempt 3, wait a random duration between 0 and 4 seconds. This spreads the retries across the entire interval, eliminating synchronized spikes.
The formula becomes: delay = random(0, base × 2^(attempt - 1))
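In code, full jitter is a one-line change: draw the delay uniformly between zero and the exponential cap. A minimal sketch using Python's standard `random` module:

```python
import random

def full_jitter_delay(attempt, base=1.0):
    """Full jitter: uniform random delay between 0 and the exponential cap."""
    cap = base * 2 ** (attempt - 1)
    return random.uniform(0, cap)
```

With 100 clients on attempt 3, retries land anywhere in the 0-4 second window instead of arriving together at the 4-second mark.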
Your app has a background job that sends a Slack notification via webhook. The Slack API returns 503 (service unavailable). Here's what happens with exponential backoff (base 2s, max 5 attempts):

1. Attempt 1 fails with 503 — wait 2 seconds
2. Attempt 2 fails with 503 — wait 4 seconds
3. Attempt 3 succeeds

Total elapsed: 6 seconds. The notification was delivered with minimal delay and zero manual intervention.
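This scenario can be sketched as a generic retry helper combining exponential backoff with full jitter. A minimal, assumption-heavy sketch: `operation` stands in for the webhook POST, and raising an exception stands in for a 503 response.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=5, base=2.0):
    """Call operation(); on failure, sleep with full-jitter backoff and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted; a job queue would dead-letter here
            # Full jitter: random delay between 0 and base * 2^(attempt - 1)
            time.sleep(random.uniform(0, base * 2 ** (attempt - 1)))
```

In practice the job queue usually provides this loop for you; the sketch just shows the shape of the logic.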
Exponential backoff is a retry strategy where the wait time between retries doubles after each failure. It prevents overwhelming a struggling service with retry traffic and gives it time to recover.
Jitter is random variation added to the backoff delay. Without jitter, many clients retrying at the same time create synchronized traffic spikes. With jitter, retries are spread out randomly, reducing contention.
It depends on the job's criticality and failure mode. For HTTP jobs hitting external APIs, 3 to 5 retries is common. For critical business operations (payments, data sync), 5 to 10 retries with longer base delays gives more time for recovery. Always pair with a max delay ceiling.
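The max delay ceiling mentioned above is a simple clamp on the formula. A sketch assuming a 60-second cap (the cap value is a hypothetical default; pick one that fits your workload):

```python
def capped_backoff_delay(attempt, base=1.0, max_delay=60.0):
    """Exponential backoff clamped to a maximum delay ceiling."""
    return min(base * 2 ** (attempt - 1), max_delay)

# Without the cap, attempt 10 would wait 512 seconds; with it, 60.
```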
Retries only work safely if jobs are idempotent. After max retries with backoff, jobs go to the dead letter queue. Job queues implement backoff automatically, and backoff is especially important when failures are caused by timeouts or rate limits.
Recuro handles cron scheduling, retries, alerts, and execution logs — so you can focus on building your product.
No credit card required