Quick Summary — TL;DR
Jitter is randomness added to the delay between retry attempts. When a service goes down and a hundred clients start retrying with exponential backoff, they all compute the same delays — 1 second, 2 seconds, 4 seconds — and retry in synchronized waves. Jitter breaks this synchronization by making each client wait a slightly different amount of time, spreading retries evenly across the delay window.
Exponential backoff solves the problem of retrying too frequently. But it introduces a subtler problem: synchronized retries. If 500 clients all fail at the same moment and all use the same backoff formula, they will all retry at second 1, then second 3, then second 7 (delays of 1, 2, and 4 seconds, measured from the initial failure) — in perfect lockstep.
Each synchronized wave is a traffic spike. The recovering service handles the wave, partially recovers, then gets hit by the next wave. This pattern can keep a service in a degraded state far longer than the original failure warranted. It is a form of the thundering herd problem.
Jitter eliminates this. Instead of all 500 clients retrying at second 4, they retry at random times between second 0 and second 4, spreading the load evenly.
With full jitter, the delay is a random value between 0 and the calculated backoff ceiling. This produces the widest spread and the smoothest retry distribution.
Formula: delay = random(0, base × 2^(attempt - 1))
On attempt 4 with a 1-second base, the backoff ceiling is 8 seconds. The actual delay is a random value between 0 and 8 seconds. Some clients retry almost immediately; others wait the full 8 seconds.
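The full jitter formula can be sketched in a few lines of Python (the function name and the `cap` parameter are illustrative; the article doesn't specify a maximum delay):

```python
import random

def full_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full jitter: pick a random delay between 0 and the backoff ceiling."""
    ceiling = min(cap, base * 2 ** (attempt - 1))  # 1 s base -> 8 s on attempt 4
    return random.uniform(0.0, ceiling)
```

On attempt 4 this returns anything from near 0 up to 8 seconds, so many clients calling it land all over that window rather than on one instant.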
With equal jitter, take half the calculated ceiling as a fixed floor, then add randomness for the other half. This guarantees a minimum wait while still spreading retries.
Formula: delay = (ceiling / 2) + random(0, ceiling / 2)
On attempt 4 with a 1-second base, the delay falls between 4 and 8 seconds. Retries are spread out but never fire immediately after a failure. This is useful when you want to guarantee some breathing room for the recovering service.
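A matching sketch for equal jitter, under the same illustrative `base` and `cap` assumptions:

```python
import random

def equal_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Equal jitter: a fixed floor of half the ceiling, plus a random other half."""
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return ceiling / 2 + random.uniform(0.0, ceiling / 2)
```

The floor term is what guarantees the recovering service its breathing room: no client can retry sooner than half the current ceiling.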
With decorrelated jitter, each delay is based on the previous delay rather than the attempt number. The delay grows unpredictably, making retry patterns harder to synchronize even across clients that started at different times.
Formula: delay = min(max_delay, random(base, previous_delay × 3))
This approach is less common but produces the most varied retry patterns. AWS recommends it in their architecture best practices for highly distributed systems.
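A sketch of decorrelated jitter; because each delay depends on the previous one, the caller threads the last delay through each call (names and defaults are illustrative):

```python
import random

def decorrelated_jitter(previous_delay: float, base: float = 1.0,
                        max_delay: float = 60.0) -> float:
    """Decorrelated jitter: the next delay depends on the previous delay,
    not on the attempt count."""
    return min(max_delay, random.uniform(base, previous_delay * 3))

# Typical use: seed with the base delay and feed each result back in.
delay = 1.0
for _ in range(5):
    delay = decorrelated_jitter(delay)
```

Carrying the previous delay as state is the design difference from the other two strategies, which are pure functions of the attempt number.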
| Strategy | Spread | Minimum delay | Best for |
|---|---|---|---|
| Full jitter | Widest | 0 | Most use cases; maximum spread |
| Equal jitter | Moderate | Half the ceiling | When you need a guaranteed minimum wait |
| Decorrelated jitter | Variable | Base delay | Highly distributed systems with many clients |
For most applications, full jitter is the right default. It is simple to implement, produces the widest spread of retry times, and has been extensively studied. Only switch to equal or decorrelated jitter if you have a specific reason to guarantee minimum delays.
Your app sends webhook notifications to 200 customer endpoints. Your notification API goes down for 10 seconds. All 200 background jobs fail and enter retry.
Without jitter: all 200 jobs retry at second 1, then second 3, then second 7. Each wave is a 200-request spike.
With full jitter: the first retry is spread across 0 to 1 second (200 requests over 1 second). The second retry is spread across 0 to 2 seconds. The third retry is spread across 0 to 4 seconds. The spikes flatten into smooth, manageable traffic.
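A quick simulation makes the flattening concrete. This sketch compares one retry wave with and without jitter, using the scenario's 200 jobs and the 4-second ceiling of the third retry (bucket boundaries are an assumption for illustration):

```python
import random
from collections import Counter

CLIENTS = 200
CEILING = 4.0  # backoff ceiling for the third retry with a 1-second base

# Without jitter: every job computes the same delay, so all 200 land together.
no_jitter = Counter(CEILING for _ in range(CLIENTS))

# With full jitter: each job picks its own delay in [0, 4), bucketed by second.
full = Counter(int(random.uniform(0.0, CEILING)) for _ in range(CLIENTS))

print(no_jitter)             # one 200-request spike
print(sorted(full.items()))  # roughly 50 retries per one-second bucket
```

The jittered counter spreads the same 200 requests across four one-second buckets instead of concentrating them in one instant.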
Use the retry delay calculator to experiment with different jitter strategies and see how delays change across attempts.
Jitter is random variation added to the delay between retry attempts. It prevents many clients from retrying at the exact same time, which would create traffic spikes that overwhelm a recovering service. Jitter is almost always used alongside exponential backoff.
Without randomness, every client computes the same delay and retries at the same instant — creating synchronized spikes. Randomness breaks this synchronization and spreads retries evenly across the delay window. The result is smoother, more predictable load on the recovering service.
Full jitter picks a random delay between 0 and the backoff ceiling, giving the widest possible spread. Equal jitter picks a random delay between half the ceiling and the full ceiling, guaranteeing a minimum wait time. Full jitter produces better load distribution; equal jitter provides a safety floor. For a deeper look, see retrying failed HTTP requests with exponential backoff.
Jitter is essential when using exponential backoff — without it, backoff alone still causes synchronized retry waves. This synchronization is the thundering herd problem, where many clients overwhelm a service simultaneously. Jitter is a core component of any robust retry policy, and its effects compound when combined with circuit breakers that stop retries entirely during sustained outages.
Recuro handles cron scheduling, retries, alerts, and execution logs, so you can focus on building your product.