Recuro.

Jitter

Quick Summary — TL;DR

  • Jitter is random variation added to retry delays so that many clients retrying at the same time do not all hit the server at the same instant.
  • Without jitter, exponential backoff still creates synchronized retry spikes — all clients wait the same duration and retry together.
  • Full jitter (random between 0 and the calculated delay) is the simplest and most effective approach for most systems.

Jitter is randomness added to the delay between retry attempts. When a service goes down and a hundred clients start retrying with exponential backoff, they all compute the same delays — 1 second, 2 seconds, 4 seconds — and retry in synchronized waves. Jitter breaks this synchronization by making each client wait a slightly different amount of time, spreading retries evenly across the delay window.

Why pure backoff is not enough

Exponential backoff solves the problem of retrying too frequently. But it introduces a subtler problem: synchronized retries. If 500 clients all fail at the same moment and all use the same backoff formula, they will all retry at second 1, then second 3, then second 7 — in perfect lockstep.
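To see where "second 1, then second 3, then second 7" comes from, here is a minimal sketch that sums the backoff delays into retry times. The function names are illustrative, and it assumes a 1-second base and that every client fails at the same moment.

```python
from itertools import accumulate


def backoff_delays(base: float, attempts: int) -> list[float]:
    """Delay before each retry: base * 2^(attempt - 1), with no jitter."""
    return [base * 2 ** (attempt - 1) for attempt in range(1, attempts + 1)]


def retry_times(base: float, attempts: int) -> list[float]:
    """Seconds after the initial failure at which each retry fires."""
    return list(accumulate(backoff_delays(base, attempts)))


# Every client that failed at t=0 computes exactly the same schedule.
schedules = {tuple(retry_times(1.0, 3)) for _client in range(500)}
print(schedules)  # {(1.0, 3.0, 7.0)}: one schedule shared by all 500 clients
```

Because nothing in the formula differs between clients, the set collapses to a single schedule.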

Each synchronized wave is a traffic spike. The recovering service handles the wave, partially recovers, then gets hit by the next wave. This pattern can keep a service in a degraded state far longer than the original failure warranted. It is a form of the thundering herd problem.

Jitter breaks up these waves. Instead of all 500 clients waiting exactly 4 seconds before their third retry, each client waits a random time between 0 and 4 seconds, spreading the load across the whole window.

Types of jitter

Full jitter

The delay is a random value between 0 and the calculated backoff ceiling. This produces the widest spread and the smoothest retry distribution.

Formula: delay = random(0, base × 2^(attempt - 1))

On attempt 4 with a 1-second base, the backoff ceiling is 8 seconds. The actual delay is a random value between 0 and 8 seconds. Some clients retry almost immediately; others wait the full 8 seconds.
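A minimal Python sketch of the full jitter formula might look like the following. The function name and the `cap` parameter are assumptions added for illustration; the cap simply stops the ceiling from growing without bound on later attempts.

```python
import random


def full_jitter_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return a delay drawn uniformly from 0 to the backoff ceiling.

    The ceiling is base * 2^(attempt - 1), limited to `cap` (an assumed
    upper bound, not part of the formula above).
    """
    ceiling = min(cap, base * 2 ** (attempt - 1))
    return random.uniform(0, ceiling)


# Attempt 4 with a 1-second base: anywhere from 0 to 8 seconds.
print(full_jitter_delay(attempt=4))
```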

Equal jitter

Take half the calculated delay as a fixed floor, then add randomness for the other half. This guarantees a minimum wait while still spreading retries.

Formula: delay = (ceiling / 2) + random(0, ceiling / 2)

On attempt 4 with a 1-second base, the delay falls between 4 and 8 seconds. Retries are spread out but never fire immediately after a failure. This is useful when you want to guarantee some breathing room for the recovering service.
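The same idea as a short Python sketch, with an illustrative function name. It fixes half the ceiling as the floor and randomizes only the other half.

```python
import random


def equal_jitter_delay(attempt: int, base: float = 1.0) -> float:
    """Return a delay between half the backoff ceiling and the full ceiling."""
    ceiling = base * 2 ** (attempt - 1)
    half = ceiling / 2
    # Fixed floor of half the ceiling, plus a random share of the other half.
    return half + random.uniform(0, half)


# Attempt 4 with a 1-second base: anywhere from 4 to 8 seconds.
print(equal_jitter_delay(attempt=4))
```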

Decorrelated jitter

Each delay is based on the previous delay rather than the attempt number. The delay grows unpredictably, making retry patterns harder to synchronize even across clients that started at different times.

Formula: delay = min(max_delay, random(base, previous_delay × 3))

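This approach is less common. Because each client's next delay depends on its own previous random draw, schedules diverge quickly from one client to the next. It is one of the strategies described in the AWS Architecture Blog post on exponential backoff and jitter.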
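Here is one way the decorrelated formula could be written in Python. The formula does not say what `previous_delay` is before the first retry, so this sketch assumes it starts at `base`; the function name and the default values are also illustrative.

```python
import random


def decorrelated_jitter_delays(
    attempts: int, base: float = 1.0, max_delay: float = 30.0
) -> list[float]:
    """Return one delay per retry, each derived from the delay before it."""
    delays = []
    previous = base  # assumption: seed the sequence with the base delay
    for _ in range(attempts):
        # delay = min(max_delay, random(base, previous_delay * 3))
        previous = min(max_delay, random.uniform(base, previous * 3))
        delays.append(previous)
    return delays


print(decorrelated_jitter_delays(attempts=5))
```

Note that the delay can shrink as well as grow from one attempt to the next, which is why the sequence is bounded by `base` below and `max_delay` above rather than by the attempt number.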

Which jitter strategy to use

Strategy | Spread | Minimum delay | Best for
Full jitter | Widest | 0 | Most use cases; maximum spread
Equal jitter | Moderate | Half the ceiling | When you need a guaranteed minimum wait
Decorrelated jitter | Variable | Base delay | Highly distributed systems with many clients

For most applications, full jitter is the right default. It is simple to implement, produces the widest spread of retry times, and has been extensively studied. Only switch to equal or decorrelated jitter if you have a specific reason to guarantee minimum delays.
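As a rough illustration of full jitter as the default, the sketch below wraps an operation in a retry loop. Everything here is hypothetical: the function name, the `max_attempts` limit, and the injectable `sleep` parameter (included so the loop can be exercised without real waiting). It also retries on any exception, which a real system would likely narrow to transient errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_full_jitter(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying failures with full-jitter backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            ceiling = base * 2 ** (attempt - 1)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Passing a list's `append` method as `sleep` records the chosen delays instead of waiting, which is a convenient way to inspect the behavior.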

Jitter in practice

Your app sends webhook notifications to 200 customer endpoints. Your notification API goes down for 10 seconds. All 200 background jobs fail and enter retry.

Without jitter: all 200 jobs retry at second 1, then second 3, then second 7. Each wave is a 200-request spike.

With full jitter: the first retry is spread across 0 to 1 second (200 requests over 1 second). The second retry is spread across 0 to 2 seconds. The third retry is spread across 0 to 4 seconds. The spikes flatten into smooth, manageable traffic.
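The flattening effect can be checked with a small simulation. This sketch looks only at the third retry (4-second ceiling) for the 200 jobs and counts the busiest 100 ms bucket. The bucket size, the fixed seed, and the helper name are assumptions chosen for illustration.

```python
import random
from collections import Counter


def peak_requests_per_bucket(delays: list[float], bucket_seconds: float = 0.1) -> int:
    """Return the largest number of retries landing in any single time bucket."""
    buckets = Counter(int(delay / bucket_seconds) for delay in delays)
    return max(buckets.values())


rng = random.Random(42)  # seeded so repeated runs give the same result
jobs = 200
ceiling = 4.0  # backoff ceiling for the third retry with a 1-second base

no_jitter = [ceiling] * jobs
full_jitter = [rng.uniform(0, ceiling) for _ in range(jobs)]

print(peak_requests_per_bucket(no_jitter))    # 200: every job lands in one bucket
print(peak_requests_per_bucket(full_jitter))  # a small fraction of that
```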

Use the retry delay calculator to experiment with different jitter strategies and see how delays change across attempts.

FAQ

What is jitter in retry logic?

Jitter is random variation added to the delay between retry attempts. It prevents many clients from retrying at the exact same time, which would create traffic spikes that overwhelm a recovering service. Jitter is almost always used alongside exponential backoff.

Why add randomness to retry delays?

Without randomness, every client computes the same delay and retries at the same instant — creating synchronized spikes. Randomness breaks this synchronization and spreads retries evenly across the delay window. The result is smoother, more predictable load on the recovering service.

What is the difference between full jitter and equal jitter?

Full jitter picks a random delay between 0 and the backoff ceiling, giving the widest possible spread. Equal jitter picks a random delay between half the ceiling and the full ceiling, guaranteeing a minimum wait time. Full jitter produces better load distribution; equal jitter provides a safety floor. For a deeper look, see retrying failed HTTP requests with exponential backoff.

Jitter is essential when using exponential backoff — without it, backoff alone still causes synchronized retry waves. This synchronization is the thundering herd problem, where many clients overwhelm a service simultaneously. Jitter is a core component of any robust retry policy, and its effects compound when combined with circuit breakers that stop retries entirely during sustained outages.
