
Backpressure

Quick Summary — TL;DR

  • Backpressure occurs when a system produces work faster than downstream consumers can process it, causing queues to grow unbounded.
  • Symptoms include rising queue depth, increasing job latency, memory exhaustion, and eventually dropped or timed-out jobs.
  • Handle it by rate limiting producers, scaling consumers, dropping low-priority work, or buffering with bounded queues that reject overflow.

Backpressure is what happens when a system produces work faster than consumers can process it. Think of water flowing through a pipe: if you pour faster than the pipe can drain, pressure builds and eventually something overflows. In software, the "pipe" is your job queue or message buffer, and the overflow is growing latency, out-of-memory crashes, or lost data.

Why backpressure matters

Every system has a processing ceiling. Your workers can handle a fixed number of jobs per second. When incoming work exceeds that rate — even briefly — a backlog forms. If the burst is short, the backlog clears itself once traffic returns to normal. If the burst persists, the backlog grows without bound.
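
As a back-of-the-envelope sketch (all rates here are illustrative assumptions, not measurements):

```python
# Backlog growth when producers outpace consumers.
arrival_rate = 120   # jobs enqueued per second during a burst
service_rate = 100   # jobs your workers can process per second

growth_per_second = arrival_rate - service_rate  # backlog grows 20 jobs/s

burst_seconds = 300  # a five-minute burst
backlog = growth_per_second * burst_seconds
print(backlog)  # 6000 jobs queued when the burst ends

# Once traffic drops back to 90 jobs/s, spare capacity drains the backlog:
drain_rate = service_rate - 90
seconds_to_clear = backlog / drain_rate
print(seconds_to_clear)  # 600 seconds to recover
```

Note the asymmetry: five minutes of modest overload takes ten minutes to clear, because the drain rate is only the small margin between capacity and steady-state load.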

An unchecked backlog is dangerous. Queue depth climbs, memory usage rises, and job latency degrades from seconds to minutes to hours. Eventually the broker runs out of memory and crashes, taking every queued job with it. This is not a theoretical risk — it is the most common failure mode in systems without backpressure handling.

Common causes

  • Traffic bursts that exceed steady-state worker capacity.
  • A slow downstream dependency — a database or external API — that throttles every worker.
  • Too few workers for the incoming rate, or workers that have crashed or stalled.
  • Retry storms that re-enqueue failed jobs faster than they can drain.

Detecting backpressure

Monitor these signals:

Signal                        | What it tells you                                | Healthy range
Queue depth                   | How many jobs are waiting                        | Low and stable
Queue latency                 | Time between enqueue and dequeue                 | Under a few seconds
Worker utilization            | Percentage of time workers are busy              | 60 – 80%
Memory usage (broker)         | How much memory the queue system is using        | Well below limits
Enqueue rate vs dequeue rate  | Whether you are producing faster than consuming  | Dequeue ≥ enqueue

If queue depth is rising steadily over time, you have a backpressure problem — even if nothing is crashing yet.
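
That check can be sketched in a few lines, assuming you can sample queue depth periodically (the function name and sample values below are hypothetical):

```python
from collections import deque

def depth_trend(samples, window=5):
    """Average change in queue depth per sample over the last `window` samples.
    A persistently positive trend means enqueue rate exceeds dequeue rate."""
    recent = list(samples)[-window:]
    if len(recent) < 2:
        return 0.0
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    return sum(deltas) / len(deltas)

# Simulated queue-depth samples, taken once per minute:
samples = deque(maxlen=60)
for depth in [10, 12, 15, 20, 27, 35]:   # steadily climbing
    samples.append(depth)

trend = depth_trend(samples)
if trend > 0:
    print(f"backpressure warning: depth rising {trend:.1f} jobs per sample")
```

Alerting on the trend rather than an absolute depth threshold catches the problem early, before the queue is large enough to matter.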

Strategies for handling backpressure

Rate limit the producers

Slow down the source. If your API endpoint enqueues a job on every request, add rate limiting to cap how many jobs can be created per second. This is the most direct approach — stop the flood at the source.
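
A token bucket is one common way to implement this cap. A minimal sketch, assuming jobs are enqueued from a single process (the rate and capacity are illustrative):

```python
import time

class TokenBucket:
    """Allow up to `rate` enqueues per second, with bursts up to `capacity`.
    A sketch, not production code: no locking, single process only."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=100, capacity=100)
accepted = sum(1 for _ in range(500) if bucket.allow())
print(accepted)  # close to the burst capacity of 100; the rest are rejected
```

Requests that return `False` would get an HTTP 429 (or similar), telling the caller to retry later instead of silently piling work onto the queue.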

Scale the consumers

Add more workers. If your current pool processes 100 jobs per second and you need 200, double the worker count. This works when the bottleneck is worker concurrency rather than a downstream dependency. It does not help if workers are slow because an external API is slow.
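
The required pool size follows from simple arithmetic, targeting the 60 – 80% utilization range from the table above (the per-worker rate here is an assumed figure):

```python
import math

def workers_needed(arrival_rate, per_worker_rate, target_utilization=0.75):
    """How many workers keep utilization at or below the target?
    Assumes workers process jobs independently -- an illustrative model
    that ignores downstream bottlenecks."""
    return math.ceil(arrival_rate / (per_worker_rate * target_utilization))

print(workers_needed(200, 10))  # 27 workers to handle 200 jobs/s at ~75% utilization
```

The headroom matters: sizing for 100% utilization leaves no slack to absorb bursts, which is exactly how backlogs start.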

Use bounded queues

Set a maximum queue size. When the queue is full, new jobs are rejected with an error that tells the producer to slow down or try later. This prevents unbounded memory growth and forces the system to degrade gracefully instead of crashing.
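
Python's standard `queue.Queue` supports this directly through `maxsize` and `put_nowait`; a minimal in-process sketch (a real job queue would return an error to the producer instead):

```python
import queue

jobs = queue.Queue(maxsize=3)  # bounded: holds at most 3 jobs

def try_enqueue(job):
    """Reject immediately instead of blocking when the queue is full."""
    try:
        jobs.put_nowait(job)
        return True
    except queue.Full:
        return False  # signal the producer to back off or retry later

results = [try_enqueue(f"job-{i}") for i in range(5)]
print(results)  # [True, True, True, False, False]
```

The rejection is the point: a fast, explicit "no" at enqueue time is far cheaper than an out-of-memory crash after the backlog has grown for an hour.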

Drop or defer low-priority work

Not all jobs are equal. During backpressure, shed load by dropping or deferring non-critical work. Process payment jobs immediately but delay analytics events. Use priority queues to ensure critical work is processed first.
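
A sketch using Python's `queue.PriorityQueue`, where lower numbers are served first (the job names are illustrative):

```python
import queue

# Lower number = higher priority. Payment jobs jump ahead of analytics.
q = queue.PriorityQueue()
q.put((0, "charge-card"))      # critical
q.put((9, "analytics-event"))  # deferrable
q.put((0, "send-receipt"))     # critical

order = [q.get()[1] for _ in range(q.qsize())]
print(order)  # ['charge-card', 'send-receipt', 'analytics-event']
```

Under sustained pressure you would go further and drop the priority-9 work entirely, or route it to a separate low-priority queue that is allowed to lag.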

Buffer with overflow handling

Write overflow jobs to durable storage (a database table or object store) when the primary queue is under pressure. A separate process drains the overflow buffer back into the main queue once pressure subsides. This preserves every job but accepts higher latency for overflow work.
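
A sketch of the spill-and-drain pattern, using in-memory SQLite in place of a real durable store (names and sizes are illustrative, and the drain step is deliberately simplified):

```python
import queue
import sqlite3

primary = queue.Queue(maxsize=2)  # tiny bound, for illustration
# Durable overflow store; a real system might use a database table or object store.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE overflow (payload TEXT)")

def enqueue(job):
    """Send to the primary queue; spill to durable storage when it is full."""
    try:
        primary.put_nowait(job)
    except queue.Full:
        db.execute("INSERT INTO overflow (payload) VALUES (?)", (job,))

def drain_overflow():
    """Move spilled jobs back into the primary queue once pressure subsides."""
    moved = 0
    for (payload,) in db.execute("SELECT payload FROM overflow"):
        try:
            primary.put_nowait(payload)
            moved += 1
        except queue.Full:
            break
    if moved:
        db.execute("DELETE FROM overflow")  # simplified: assumes all rows moved
    return moved

for i in range(4):
    enqueue(f"job-{i}")
# primary now holds 2 jobs; 2 spilled to the overflow table.
primary.get(); primary.get()   # consumers make room
print(drain_overflow())        # 2 jobs moved back
```

Spilled jobs survive a broker crash because they live in durable storage, which is the trade this pattern makes: no data loss, in exchange for extra latency on overflow work.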

Backpressure vs rate limiting

Rate limiting is one tool for managing backpressure, but they are not the same thing. Rate limiting caps throughput at a fixed boundary (e.g., 100 requests per second). Backpressure is the broader problem of producers outpacing consumers — rate limiting is one of several strategies to address it.

Rate limiting is proactive: you set limits before pressure builds. Backpressure handling is often reactive: you detect growing queues and respond by scaling, shedding load, or throttling.

FAQ

What is backpressure in software?

Backpressure is the condition where a system receives work faster than it can process it. The term comes from fluid dynamics — pressure that builds when flow is restricted. In software, it manifests as growing queues, rising latency, and eventually resource exhaustion or crashes.

How do you handle backpressure?

The four main strategies are: rate limit the producers to slow incoming work, scale consumers to increase processing capacity, use bounded queues that reject overflow, and shed low-priority load during pressure spikes. The right approach depends on whether the bottleneck is your workers, a downstream service, or both.

What is the difference between backpressure and rate limiting?

Rate limiting is a specific mechanism that caps throughput at a predefined boundary. Backpressure is the broader problem of demand exceeding capacity. Rate limiting is one technique for managing backpressure, alongside scaling consumers, shedding load, and using bounded buffers.

Backpressure is a fundamental challenge in any job queue system. Managing concurrency is the consumer-side response — more workers process more jobs — while rate limiting is the producer-side response. When backpressure causes widespread failures, retry storms can trigger a cascade that trips circuit breakers, and jobs that cannot be processed in time may need a dead letter queue as a safety net.
