Rate Limiting

Rate limiting is a mechanism that caps how many requests a client can make to an API within a given time window. If you exceed the limit, the API rejects your request with HTTP 429 Too Many Requests until the window resets.

Why APIs implement rate limits

Rate limits protect APIs from abuse, prevent individual clients from monopolizing shared resources, and ensure fair access for all users. Without them, a single misbehaving script could overwhelm a service and degrade it for everyone.

Rate limits are also a business tool. Free tiers get lower limits. Paid plans get higher ones. The limit signals: "you can use this much before you need to upgrade."

Common rate limit types

Type	Example	Common in
Per second	10 requests/second	Real-time APIs, search endpoints
Per minute	60 requests/minute	General-purpose APIs
Per hour	1,000 requests/hour	Data-heavy endpoints
Per day	10,000 requests/day	Free tier APIs, email services
Concurrent	5 simultaneous requests	Heavy processing endpoints

HTTP 429 Too Many Requests

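When you exceed a rate limit, the API returns a 429 status code. The response often includes a Retry-After header telling you how long to wait before retrying, given either as a number of seconds or as an HTTP date. Many APIs also include rate limit headers on every response. The names below are a widely used convention rather than a standard, so check the API's documentation for the exact names: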

X-RateLimit-Limit — the maximum number of requests allowed
X-RateLimit-Remaining — how many requests you have left
X-RateLimit-Reset — when the window resets (a Unix timestamp or a number of seconds until the reset, depending on the API)

Always check these headers. They let you proactively slow down before hitting the limit, rather than reacting after you've been rejected.
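As a rough illustration of reading these headers, the Python sketch below converts a Retry-After value into a delay and pulls out the remaining budget. The function names are made up for this example, and it assumes the headers arrive as a plain dict using the X-RateLimit-* spelling shown above; real HTTP clients usually expose case-insensitive header objects, and some APIs use different names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into a delay in seconds.

    The value may be an integer number of seconds or an HTTP date.
    Returns None when the header is missing or cannot be parsed.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without a zone as UTC (an assumption of this sketch).
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means there is nothing left to wait for.
    return max(0.0, (when - now).total_seconds())


def remaining_budget(headers):
    """Return (remaining, limit) as ints, or None if the headers are absent or malformed."""
    try:
        return int(headers["X-RateLimit-Remaining"]), int(headers["X-RateLimit-Limit"])
    except (KeyError, ValueError):
        return None
```

A caller might slow down once the remaining count falls below some fraction of the limit, and fall back to its own backoff when retry_after_seconds returns None.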

How to respect rate limits in background jobs

Throttle your queue

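Configure your job queue to process jobs at a rate below the API's limit. If the API allows 60 requests per minute, one job per second runs exactly at the limit, so aim slightly lower, for example one job every 1.1 seconds (roughly 54 per minute). This is the simplest and most reliable approach.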
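Most job queue libraries have their own throttling settings, so treat the following as a sketch of the idea rather than something to copy as-is. It paces calls by enforcing a minimum interval between them. The Throttle class and its safety_factor parameter are hypothetical names for this example; the clock and sleep arguments exist only so the pacing can be tested without real waiting.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls so throughput stays under a limit."""

    def __init__(self, max_per_minute, safety_factor=0.9,
                 clock=time.monotonic, sleep=time.sleep):
        # Aim below the published limit: 60/minute at 0.9 gives about 54/minute.
        self.interval = 60.0 / (max_per_minute * safety_factor)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._clock()
        self._next_allowed = max(now, self._next_allowed) + self.interval
```

A worker loop would call wait() immediately before each API request. This version paces a single process only; several workers sharing one API key would need shared state, which is outside the scope of this sketch.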

Use backoff on 429 responses

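When a job receives a 429, retry it with exponential backoff: double the delay after each failed attempt (for example 1, 2, 4, then 8 seconds) up to some maximum. If the response includes a Retry-After header, use that value as the delay instead of calculating your own.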
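The retry logic can be sketched as follows. The send callable is a hypothetical stand-in for whatever performs the request; it is assumed to return a (status, retry_after_seconds_or_None, body) tuple. The random jitter is an addition of this sketch, not something the text above prescribes; it is commonly used so that many clients do not all retry at the same instant.

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before the next retry; `attempt` is 0 for the first retry."""
    if retry_after is not None:
        # The server told us how long to wait, so trust that over our own guess.
        return float(retry_after)
    ceiling = min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... capped at `cap`
    return rng() * ceiling  # jitter: pick a random delay in [0, ceiling)


def call_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call `send` until it stops returning 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

In a job queue, the same delay calculation would typically feed the queue's own retry scheduling instead of a blocking sleep, so the worker is free to pick up other jobs in the meantime.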

Batch requests where possible

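If the API supports batch endpoints, combine multiple operations into a single request. One batch request for 100 items typically uses one rate limit slot instead of 100, although some APIs count each item in a batch against the limit, so check the documentation.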
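Splitting the work into batches is mostly a matter of chunking. The helper below is a minimal sketch; the batch size of 100 is only an example, and the real maximum depends on the API in question.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` (a list or other sequence), each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With 250 items and a batch size of 100, this produces three batches (100, 100, and 50 items), so the work costs three requests rather than 250, assuming the API counts each batch as one request.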

Spread work over time

Instead of enqueuing 10,000 jobs at once, spread them across a longer window. Schedule jobs with small delays between them. This avoids hitting the rate limit in the first place.
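One way to spread jobs out is to compute a delay for each job at enqueue time and hand it to the queue's delayed-job feature. The function below is a sketch with hypothetical names; how the delay is actually passed along depends on the queue library you use.

```python
def spread_delays(job_count, max_per_minute, safety_factor=0.9):
    """Return one start delay (in seconds) per job, evenly spaced under the limit."""
    # Spacing between jobs; a safety_factor below 1.0 keeps the rate under the limit.
    interval = 60.0 / (max_per_minute * safety_factor)
    return [i * interval for i in range(job_count)]
```

For 10,000 jobs against a limit of 60 requests per minute, the default safety factor spaces jobs about 1.1 seconds apart, so the last job starts roughly three hours after the first.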

Rate limiting your own API

If you expose an API, implement rate limiting to protect against abuse. Common approaches:

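Token bucket — each client gets a bucket that holds up to a fixed number of tokens. Each request consumes a token, and tokens refill at a fixed rate. The bucket size sets the largest allowed burst; the refill rate sets the sustained rate.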
Sliding window — count requests in a rolling time window. More accurate than fixed windows, which can allow bursts at window boundaries.
Fixed window — a simple counter that resets every minute or hour. Easy to implement, but a client can send up to twice the limit in a short span by using up one window just before it resets and the next one just after.
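To make the first of these approaches concrete, here is a minimal in-memory token bucket in Python. It is a sketch under simplifying assumptions: one bucket per object, no thread safety, and no shared storage across servers. The class and parameter names are chosen for this example, and the clock argument exists only to make the behavior testable.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_second = float(refill_per_second)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._updated = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request fits, else False."""
        now = self._clock()
        elapsed = now - self._updated
        self._updated = now
        # Add the tokens earned since the last call, never exceeding capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A server would typically keep one bucket per client, for example in a dict keyed by API key, and respond with 429 whenever allow() returns False.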

FAQ

What is a rate limit?

A rate limit is a cap on how many requests a client can make to an API in a given time period. It protects the API from overload and ensures fair access for all users.

What is HTTP 429?

HTTP 429 is the "Too Many Requests" status code. It means you've exceeded the API's rate limit and need to wait before sending more requests. Check the Retry-After header for how long to wait.

How do I handle rate limiting in my app?

Monitor rate limit headers on every response. When you see remaining requests dropping, slow down. If you receive a 429, back off for the duration specified in Retry-After. For background jobs, throttle your queue to stay below the limit proactively.

Use exponential backoff when you hit a rate limit. Queues help you stay within rate limits by controlling throughput. Rate limits and timeouts are two of the most common causes of background job failures.