
Distributed Cron

Quick Summary — TL;DR

  • Distributed cron runs cron-scheduled tasks across multiple servers while guaranteeing each job fires exactly once per schedule — not once per server.
  • The core problem: if three servers each run the same crontab, every job executes three times. Distributed cron uses leader election or distributed locks to prevent this.
  • Managed scheduling services like Recuro eliminate the problem entirely — the scheduler runs outside your infrastructure, so there's no duplication to solve.

Distributed cron is the practice of running cron-scheduled tasks across a cluster of servers while ensuring each job executes exactly once per schedule interval. On a single machine, the cron daemon handles this naturally — there's one scheduler, one execution. But the moment you scale to multiple servers for redundancy or load distribution, every server's cron daemon independently fires the same jobs at the same time.

The duplication problem

Imagine you have a cron job that sends a daily billing summary email, scheduled via crontab at 0 9 * * *. On a single server, one email goes out at 9 AM. Deploy that same crontab to three app servers behind a load balancer, and your users get three identical emails every morning.

This isn't just annoying — it's dangerous. A job that processes payments, triggers webhooks, or mutates database state will cause real damage when it runs multiple times. The duplication problem is the central challenge distributed cron exists to solve.

Approaches to distributed cron

Leader election

One server in the cluster is elected as the "leader" and is the only node that runs scheduled tasks. If the leader goes down, another node takes over. Tools like etcd, ZooKeeper, and Consul provide leader election primitives. The downside: all cron load concentrates on one server, and failover adds latency.
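The lease-based election that etcd, ZooKeeper, and Consul provide can be sketched in a few lines. The `LeaseStore` below is a hypothetical in-memory stand-in for the coordination service, not any real client library: the first node to claim a free (or expired) lease becomes leader, holders can renew, and everyone else skips the job.

```python
import time


class LeaseStore:
    """Hypothetical in-memory stand-in for a coordination service
    (etcd, ZooKeeper, Consul). Holds a single leadership lease."""

    def __init__(self):
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, node_id: str, ttl: float) -> bool:
        now = time.monotonic()
        # Lease is free or expired: this node becomes leader.
        if self._holder is None or now >= self._expires_at:
            self._holder = node_id
            self._expires_at = now + ttl
            return True
        # The current leader may renew its own lease.
        if self._holder == node_id:
            self._expires_at = now + ttl
            return True
        return False


def run_scheduled_job(store: LeaseStore, node_id: str, job, ttl: float = 30.0) -> bool:
    """Every node calls this on the schedule; only the leader runs the job."""
    if store.try_acquire(node_id, ttl):
        job()
        return True
    return False
```

In a real deployment the lease lives in the coordination service, so all nodes see the same holder; the TTL is what lets another node take over when the leader dies.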

Distributed locks

Every server attempts to acquire a lock (typically in Redis, a database, or a coordination service) before executing each job. The first server to grab the lock runs the job; the others skip it. This spreads load more evenly than leader election and handles per-job granularity. Redis-based locks with TTLs (using Redlock or similar) are the most common implementation.
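A minimal sketch of the per-job lock pattern, using an in-memory `LockStore` as a stand-in for Redis `SET key value NX PX ttl` (the class and function names here are illustrative, not from any library). Each server keys the lock on the job name plus the schedule tick, so exactly one server wins per interval:

```python
import time
import uuid


class LockStore:
    """Hypothetical in-memory stand-in for Redis SET-NX-with-TTL semantics."""

    def __init__(self):
        self._locks = {}  # key -> (token, expires_at)

    def set_nx(self, key: str, token: str, ttl: float) -> bool:
        now = time.monotonic()
        current = self._locks.get(key)
        if current is None or now >= current[1]:
            self._locks[key] = (token, now + ttl)
            return True
        return False


def run_once_per_schedule(store: LockStore, job_name: str, fire_time: str,
                          job, ttl: float = 60.0) -> bool:
    """Called by every server at the scheduled time; only the first
    to grab the per-(job, fire_time) lock actually runs the job."""
    token = uuid.uuid4().hex  # unique per attempt, used for safe release
    key = f"cronlock:{job_name}:{fire_time}"  # one lock per schedule tick
    if store.set_nx(key, token, ttl):
        job()
        return True
    return False
```

Including the fire time in the key is what makes the lock per-interval rather than global: tomorrow's 9 AM run uses a fresh key, so yesterday's lock can't block it.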

Designated scheduler node

Run cron on exactly one server and keep the other servers as workers only. Simple, but creates a single point of failure. If the scheduler node goes down and you don't have health checks in place, jobs silently stop running.

External scheduling service

Move scheduling out of your infrastructure entirely. A managed service like Recuro evaluates cron expressions and triggers your jobs via HTTP at the right time. Your servers only need to handle execution — no locks, no leader election, no duplication logic. This is the approach that scales best and requires the least operational overhead.
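On the receiving side, your server just handles an HTTP request. The sketch below shows one common shape for such an endpoint: verifying an HMAC signature so only the scheduler can trigger the job. The header handling and signing scheme here are assumptions for illustration, not Recuro's documented API.

```python
import hashlib
import hmac

# Hypothetical shared secret agreed with the scheduling service.
SECRET = b"example-signing-secret"


def verify_trigger(body: bytes, signature: str) -> bool:
    """Check that an incoming trigger request was signed by the scheduler."""
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def handle_trigger(body: bytes, signature: str) -> int:
    """Return an HTTP status code for an incoming scheduled-job trigger."""
    if not verify_trigger(body, signature):
        return 401  # reject requests that didn't come from the scheduler
    # ... run the job here ...
    return 200
```

Because the trigger comes from one external source, there is nothing to coordinate between your servers: whichever instance the request lands on does the work.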

Lock contention and edge cases

Distributed locks are not a silver bullet. Watch out for these failure modes:

  • Lock expiry mid-job: if a job runs longer than its lock TTL, the lock frees up and another server can start a duplicate run while the first is still going.
  • Split-brain: a network partition can leave two nodes each believing they hold the lock, and both execute the job.
  • Clock skew: servers whose clocks drift fire at slightly different moments, so a late server may acquire a lock the early server has already released.
  • Lock service outages: if Redis or the coordination service is unreachable, every server must choose between skipping the job or running it unguarded.

Making jobs safe for distributed execution

Even with distributed locks, design your jobs to be idempotent. Locks reduce the probability of duplicate execution to near zero, but edge cases (lock expiry, split-brain) mean it can still happen. If running a job twice causes the same final state as running it once, the occasional duplicate is harmless.
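The standard way to get idempotency is an idempotency key derived from the job and its schedule interval. A minimal sketch, using an in-memory set where production code would use a database unique constraint or a Redis SETNX (the function name is illustrative):

```python
# In production this would be a unique-constraint column or a Redis key,
# not process memory, so all servers share the same deduplication state.
processed = set()


def send_billing_summary(run_date: str) -> bool:
    """Idempotent job: a duplicate invocation for the same date is a no-op."""
    key = f"billing-summary:{run_date}"
    if key in processed:
        return False  # already ran for this schedule interval
    processed.add(key)
    # ... send the email here ...
    return True
```

With this guard in place, a second invocation caused by lock expiry or split-brain simply returns without sending a second email: the final state is the same whether the job ran once or twice.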

FAQ

What is the difference between distributed cron and regular cron?

Regular cron runs on a single machine — one daemon, one execution per schedule. Distributed cron coordinates across multiple machines to ensure a scheduled job runs exactly once across the entire cluster, even though every machine has the same schedule defined.

Do I need distributed cron if I use Kubernetes?

Kubernetes has CronJob resources that handle this natively — the control plane schedules a pod to run the job, so only one instance executes. However, Kubernetes CronJobs have limitations: no built-in retries on HTTP failures, limited observability, and no easy way to manage complex schedules. For production-critical scheduling, a dedicated service gives you more control.

How does Recuro handle the distributed cron problem?

Recuro runs the scheduler externally. It evaluates your cron expressions on its own infrastructure and sends an HTTP request to your endpoint when it's time to execute. Since the trigger comes from a single source, there's no duplication to coordinate — your servers just handle the incoming request.

Distributed cron builds on the same cron daemon and crontab concepts from single-server scheduling, but adds coordination to handle concurrency across nodes. Jobs must be idempotent to handle edge cases, and the ultimate goal is exactly-once processing per schedule interval. For teams that want to skip the coordination complexity, a cron expression generator paired with a managed job scheduling service is the simplest path forward.

Stop managing infrastructure. Start scheduling jobs.

Recuro handles cron scheduling, retries, alerts, and execution logs, so you can focus on building your product.

No credit card required