Automating Failed Payment Retries

The hidden cost of failed payments

Every subscription business deals with failed payments, but most underestimate how much revenue they silently lose. Industry data consistently shows that 10–15% of subscription revenue is at risk due to involuntary churn — customers who intended to keep paying but whose charges failed for technical reasons.

For a SaaS company doing $500K in annual recurring revenue, that represents $50K–$75K walking out the door every year. Unlike voluntary churn (where a customer actively cancels), a large share of involuntary churn is recoverable, provided you have the right retry logic in place.

The challenge is that most teams either ignore the problem (relying on their payment processor's default behavior) or build a brittle retry system that creates more issues than it solves. This guide covers the full spectrum of approaches, from simple to sophisticated.

Why payments fail

Before choosing a retry strategy, it helps to understand why charges are declined. Not all failures are equal, and your retry logic should account for this.

Expired cards: One of the most common causes. The customer's card expired and they haven't updated it yet. These are almost always recoverable: the customer just needs a reminder to update their card details.
Insufficient funds: Temporary cash flow issue. Retrying a few days later usually succeeds, especially after a paycheck cycle.
Bank-side soft declines: The issuing bank temporarily rejects the charge due to velocity limits, suspicious activity, or system issues. These resolve on their own.
Hard declines: The card is reported stolen, the account is closed, or the bank permanently rejects the charge. Do not retry these. Repeated attempts can trigger fraud flags on your merchant account.
Fraud prevention flags: The bank's automated system flagged the charge. Usually resolves after the cardholder confirms the transaction.

Your retry logic should differentiate between soft declines (worth retrying) and hard declines (stop immediately and ask the customer to update their payment method).
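A minimal sketch of that soft/hard split follows. The code strings are written in the style of a processor's decline codes, but which bucket each one belongs in is an assumption to check against your processor's documentation. The names `DeclineCategory` and `classify_decline` are hypothetical, and treating unknown codes as hard declines is just one conservative default.

```python
from enum import Enum
from typing import Optional


class DeclineCategory(Enum):
    SOFT = "soft"  # temporary problem; worth retrying later
    HARD = "hard"  # permanent problem; stop and ask for a new payment method


# Illustrative grouping only (an assumption, not an authoritative mapping).
HARD_DECLINE_CODES = {"stolen_card", "lost_card", "pickup_card", "fraudulent"}
SOFT_DECLINE_CODES = {
    "insufficient_funds",
    "card_velocity_exceeded",
    "processing_error",
    "try_again_later",
    "issuer_not_available",
}


def classify_decline(
    code: Optional[str],
    default: DeclineCategory = DeclineCategory.HARD,
) -> DeclineCategory:
    """Map a processor decline code to a retry category.

    Codes that are missing or not listed fall back to `default`, so the
    system never retries a decline it does not understand unless told to.
    """
    if code in HARD_DECLINE_CODES:
        return DeclineCategory.HARD
    if code in SOFT_DECLINE_CODES:
        return DeclineCategory.SOFT
    return default
```

Keeping the mapping in data rather than in branching logic makes it easy to review and extend as you see new codes in your failure logs.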

Smart retry timing

The biggest mistake in payment retries is retrying too aggressively. If a charge fails at 2:00 AM, retrying at 2:05 AM will almost certainly fail again because nothing has changed. Worse, rapid retries can look like abusive traffic to payment processors and issuing banks, and can increase your decline rate.

A proven escalation schedule looks like this:

Attempt	Delay after previous failure	Rationale
1st retry	24 hours	Gives temporary issues time to resolve
2nd retry	72 hours (3 days)	Allows a paycheck cycle or card replacement
3rd retry	7 days	Final attempt before escalating to the customer

Some teams also vary timing by day of week — retrying on a Tuesday or Wednesday tends to have higher success rates than weekends, likely because banks process transactions differently during business hours.
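The schedule and the weekday adjustment can be sketched as a single function. This assumes each delay is measured from the most recent failed attempt and that timestamps are in UTC. The name `next_retry_at` and the `avoid_weekends` flag are hypothetical, and pushing weekend retries to Monday is only one way to apply the day-of-week idea.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Delay before each retry, measured from the most recent failure.
RETRY_DELAYS = [timedelta(hours=24), timedelta(hours=72), timedelta(days=7)]


def next_retry_at(
    failed_at: datetime,
    retries_so_far: int,
    avoid_weekends: bool = True,
) -> Optional[datetime]:
    """Return when the next retry should run, or None if retries are used up."""
    if retries_so_far >= len(RETRY_DELAYS):
        return None
    due = failed_at + RETRY_DELAYS[retries_so_far]
    if avoid_weekends:
        # weekday(): Monday is 0, Saturday is 5, Sunday is 6
        while due.weekday() >= 5:
            due += timedelta(days=1)
    return due


# A charge that fails on a Friday would be retried on Saturday under the
# plain schedule; with avoid_weekends it moves to the following Monday.
friday = datetime(2024, 1, 5, 2, 0, tzinfo=timezone.utc)
first_retry = next_retry_at(friday, retries_so_far=0)
```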

Approaches to automating retries

1. Payment processor built-in retries

Every major payment processor includes some form of automatic retry logic. Stripe's Smart Retries uses machine learning to pick optimal retry times. Braintree and Adyen offer similar features.

Pros: Zero implementation effort. Stripe Smart Retries claims to recover up to 11% of failed charges automatically.

Cons: You have limited control over timing, retry count, or what happens between attempts. You can't easily coordinate retries with customer emails, grace periods, or plan downgrades. It's a black box.

For early-stage products, this is often good enough. Once you're losing meaningful revenue to involuntary churn, you'll want more control.

2. Dunning management platforms

Services like Churnbuster, Baremetrics Recover, and Gravy specialize in payment recovery. They integrate with your payment processor, manage retry schedules, send branded email sequences, and provide recovery analytics.

Pros: Purpose-built for the problem. Good platforms recover 30–50% of failed charges. They handle edge cases you wouldn't think of.

Cons: Expensive. Most charge a percentage of recovered revenue (typically 5–15%), which adds up quickly. You're also adding another third-party dependency to your billing stack.

3. Custom retry logic in your backend

Build a state machine that tracks each invoice's retry status and triggers retries on a schedule. This gives you full control but requires careful implementation.

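# Python example: state-machine approach
# Invoice is assumed to be a Django-style model with timezone-aware datetimes.
# charge(), cancel_subscription(), and notify_customer_update_card() are
# application helpers not shown here.
from datetime import datetime, timedelta, timezone


class InvoiceRetryStateMachine:
    # Delay before each retry, measured from the most recent failure
    RETRY_DELAYS = [
        timedelta(hours=24),
        timedelta(hours=72),
        timedelta(days=7),
    ]

    def handle_failed_payment(self, invoice):
        if invoice.decline_type == 'hard':
            # Never retry a hard decline: stop and ask for a new card
            self._stop_retrying(invoice)
            self.notify_customer_update_card(invoice)
            return

        attempt = invoice.retry_count
        if attempt >= len(self.RETRY_DELAYS):
            # Every scheduled retry has been used
            self._stop_retrying(invoice)
            self.cancel_subscription(invoice)
            return

        invoice.next_retry_at = (
            datetime.now(timezone.utc) + self.RETRY_DELAYS[attempt]
        )
        invoice.retry_count += 1
        invoice.save()

    def _stop_retrying(self, invoice):
        # Without this, the invoice keeps status='failed' with a past
        # next_retry_at and would be charged again on every cron run
        invoice.next_retry_at = None
        invoice.save()

    def process_due_retries(self):
        """Called by a cron job every hour."""
        invoices = Invoice.objects.filter(
            status='failed',
            next_retry_at__lte=datetime.now(timezone.utc),
        )
        for invoice in invoices:
            result = self.charge(invoice)
            if result.success:
                invoice.status = 'paid'
                invoice.save()
            else:
                # Record the latest decline type so a hard decline stops retries
                invoice.decline_type = result.decline_type
                self.handle_failed_payment(invoice)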

Pros: Full control over every aspect of the retry flow. No per-recovery fees. You own the code and can adapt it to your billing model.

Cons: You're building and maintaining a state machine. Edge cases multiply: what if the customer updates their card mid-retry? What if two retries overlap? What if your cron job goes down for a day? You need to handle all of this.
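One way to guard against overlapping retries, and against retrying an invoice the customer has just paid, is to claim the invoice with a conditional update before charging it. The sketch below uses the standard-library `sqlite3` module and a hypothetical `invoices` table with `id` and `status` columns. The `retrying` status and the function names are assumptions made for illustration, and the same pattern applies to any database that reports affected row counts.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one failed invoice (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO invoices VALUES ('inv_1', 'failed')")
    conn.commit()
    return conn


def claim_invoice(conn: sqlite3.Connection, invoice_id: str) -> bool:
    """Move an invoice from 'failed' to 'retrying' in one statement.

    The WHERE clause makes the update conditional, so only one worker can
    win the claim. An invoice that is already paid, already being retried,
    or missing matches zero rows and returns False.
    """
    with conn:  # commits on success, rolls back on an exception
        cursor = conn.execute(
            "UPDATE invoices SET status = 'retrying' "
            "WHERE id = ? AND status = 'failed'",
            (invoice_id,),
        )
    return cursor.rowcount == 1


conn = make_demo_db()
if claim_invoice(conn, "inv_1"):
    pass  # safe to attempt the charge here, then set 'paid' or 'failed'
```

After the charge attempt, the worker would set the status back to `failed` or on to `paid`, which releases the claim for the next scheduled retry.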

4. Cron job polling for failed charges

A simpler version of option 3: run a cron job every hour that queries for failed invoices due for retry and processes them in batch.

Pros: Simple to implement and reason about. Works well at low to moderate scale.

Cons: Every invoice shares the same polling interval. You can't retry Invoice A at exactly 24 hours and Invoice B at exactly 72 hours — both wait for the next cron tick. At scale, a single cron job processing thousands of retries creates spikes in load and payment API usage.

5. Delayed jobs with a task queue

Use a background job system like Celery (Python), Sidekiq (Ruby), or Laravel Queues (PHP) to schedule each retry as a delayed job at the exact time it should execute.

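# Celery example: scheduling the next retry as a delayed task
# Invoice, charge_invoice(), send_receipt(), cancel_subscription(), and
# send_cancellation_email() are application code not shown here.
from celery import shared_task

# Delay before each retry, in seconds: 24 hours, 72 hours, 7 days
RETRY_DELAYS = [24 * 3600, 72 * 3600, 7 * 86400]

# When the original charge fails, enqueue the first retry with:
#   retry_failed_payment.apply_async(args=[invoice.id], countdown=RETRY_DELAYS[0])


@shared_task(bind=True)
def retry_failed_payment(self, invoice_id, attempt=0):
    invoice = Invoice.objects.get(id=invoice_id)

    if invoice.status == 'paid':
        return  # Customer paid in the meantime

    result = charge_invoice(invoice)
    if result.success:
        invoice.status = 'paid'
        invoice.save()
        send_receipt(invoice)
        return

    # Stop on a hard decline, or once the last scheduled retry has failed
    next_attempt = attempt + 1
    if result.decline_type == 'hard' or next_attempt >= len(RETRY_DELAYS):
        cancel_subscription(invoice)
        send_cancellation_email(invoice)
        return

    # Schedule the next retry with the next delay in the list
    retry_failed_payment.apply_async(
        args=[invoice_id, next_attempt],
        countdown=RETRY_DELAYS[next_attempt],
    )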

Pros: Each invoice gets its own precisely-timed retry. No polling overhead. Your existing queue infrastructure handles the scheduling.

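Cons: Requires running and maintaining queue infrastructure (Redis, RabbitMQ, etc.). Jobs delayed by days sit in broker or worker memory until they are due, so you need durable persistence to survive a restart. With Celery on a Redis broker, a countdown longer than the visibility timeout (one hour by default) can also cause a task to be redelivered and run more than once, so the timeout has to be raised above your longest delay. Monitoring delayed jobs is harder than monitoring a database column.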

6. HTTP job schedulers

An HTTP job scheduler like Recuro takes a different approach: instead of managing retry state in your application, you make an API call to schedule an HTTP request at a future time. The scheduler handles the timing and delivers a webhook to your endpoint when it fires.

# When a payment fails, schedule the first retry
curl -X POST https://api.recurohq.com/api/jobs \
  -H "Authorization: Bearer {token}" \
  -H "Content-Type: application/json" \
  -d '{
    "queue": "payment-retries",
    "url": "https://your-app.com/webhooks/retry-payment",
    "method": "POST",
    "headers": {
      "X-Invoice-Id": "inv_abc123",
      "X-Attempt": "1"
    },
    "delay": 86400
  }'

Your webhook endpoint receives the request 24 hours later, attempts the charge, and — if it fails again — schedules the next retry with a longer delay. Each retry is an independent scheduled job with its own timing.

Pros: No queue infrastructure to manage. Each retry has precise timing. Your app stays stateless — the scheduler owns the "when," your app owns the "what." Works across any language or framework.

Cons: Adds an external dependency. Requires your retry endpoint to be idempotent (which it should be anyway). You need to handle the case where the customer pays between scheduling and execution.
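A minimal sketch of the receiving side, written as a plain function so it stays framework-neutral. The header names match the request above; everything else is an assumption made for illustration: the `Invoice` fields, the in-memory `invoices` dict standing in for a database, and the `charge` and `schedule_retry` callables standing in for your processor and scheduler clients. In production the duplicate check would need to be an atomic database update rather than a field on an in-memory object.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Delay before retries 1, 2, and 3, in seconds.
RETRY_DELAYS_SECONDS = [24 * 3600, 72 * 3600, 7 * 86400]


@dataclass
class Invoice:
    id: str
    status: str = "failed"          # "failed" or "paid"
    last_attempt_handled: int = 0   # highest X-Attempt already processed


def handle_retry_webhook(
    headers: Dict[str, str],
    invoices: Dict[str, Invoice],
    charge: Callable[[Invoice], bool],
    schedule_retry: Callable[[str, int, int], None],
) -> str:
    """Process one scheduled retry and report what happened."""
    invoice = invoices[headers["X-Invoice-Id"]]
    attempt = int(headers["X-Attempt"])  # 1-based, as in the request above

    # Guard 1: the customer paid between scheduling and execution.
    if invoice.status != "failed":
        return "skipped"
    # Guard 2: this attempt was already delivered once (duplicate webhook).
    if attempt <= invoice.last_attempt_handled:
        return "duplicate"
    invoice.last_attempt_handled = attempt

    if charge(invoice):
        invoice.status = "paid"
        return "paid"

    if attempt >= len(RETRY_DELAYS_SECONDS):
        return "exhausted"  # hand off to the grace-period or cancellation flow

    # Attempt n ran after delay n-1, so the next attempt uses delay n.
    schedule_retry(invoice.id, attempt + 1, RETRY_DELAYS_SECONDS[attempt])
    return "rescheduled"
```

Passing `charge` and `schedule_retry` in as arguments keeps the handler easy to test without a network call to either service.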

The full escalation pattern

Regardless of which retry mechanism you choose, the overall dunning flow should look like this:

1. Charge fails → Log the failure reason. If it's a hard decline, skip to step 5.
2. Retry 1 (24h) → Attempt the charge again silently. Most temporary issues resolve within a day.
3. Retry 2 (72h) + email → If the second attempt also fails, email the customer. Tell them their payment failed and link to a card update page. Keep the tone helpful, not threatening.
4. Retry 3 (7d) + grace period → Final charge attempt. If it fails, enter a grace period (typically 7–14 days). The customer keeps access but sees in-app banners urging them to update their payment method.
5. Cancellation → If the grace period expires without payment, downgrade or cancel the subscription. Send a final email with a reactivation link.
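The flow above can be sketched as a small transition table that is independent of whichever retry mechanism fires it. The step and action names are placeholders chosen for illustration, and the table is one reading of the steps rather than a definitive implementation. For the grace period, a failed outcome means the period expired without payment.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Step(Enum):
    INITIAL_CHARGE = "initial_charge"
    RETRY_1 = "retry_1"            # 24h, silent
    RETRY_2 = "retry_2"            # 72h, email on failure
    RETRY_3 = "retry_3"            # 7d, grace period on failure
    GRACE_PERIOD = "grace_period"
    CANCELED = "canceled"
    PAID = "paid"


# What happens when a step fails: (next step, actions to trigger).
ON_FAILURE: Dict[Step, Tuple[Step, List[str]]] = {
    Step.INITIAL_CHARGE: (Step.RETRY_1, ["log_failure_reason"]),
    Step.RETRY_1: (Step.RETRY_2, []),
    Step.RETRY_2: (Step.RETRY_3, ["email_update_card_link"]),
    Step.RETRY_3: (Step.GRACE_PERIOD, ["show_in_app_banner"]),
    Step.GRACE_PERIOD: (Step.CANCELED, ["email_reactivation_link"]),
}


def advance(
    step: Step, succeeded: bool, hard_decline: bool = False
) -> Tuple[Step, List[str]]:
    """Return the next step and the actions to trigger for one outcome."""
    if step not in ON_FAILURE:
        raise ValueError(f"{step.name} is a terminal step")
    if succeeded:
        return Step.PAID, []
    if hard_decline:
        # Hard declines skip the remaining retries entirely.
        return Step.CANCELED, ["log_failure_reason", "email_reactivation_link"]
    next_step, actions = ON_FAILURE[step]
    return next_step, list(actions)
```

Keeping the flow in a table means a change such as adding a fourth retry or sending the email one step earlier is a data edit rather than a logic change.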

Best practices

Always notify the customer. Don't retry silently forever. After the first failed retry, send a clear, friendly email explaining what happened and what they need to do.
Provide a self-service card update page. Make it trivially easy for customers to update their payment method. Include a direct link in every dunning email. Stripe's customer portal or a custom page both work well.
Never retry hard declines. If the payment processor returns a hard decline code (card stolen, account closed, etc.), retrying will not help and may harm your merchant reputation. Go straight to customer notification.
Make retry endpoints idempotent. If a retry fires twice (network glitch, duplicate webhook, etc.), it should not charge the customer twice. Check the invoice status before attempting the charge.
Track and measure recovery rates. Log each retry attempt and its outcome. Calculate your recovery rate by cohort. This tells you whether your retry timing is working and where customers are dropping off.
Respect decline codes. Payment processors return specific decline codes. Map these to categories (retriable vs. non-retriable) and adjust your logic accordingly. Stripe's decline_code field is a good starting point.
Consider the customer's timezone. Retrying a charge at 3:00 AM in the customer's timezone is less likely to succeed (and more likely to trigger fraud alerts) than retrying during business hours.
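For the tracking point above, a recovery rate per cohort can be computed from a simple attempt log. This sketch assumes each log entry is a `(cohort, succeeded)` pair, where the cohort is whatever you group by, such as retry number or decline code. The function name and log shape are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def recovery_rate_by_cohort(
    attempts: Iterable[Tuple[Hashable, bool]],
) -> Dict[Hashable, float]:
    """Return the share of successful retry attempts for each cohort."""
    tried: Counter = Counter()
    recovered: Counter = Counter()
    for cohort, succeeded in attempts:
        tried[cohort] += 1
        if succeeded:
            recovered[cohort] += 1
    return {cohort: recovered[cohort] / tried[cohort] for cohort in tried}


# Example log grouped by retry number: (retry number, did the charge succeed?)
log = [(1, True), (1, False), (1, False), (1, False), (2, True), (2, False), (3, False)]
rates = recovery_rate_by_cohort(log)
# rates == {1: 0.25, 2: 0.5, 3: 0.0}
```

A retry step whose rate stays near zero over time is a candidate for a different delay or for replacing with an earlier customer email.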

Choosing the right approach

The right solution depends on your scale and engineering resources:

Early stage (<$100K ARR): Use your payment processor's built-in retries. Add a manual email follow-up for high-value customers. Don't over-engineer this yet.
Growth stage ($100K–$1M ARR): Implement a lightweight retry system using delayed jobs, an HTTP scheduler, or a cron-based poller. Add automated dunning emails. The ROI is clear at this scale.
Scale ($1M+ ARR): Consider a dedicated dunning platform, or invest in a robust custom system with A/B tested email sequences, ML-optimized retry timing, and detailed recovery analytics.

The worst option is doing nothing. Even a basic retry schedule with customer notification can recover 20–40% of failed charges. At any meaningful revenue level, that pays for itself many times over.