ADR-020: Background Jobs & Scheduled Tasks

Status: Planning
Owner: @bilal @deen
Date: 2026-02-15

Why This Needs an ADR

Multiple accepted ADRs assume scheduled jobs exist but none defines the infrastructure:

| ADR | Job Needed | Frequency |
| --- | --- | --- |
| ADR-011 Regulatory Compliance | Data retention cleanup (purge expired messages, media) | Daily |
| ADR-011 Regulatory Compliance | Document expiry alerts (30-day warning) | Daily |
| ADR-013 Event-Driven Architecture | Retry failed event publishes | Every minute |
| ADR-010 Tenant Engine | AI summary generation for stale conversations | Every 4 hours |
| ADR-010 Tenant Engine | Conversation archival (inactive 24h) | Daily |
| Future | SLA breach detection (if not using Temporal) | Every 15 min |

The CI/CD doc shows Vercel Cron config but this hasn’t been formally decided.

Options

Option A: Vercel Cron Jobs

// vercel.json
{
  "crons": [
    { "path": "/api/cron/cleanup", "schedule": "0 2 * * *" },
    { "path": "/api/cron/summaries", "schedule": "0 */4 * * *" },
    { "path": "/api/cron/retry-events", "schedule": "* * * * *" },
    { "path": "/api/cron/doc-expiry", "schedule": "0 9 * * *" }
  ]
}

| Pros | Cons |
| --- | --- |
| Zero infrastructure: runs on the existing Vercel deployment | 10s execution limit on Hobby, 60s on Pro |
| Cron syntax, reliable scheduling | No retry on failure (must handle in code) |
| Already in the stack | Can't run minute-level schedules on Hobby |
| Serverless: no idle cost | Cold starts add latency |
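
Each cron path above is an ordinary HTTP route, so the handler should check that the caller is really the scheduler. When a `CRON_SECRET` environment variable is set, Vercel sends it as `Authorization: Bearer <secret>` on cron invocations. A minimal sketch of that check follows; the helper name `isAuthorizedCronRequest` is hypothetical and not part of any library.

```typescript
// Hypothetical helper: returns true only when the Authorization header
// carries the expected bearer token.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  // Fail closed: with no secret configured, reject every request.
  if (!cronSecret) return false;
  if (authorizationHeader === null) return false;
  return authorizationHeader === `Bearer ${cronSecret}`;
}
```

In a route handler this would likely be called with `request.headers.get('authorization')` and `process.env.CRON_SECRET`, returning a 401 response when it yields `false`.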

Option B: Supabase pg_cron + Edge Functions

-- pg_cron for database-level jobs
SELECT cron.schedule('cleanup', '0 2 * * *', $$
  DELETE FROM messages WHERE created_at < NOW() - INTERVAL '90 days';
$$);

Edge Functions for application-level jobs (retry events, summaries).

| Pros | Cons |
| --- | --- |
| Database jobs run at DB level (fast, no HTTP overhead) | Two systems (pg_cron + Edge Functions) |
| Edge Functions have 150s timeout | Less visibility than Vercel dashboard |
| No cold start for pg_cron | pg_cron requires Supabase Pro plan |

Hybrid approach (combining both options):

  • pg_cron for pure SQL jobs: retention cleanup, partition management
  • Vercel Cron for application jobs: event retry, summary generation, expiry alerts
  • Both hit the same database

This avoids adding infrastructure while using each tool where it’s strongest.

Key Design Questions

1. Failure Handling

What happens when a cron job fails?

  • Retry automatically? (Vercel doesn’t retry)
  • Log and alert? (Need ADR-021 Observability)
  • Idempotency — can the job safely run twice?
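
Since Vercel does not retry a failed invocation, one option is to retry inside the invocation itself and return a structured outcome that can be logged or alerted on. The sketch below assumes the task is idempotent and that all attempts fit within the function's execution limit. `runWithRetry` and `JobOutcome` are hypothetical names, and the defaults are placeholders.

```typescript
type JobOutcome =
  | { job: string; status: "ok"; attempts: number }
  | { job: string; status: "failed"; attempts: number; error: string };

// Runs `task` up to `maxAttempts` times, waiting longer after each failure.
// `sleep` is injectable so tests do not have to wait on real timers.
async function runWithRetry(
  job: string,
  task: () => Promise<void>,
  maxAttempts = 3,
  baseDelayMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<JobOutcome> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await task();
      return { job, status: "ok", attempts: attempt };
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
      // Exponential backoff: baseDelayMs, then 2x, 4x, ... (no wait after the final attempt).
      if (attempt < maxAttempts) await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  return { job, status: "failed", attempts: maxAttempts, error: lastError };
}
```

A `failed` outcome is the point where logging and alerting would hook in once ADR-021 defines them.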

2. Job Locking

If a job takes longer than the interval:

  • Vercel Cron can trigger overlapping invocations
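  • Need an advisory lock or a “last run” check

// Simple lock pattern: skip the run if another invocation started in the last 30 minutes.
// Not atomic: the check and the write below can race, so an advisory lock is safer in production.
const lock = await prisma.cronLock.findFirst({ where: { job: 'cleanup' } })
if (lock && lock.startedAt > new Date(Date.now() - 30 * 60 * 1000)) {
  return // Still running
}
// Record this run's start time (assumes cronLock.job is a unique column)
const now = new Date()
await prisma.cronLock.upsert({ where: { job: 'cleanup' }, update: { startedAt: now }, create: { job: 'cleanup', startedAt: now } })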
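
The lock pattern above can also be framed as a lease that expires on its own, so a crashed run cannot block the job forever. The sketch below uses an in-memory map as a stand-in for a lock table. In Postgres the same check-and-set would need to be a single conditional statement (or `pg_try_advisory_lock`) so that two invocations cannot both succeed. The `LeaseLock` class is hypothetical.

```typescript
// In-memory stand-in for a lock table keyed by job name (hypothetical).
class LeaseLock {
  private leases = new Map<string, number>(); // job -> lease expiry (epoch ms)

  // Returns true if the caller now holds the lease, false if another run still does.
  tryAcquire(job: string, leaseMs: number, now: number = Date.now()): boolean {
    const expiresAt = this.leases.get(job);
    if (expiresAt !== undefined && expiresAt > now) return false;
    this.leases.set(job, now + leaseMs);
    return true;
  }

  // Releases the lease early when the job finishes before it expires.
  release(job: string): void {
    this.leases.delete(job);
  }
}
```

A handler would call `tryAcquire` first, return immediately on `false`, and call `release` in a `finally` block after the job body.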

3. Monitoring

How do we know jobs are running?

  • Log completion to a cron_runs table?
  • Alert if a job hasn’t run in expected window?
  • Dashboard view of job history?
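
If completions are logged to a `cron_runs` table, the "hasn't run in the expected window" alert reduces to comparing each job's last successful run against its interval. A minimal sketch, assuming the last-success timestamps have already been loaded from that table; `findOverdueJobs`, `JobExpectation`, and the 5-minute grace period are illustrative assumptions.

```typescript
interface JobExpectation {
  job: string;
  intervalMs: number; // how often the job is expected to complete
}

// Returns the names of jobs whose last successful run is older than
// interval + grace, or that have never completed at all.
function findOverdueJobs(
  expectations: JobExpectation[],
  lastSuccessAt: Record<string, number | undefined>, // job -> epoch ms
  now: number,
  graceMs = 5 * 60 * 1000,
): string[] {
  return expectations
    .filter(({ job, intervalMs }) => {
      const last = lastSuccessAt[job];
      return last === undefined || now - last > intervalMs + graceMs;
    })
    .map(({ job }) => job);
}
```

The check itself has to run somewhere other than the jobs it watches, otherwise a scheduler outage silences the alert too.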

4. The Retry Publisher Problem

ADR-013’s event retry needs to run every minute, but the Vercel Hobby plan supports hourly schedules at best. Options:

  • Accept hourly retry (events delayed up to 1 hour on failure)
  • Use pg_cron for minute-level retry
  • Make the async publish more reliable (reduce need for retry)
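
Whichever scheduler ends up triggering the retry, the selection step looks much the same: pick failed events whose backoff has elapsed, skip those that have exhausted their attempts, and cap the batch so one run stays short. The sketch below assumes a failed-event record with `attempts` and `lastAttemptAt` fields. Those field names, `selectEventsToRetry`, and the defaults are hypothetical and not taken from ADR-013.

```typescript
interface FailedEvent {
  id: string;
  attempts: number; // publish attempts made so far (at least 1)
  lastAttemptAt: number; // epoch ms
}

// Picks events whose backoff window has elapsed, oldest first,
// capped at `batchSize` so a single invocation stays short.
function selectEventsToRetry(
  events: FailedEvent[],
  now: number,
  options: { baseDelayMs?: number; maxAttempts?: number; batchSize?: number } = {},
): FailedEvent[] {
  const { baseDelayMs = 60 * 1000, maxAttempts = 5, batchSize = 100 } = options;
  return events
    .filter((e) => e.attempts < maxAttempts)
    // Backoff doubles per attempt: 1 min, 2 min, 4 min, ...
    .filter((e) => now - e.lastAttemptAt >= baseDelayMs * 2 ** (e.attempts - 1))
    .sort((a, b) => a.lastAttemptAt - b.lastAttemptAt)
    .slice(0, batchSize);
}
```

With an hourly trigger the short backoff steps stop mattering, because every event with attempts remaining is due by the next run. The attempt cap and batch size still apply.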

Minimum Viable Approach

For pre-deployment / MVP:

  1. Vercel Cron for daily/hourly jobs
  2. In-app error handling with console.error (no retry infrastructure)
  3. Manual monitoring via Vercel logs

Add pg_cron and proper monitoring as the product scales.