ADR-020: Background Jobs & Scheduled Tasks
Status: Planning Owner: @bilal @deen Date: 2026-02-15
Why This Needs an ADR
Multiple accepted ADRs assume scheduled jobs exist but none defines the infrastructure:
| ADR | Job Needed | Frequency |
|---|---|---|
| ADR-011 Regulatory Compliance | Data retention cleanup (purge expired messages, media) | Daily |
| ADR-011 Regulatory Compliance | Document expiry alerts (30-day warning) | Daily |
| ADR-013 Event-Driven Architecture | Retry failed event publishes | Every minute |
| ADR-010 Tenant Engine | AI summary generation for stale conversations | Every 4 hours |
| ADR-010 Tenant Engine | Conversation archival (inactive 24h) | Daily |
| Future | SLA breach detection (if not using Temporal) | Every 15 min |
The CI/CD doc shows Vercel Cron config but this hasn’t been formally decided.
Options
Option A: Vercel Cron Jobs
// vercel.json
{
  "crons": [
    { "path": "/api/cron/cleanup", "schedule": "0 2 * * *" },
    { "path": "/api/cron/summaries", "schedule": "0 */4 * * *" },
    { "path": "/api/cron/retry-events", "schedule": "* * * * *" },
    { "path": "/api/cron/doc-expiry", "schedule": "0 9 * * *" }
  ]
}

| Pros | Cons |
|---|---|
| Zero infrastructure — runs on existing Vercel deployment | 10s execution limit on Hobby, 60s on Pro |
| Cron syntax, reliable scheduling | No retry on failure (must handle in code) |
| Already in the stack | Can’t run minute-level intervals on Hobby |
| Serverless — no idle cost | Cold starts add latency |
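
Because each Vercel cron path is an ordinary HTTP endpoint, the handler has to reject callers other than the scheduler and report failures itself. The following is a minimal sketch, assuming a `CRON_SECRET` environment variable is configured so that Vercel sends it as an `Authorization: Bearer ...` header; `isAuthorizedCron` and `handleCron` are hypothetical helper names, not part of any existing codebase.

```typescript
// Sketch of the logic behind an endpoint such as /api/cron/cleanup.
// Assumption: CRON_SECRET is set in the project's environment variables.

type JobResult = { ok: boolean; status: number; message: string };

// Pure check, kept separate so it can be unit-tested without an HTTP server.
function isAuthorizedCron(authHeader: string | null, secret: string | undefined): boolean {
  // Refuse every request when no secret is configured rather than failing open.
  if (!secret) return false;
  return authHeader === `Bearer ${secret}`;
}

// Hypothetical wrapper: runs one job and maps the outcome to an HTTP status.
async function handleCron(
  authHeader: string | null,
  secret: string | undefined,
  job: () => Promise<void>,
): Promise<JobResult> {
  if (!isAuthorizedCron(authHeader, secret)) {
    return { ok: false, status: 401, message: "unauthorized" };
  }
  try {
    await job();
    return { ok: true, status: 200, message: "done" };
  } catch (err) {
    // Failed invocations are not retried by the scheduler, so log here.
    console.error("cron job failed", err);
    return { ok: false, status: 500, message: "failed" };
  }
}
```

In a route handler this would be called with `request.headers.get("authorization")` and `process.env.CRON_SECRET`, and the returned status copied onto the response.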
Option B: Supabase pg_cron + Edge Functions
-- pg_cron for database-level jobs
SELECT cron.schedule('cleanup', '0 2 * * *', $$
  DELETE FROM messages WHERE created_at < NOW() - INTERVAL '90 days';
$$);

Edge Functions for application-level jobs (retry events, summaries).
| Pros | Cons |
|---|---|
| Database jobs run at DB level (fast, no HTTP overhead) | Two systems (pg_cron + Edge Functions) |
| Edge Functions have 150s timeout | Less visibility than Vercel dashboard |
| No cold start for pg_cron | pg_cron requires Supabase Pro plan |
Option C: Hybrid (Recommended thinking)
- pg_cron for pure SQL jobs: retention cleanup, partition management
- Vercel Cron for application jobs: event retry, summary generation, expiry alerts
- Both hit the same database
This avoids adding infrastructure while using each tool where it’s strongest.
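
One way to keep the hybrid split visible in code is a single job registry that records which runner owns each job. This is only a sketch: `JobDef`, `JOBS`, and `vercelCrons` are hypothetical names, the schedules are copied from Option A, and the runner assignment follows the bullets above.

```typescript
type Runner = "pg_cron" | "vercel";

interface JobDef {
  name: string;
  schedule: string; // five-field cron expression
  runner: Runner;
}

// Schedules from Option A; runner assignment per Option C.
const JOBS: JobDef[] = [
  { name: "cleanup", schedule: "0 2 * * *", runner: "pg_cron" },
  { name: "summaries", schedule: "0 */4 * * *", runner: "vercel" },
  { name: "retry-events", schedule: "* * * * *", runner: "vercel" },
  { name: "doc-expiry", schedule: "0 9 * * *", runner: "vercel" },
];

// Builds the `crons` array for vercel.json from the Vercel-run jobs only.
function vercelCrons(jobs: JobDef[]): { path: string; schedule: string }[] {
  return jobs
    .filter((j) => j.runner === "vercel")
    .map((j) => ({ path: `/api/cron/${j.name}`, schedule: j.schedule }));
}
```

Generating the `crons` array from the registry would keep vercel.json and the pg_cron schedules from drifting apart, at the cost of a small build step.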
Key Design Questions
1. Failure Handling
What happens when a cron job fails?
- Retry automatically? (Vercel doesn’t retry)
- Log and alert? (Need ADR-021 Observability)
- Idempotency — can the job safely run twice?
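
The idempotency question can be made concrete with the document expiry alert: if each alert has a stable key that is recorded once sent, a second run finds nothing left to do. The sketch below assumes a hypothetical `Doc` shape and uses an in-memory `Set` as a stand-in for a table with a unique constraint on the alert key.

```typescript
// Hypothetical shape of a document row; field names are assumptions.
interface Doc {
  id: string;
  expiresAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the alert keys that still need sending. `alreadySent` stands in
// for a persistent table with a unique constraint on the key.
function pendingExpiryAlerts(
  docs: Doc[],
  alreadySent: Set<string>,
  now: Date,
  windowDays = 30,
): string[] {
  const cutoff = now.getTime() + windowDays * DAY_MS;
  return docs
    .filter((d) => d.expiresAt.getTime() > now.getTime() && d.expiresAt.getTime() <= cutoff)
    .map((d) => `doc-expiry:${d.id}`)
    .filter((key) => !alreadySent.has(key));
}

// Sends each pending alert once and returns how many were sent.
function runExpiryAlerts(
  docs: Doc[],
  alreadySent: Set<string>,
  now: Date,
  send: (key: string) => void,
): number {
  const keys = pendingExpiryAlerts(docs, alreadySent, now);
  for (const key of keys) {
    send(key);
    // Recorded after the send: a crash in between can cause one duplicate
    // (at-least-once delivery), but never a silently skipped alert.
    alreadySent.add(key);
  }
  return keys.length;
}
```

Running `runExpiryAlerts` twice with the same inputs sends on the first call and does nothing on the second, which is the property a retried or overlapping cron invocation relies on.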
2. Job Locking
If a job takes longer than the interval:
- Vercel Cron can trigger overlapping invocations
- Need advisory lock or “last run” check
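
Whichever option is chosen, the check and the claim need to happen in one step, otherwise two overlapping invocations can both decide they are free to run. Below is a minimal sketch of a time-boxed lease. The in-memory `LeaseStore` is a hypothetical stand-in used only to show the semantics; the SQL in the comment assumes a `cron_lock` table with a unique `job` column and a `started_at` timestamp, neither of which is defined by this ADR.

```typescript
// Assumed single-statement equivalent in Postgres (atomic per row):
//   INSERT INTO cron_lock (job, started_at) VALUES ($1, now())
//   ON CONFLICT (job) DO UPDATE SET started_at = now()
//   WHERE cron_lock.started_at < now() - interval '30 minutes'
//   RETURNING job;
// A returned row means the lease was acquired; no row means another run holds it.

const LEASE_MS = 30 * 60 * 1000;

class LeaseStore {
  // job name -> time the current lease started (epoch milliseconds)
  private leases = new Map<string, number>();

  // Claims the lease if none exists or the previous one has expired.
  tryAcquire(job: string, nowMs: number, leaseMs: number = LEASE_MS): boolean {
    const startedAt = this.leases.get(job);
    if (startedAt !== undefined && nowMs - startedAt <= leaseMs) {
      return false; // still held by a run that started within the lease window
    }
    this.leases.set(job, nowMs);
    return true;
  }

  // Called when a job finishes so the next invocation does not wait for expiry.
  release(job: string): void {
    this.leases.delete(job);
  }
}
```

The expiry window doubles as crash recovery: a run that dies without releasing blocks the job for at most one lease period.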
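// Simple lock pattern: skip this run if another one started in the last 30 minutes.
// Caveat: reading and then returning is not atomic, so two overlapping invocations can both pass.
const LOCK_WINDOW_MS = 30 * 60 * 1000
const lock = await prisma.cronLock.findFirst({ where: { job: 'cleanup' } })
if (lock && lock.startedAt > new Date(Date.now() - LOCK_WINDOW_MS)) {
  return // Still running (or crashed within the window)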
}

3. Monitoring
How do we know jobs are running?
- Log completion to a cron_runs table?
- Alert if a job hasn’t run in expected window?
- Dashboard view of job history?
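
If completions are logged, the "hasn't run in expected window" alert reduces to comparing each job's last successful run against its expected interval. A sketch under stated assumptions: the `RunRecord` shape, the job names, and the grace factor of 2 are all hypothetical, and the intervals are taken from the frequency table at the top of this ADR.

```typescript
// Hypothetical row shape for a run log; not a defined schema.
interface RunRecord {
  job: string;
  finishedAt: Date;
  ok: boolean;
}

// Expected interval per job in minutes (assumed job names).
const EXPECTED_MINUTES: Record<string, number> = {
  cleanup: 24 * 60,
  summaries: 4 * 60,
  "retry-events": 1,
  "doc-expiry": 24 * 60,
};

// Returns jobs whose last successful run is older than interval * graceFactor,
// or that have no successful run at all.
function overdueJobs(runs: RunRecord[], now: Date, graceFactor = 2): string[] {
  const lastOk = new Map<string, number>();
  for (const r of runs) {
    if (!r.ok) continue; // failed runs do not count as "ran"
    const t = r.finishedAt.getTime();
    if (t > (lastOk.get(r.job) ?? -Infinity)) lastOk.set(r.job, t);
  }
  return Object.entries(EXPECTED_MINUTES)
    .filter(([job, minutes]) => {
      const last = lastOk.get(job);
      return last === undefined || now.getTime() - last > minutes * 60000 * graceFactor;
    })
    .map(([job]) => job);
}
```

The same function could back both an alert and a dashboard view, since it only depends on the run log.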
4. The Retry Publisher Problem
ADR-013’s event retry needs to run every minute, but the Vercel Hobby plan only supports an hourly minimum interval. Options:
- Accept hourly retry (events delayed up to 1 hour on failure)
- Use pg_cron for minute-level retry
- Make the async publish more reliable (reduce need for retry)
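
The third option can be sketched as a short in-process retry at publish time, so that only events that fail repeatedly are left for the scheduled retry. This is an illustration, not ADR-013's publisher: `publishWithRetry`, `backoffDelay`, and the default attempt count and delays are assumptions, chosen to stay well inside a serverless execution limit.

```typescript
// Exponential backoff: base, 2*base, 4*base, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Tries to publish up to `attempts` times. Returns false when every attempt
// failed, so the caller can record the event for the cron-based retry.
async function publishWithRetry<T>(
  publish: (event: T) => Promise<void>,
  event: T,
  opts: { attempts: number; baseMs: number; maxMs: number } = { attempts: 3, baseMs: 200, maxMs: 2000 },
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      await publish(event);
      return true;
    } catch {
      // No delay after the final attempt; fall through to the failure path.
      if (attempt < opts.attempts - 1) {
        await sleep(backoffDelay(attempt, opts.baseMs, opts.maxMs));
      }
    }
  }
  return false;
}
```

With the defaults above the worst case adds 600 ms of waiting (200 ms plus 400 ms) before giving up, which may make an hourly fallback retry more acceptable for the events that remain.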
Minimum Viable Approach
For pre-deployment / MVP:
- Vercel Cron for daily/hourly jobs
- In-app error handling with console.error (no retry infrastructure)
- Manual monitoring via Vercel logs
Add pg_cron and proper monitoring as the product scales.