E-005: Document AI & RAG Context
Status: In Progress | Owner: @bilal | Priority: P1 (Ship to Production)
Objective
Enable the AI to reference property-specific documents (house rules, boiler manuals, gas safety certs) when responding to tenants. Complete the document AI pipeline from upload to RAG retrieval.
What’s Done
- `visibleToAI` toggle exposed in GraphQL + document modal UI
- AI indicator badge on document cards
- Seed script with house rules + boiler manual (2 docs, 3 chunks, stub embeddings)
- `pnpm db:seed:documents` command
- Property context included in system prompt (address, type, units, room)
What’s Next
| Task | ID | Description | Status |
|---|---|---|---|
| Text extraction pipeline | DOC-AI-003 | On upload with visible_to_ai = true, extract text (PDF/image OCR), chunk, embed into document_chunks | Next |
| Test with real docs | CTX-002 | Upload real property documents, verify RAG retrieval quality | Blocked by DOC-AI-003 |
| Document suggestions | CTX-003 | AI recommends what landlord should upload based on property type | Planned |
| AI configurability | CTX-004 | Per-property/org AI behaviour settings (tone, topics, DIY policy) | Planned |
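
The DOC-AI-003 flow (extract, chunk, embed, store) could be wired together roughly as below. This is a minimal sketch, not the planned implementation: `UploadedDocument`, `ChunkRow`, `PipelineDeps` and `indexDocument` are hypothetical names, and each step is injected so the orchestration can be exercised with stubs before the real extractor and embedder exist.

```typescript
// Sketch of the DOC-AI-003 flow. All type and function names are hypothetical.
interface UploadedDocument {
  id: string;
  visibleToAI: boolean;
  mimeType: string;
  bytes: Uint8Array;
}

interface ChunkRow {
  documentId: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
}

// Injected steps, so the orchestration can be tested without real services.
interface PipelineDeps {
  extractText(doc: UploadedDocument): Promise<string>; // PDF text or image OCR
  chunk(text: string): string[];
  embed(chunks: string[]): Promise<number[][]>;
  saveChunks(rows: ChunkRow[]): Promise<void>; // writes to document_chunks
}

// Returns the number of chunks stored for the document.
async function indexDocument(
  doc: UploadedDocument,
  deps: PipelineDeps,
): Promise<number> {
  // Documents hidden from the AI are never extracted or embedded.
  if (!doc.visibleToAI) return 0;

  const text = (await deps.extractText(doc)).trim();
  if (text.length === 0) return 0; // e.g. OCR found no text

  const chunks = deps.chunk(text);
  const embeddings = await deps.embed(chunks);
  if (embeddings.length !== chunks.length) {
    throw new Error("embedding count does not match chunk count");
  }

  const rows: ChunkRow[] = chunks.map((content, i) => ({
    documentId: doc.id,
    chunkIndex: i,
    content,
    embedding: embeddings[i],
  }));
  await deps.saveChunks(rows);
  return rows.length;
}
```

Keeping the `visibleToAI` check inside the function means a re-index job and the upload hook would share the same guard.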
Key Decisions
- Extraction: PDF text extraction + image OCR (for scanned documents)
- Chunking: 300-500 tokens, semantic paragraph-level, sentence-boundary overlap (see ADR-018 RAG Context Management)
- Embedding: OpenAI `text-embedding-3-small` (1536 dimensions)
- Storage: `document_chunks` table with `pgvector` index
- What the AI knows (for triage) vs what it tells the tenant: configurable per org
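
To make the chunking decision concrete, here is a rough sketch of paragraph-level packing with a one-sentence overlap. It rests on several assumptions: `chunkText`, `estimateTokens` and `lastSentence` are hypothetical helpers, tokens are approximated as about 4 characters each rather than counted with a real tokenizer, only the upper bound (500) is enforced, and a single paragraph larger than the budget is left as one oversized chunk. ADR-018 remains the source of truth for the actual rules.

```typescript
// Rough token estimate: about 4 characters per token (an assumption; a real
// tokenizer would give exact counts).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Last punctuated sentence of a chunk, reused as overlap in the next chunk.
function lastSentence(text: string): string {
  const sentences = text.match(/[^.!?]+[.!?]+/g);
  return sentences ? sentences[sentences.length - 1].trim() : "";
}

// Greedily packs whole paragraphs into chunks of at most maxTokens.
function chunkText(text: string, maxTokens = 500): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";

  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (current && estimateTokens(candidate) > maxTokens) {
      chunks.push(current);
      // Sentence-boundary overlap: repeat the previous chunk's last sentence.
      const overlap = lastSentence(current);
      current = overlap ? `${overlap}\n\n${paragraph}` : paragraph;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because the overlap is added after the size check, a chunk can exceed `maxTokens` by roughly one sentence; a stricter version would budget for the overlap up front.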
Dependencies
- Supabase Storage (done — documents uploadable)
- pgvector extension (done — enabled locally)
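
For orientation, the `document_chunks` storage and a nearest-neighbour lookup on top of pgvector might look like the sketch below. Only the table name, the pgvector index and the 1536 dimensions come from this epic; the column names, the HNSW index choice, the cosine-distance metric and the `buildSimilarityQuery` helper are assumptions for illustration. The `{ text, values }` return shape is chosen to resemble a parameterised query object and is not tied to any particular client here.

```typescript
// Dimensions of text-embedding-3-small, per the Key Decisions above.
const EMBEDDING_DIMENSIONS = 1536;

// Hypothetical schema: column names and the HNSW index are assumptions.
const createDocumentChunksSql = `
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS document_chunks (
  id          bigserial PRIMARY KEY,
  document_id uuid NOT NULL,
  chunk_index integer NOT NULL,
  content     text NOT NULL,
  embedding   vector(${EMBEDDING_DIMENSIONS}) NOT NULL
);

CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
  ON document_chunks USING hnsw (embedding vector_cosine_ops);
`;

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`expected ${EMBEDDING_DIMENSIONS} dimensions`);
  }
  return `[${embedding.join(",")}]`;
}

// Builds a parameterised query for the closest chunks among the given
// documents. <=> is pgvector's cosine distance operator.
function buildSimilarityQuery(
  documentIds: string[],
  queryEmbedding: number[],
  limit = 5,
): { text: string; values: unknown[] } {
  return {
    text: `SELECT document_id, chunk_index, content,
       embedding <=> $1::vector AS distance
FROM document_chunks
WHERE document_id = ANY($2::uuid[])
ORDER BY embedding <=> $1::vector
LIMIT $3`,
    values: [toVectorLiteral(queryEmbedding), documentIds, limit],
  };
}
```

Scoping the query to a property's document IDs keeps one property's house rules from leaking into another property's answers.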
Related
- ADR-017 RAG Pipeline v2
- ADR-018 RAG Context Management
- E-010 Retrieval Improvements (hybrid search, depends on this epic)