E-005: Document AI & RAG Context

Status: In Progress
Owner: @bilal
Priority: P1 (Ship to Production)

Objective

Enable the AI to reference property-specific documents (house rules, boiler manuals, gas safety certs) when responding to tenants. Complete the document AI pipeline from upload to RAG retrieval.

What’s Done

  • visibleToAI toggle exposed in GraphQL + document modal UI
  • AI indicator badge on document cards
  • Seed script with house rules + boiler manual (2 docs, 3 chunks, stub embeddings)
  • pnpm db:seed:documents command
  • Property context included in system prompt (address, type, units, room)
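To make the property-context bullet concrete, here is a minimal sketch of how those fields could be rendered into a plain-text section of the system prompt. The `PropertyContext` interface, its field names and `buildPropertyContextSection` are hypothetical; the real prompt builder may structure this differently.

```typescript
// Hypothetical shape of the property context; field names are assumptions.
interface PropertyContext {
  address: string;
  propertyType: string; // e.g. "HMO", "flat"
  unitCount: number;
  room?: string; // the tenant's room, when known
}

// Renders the property context as a plain-text block for the system prompt.
function buildPropertyContextSection(ctx: PropertyContext): string {
  const lines = [
    "Property context:",
    `- Address: ${ctx.address}`,
    `- Type: ${ctx.propertyType}`,
    `- Units: ${ctx.unitCount}`,
  ];
  // Only mention the room when the tenant's room is known.
  if (ctx.room !== undefined) {
    lines.push(`- Tenant's room: ${ctx.room}`);
  }
  return lines.join("\n");
}

// Usage with illustrative values.
const section = buildPropertyContextSection({
  address: "12 Example Street",
  propertyType: "HMO",
  unitCount: 5,
  room: "Room 3",
});
console.log(section);
```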

What’s Next

Task | ID | Description | Status
Text extraction pipeline | DOC-AI-003 | On upload of a document with visible_to_ai = true, extract its text (PDF text extraction or image OCR), chunk it, and embed the chunks into document_chunks | Next
Test with real docs | CTX-002 | Upload real property documents and verify RAG retrieval quality | Blocked by DOC-AI-003
Document suggestions | CTX-003 | AI recommends which documents the landlord should upload, based on property type | Planned
AI configurability | CTX-004 | Per-property/per-org AI behaviour settings (tone, topics, DIY policy) | Planned
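DOC-AI-003 could be orchestrated roughly as below. This is a sketch under stated assumptions: `DocumentRecord`, `ChunkRow`, `PipelineDeps` and `ingestDocument` are hypothetical names, and the PDF parser, OCR service, chunker, embeddings call and database insert are injected so the control flow (skip unless visible to AI, fall back to OCR when a PDF has no text layer, chunk, embed, store) can be read and tested without committing to specific libraries.

```typescript
// Minimal document record; field names are assumptions, not the real schema.
interface DocumentRecord {
  id: string;
  mimeType: string;
  visibleToAI: boolean;
  bytes: Uint8Array;
}

// One row destined for the document_chunks table (column names assumed).
interface ChunkRow {
  documentId: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
}

// Injected steps; real implementations would wrap a PDF parser, an OCR
// service, the chunker and the embeddings API. All hypothetical here.
interface PipelineDeps {
  extractPdfText(bytes: Uint8Array): Promise<string>;
  ocr(bytes: Uint8Array): Promise<string>; // assumed to accept images and scanned PDFs
  chunk(text: string): string[];
  embed(texts: string[]): Promise<number[][]>;
  insertChunks(rows: ChunkRow[]): Promise<void>;
}

// Returns the number of chunks written (0 when the document is skipped).
async function ingestDocument(doc: DocumentRecord, deps: PipelineDeps): Promise<number> {
  if (!doc.visibleToAI) return 0;

  const isPdf = doc.mimeType === "application/pdf";
  const isImage = doc.mimeType.startsWith("image/");
  if (!isPdf && !isImage) return 0; // other formats are out of scope for this sketch

  // Try the PDF text layer first; scanned PDFs and images fall back to OCR.
  let text = isPdf ? await deps.extractPdfText(doc.bytes) : "";
  if (text.trim() === "") text = await deps.ocr(doc.bytes);
  if (text.trim() === "") return 0;

  const chunks = deps.chunk(text);
  const embeddings = await deps.embed(chunks);
  if (embeddings.length !== chunks.length) {
    throw new Error("embedding count does not match chunk count");
  }

  const rows: ChunkRow[] = chunks.map((content, i) => ({
    documentId: doc.id,
    chunkIndex: i,
    content,
    embedding: embeddings[i],
  }));
  await deps.insertChunks(rows);
  return rows.length;
}
```

Injecting the steps also keeps the pipeline testable with stub embeddings, in the same spirit as the existing seed script.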

Key Decisions

  • Extraction: PDF text extraction + image OCR (for scanned documents)
  • Chunking: 300-500 tokens, semantic paragraph-level, sentence-boundary overlap (see ADR-018 RAG Context Management)
  • Embedding: OpenAI text-embedding-3-small (1536 dimensions)
  • Storage: document_chunks table with pgvector index
  • Tenant disclosure: separate what the AI knows (for triage) from what it tells the tenant; configurable per org
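The chunking decision (300-500 tokens, paragraph-level, sentence-boundary overlap) might look like the following sketch. Assumptions, not taken from ADR-018: tokens are estimated at roughly four characters each instead of being counted with a real tokenizer such as tiktoken, sentences are split on `.`, `!` or `?` followed by whitespace, and one sentence of overlap is carried between chunks. `chunkText` and `ChunkOptions` are hypothetical names; ADR-018 remains the authority on the actual rules.

```typescript
// Rough heuristic (~4 characters per token for English text); swap in a
// real tokenizer for accurate counts.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

interface ChunkOptions {
  minTokens: number; // close a chunk at a paragraph end once it reaches this size
  maxTokens: number; // never exceed this, unless a single sentence is longer
  overlapSentences: number; // sentences carried over into the next chunk
}

function chunkText(
  text: string,
  opts: ChunkOptions = { minTokens: 300, maxTokens: 500, overlapSentences: 1 },
): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current: string[] = []; // sentences in the chunk being built
  let fresh = 0; // sentences in `current` that are not overlap

  const tokensOf = (sentences: string[]) => countTokens(sentences.join(" "));

  const emit = () => {
    if (fresh === 0) return; // only overlap left: nothing new to store
    chunks.push(current.join(" "));
    // Guard against slice(-0), which would return the whole array.
    current = opts.overlapSentences > 0 ? current.slice(-opts.overlapSentences) : [];
    fresh = 0;
  };

  for (const paragraph of paragraphs) {
    for (const sentence of paragraph.split(/(?<=[.!?])\s+/)) {
      if (current.length > 0 && tokensOf([...current, sentence]) > opts.maxTokens) {
        emit();
        // Drop the overlap if it alone would push this sentence over the limit.
        if (tokensOf([...current, sentence]) > opts.maxTokens) current = [];
      }
      current.push(sentence);
      fresh++;
    }
    // Semantic boundary: prefer to end a chunk where a paragraph ends.
    if (tokensOf(current) >= opts.minTokens) emit();
  }
  emit();
  return chunks;
}
```

A sentence longer than `maxTokens` is kept whole as its own chunk rather than being split mid-sentence, which matches the sentence-boundary rule at the cost of occasionally exceeding the upper bound.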

Dependencies

  • Supabase Storage (done: documents can be uploaded)
  • pgvector extension (done: enabled locally)
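Once chunks land in document_chunks, retrieval is a nearest-neighbour query. With pgvector that is typically `ORDER BY embedding <=> $1 LIMIT k`, where `<=>` is cosine distance (1 minus cosine similarity), and an HNSW or IVFFlat index only serves that operator when built with the `vector_cosine_ops` operator class. The in-memory sketch below, using hypothetical `StoredChunk` and `topK` names, computes the same ranking exactly; since HNSW and IVFFlat are approximate, an exact baseline like this could be useful when checking retrieval quality for CTX-002.

```typescript
// Hypothetical in-memory stand-in for a document_chunks row.
interface StoredChunk {
  id: string;
  content: string;
  embedding: number[];
}

// Cosine similarity; pgvector's <=> returns 1 minus this value.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k by cosine similarity: the same ordering as
// ORDER BY embedding <=> query LIMIT k, computed without an index.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

With the seed data's stub embeddings, rankings from either version would not reflect real semantic similarity, so quality checks need documents embedded with text-embedding-3-small.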