RAG Pipeline

Status: Implemented · Owner: @bilal · Last Updated: 2026-02-15

Design of the AI-powered tenant support system (RAG, Chat & Voice).

Purpose

Enable tenants to ask natural language questions about their property through chat and voice channels. The system retrieves relevant documents via semantic search and generates contextual answers.

Architecture

Tenant Channels (WhatsApp, Voice, Chat)
         |
   Security Layer (rate limiting, webhook verification, sanitisation)
         |
   Tenant Identification (phone/email -> tenant -> property -> org)
         |
   RAG Pipeline
   ├── Emergency Check (UK keyword matching -> hardcoded response)
   ├── Intent Classification (greeting / question / issue)
   ├── Retrieval (pgvector semantic search, org/property filtered)
   └── Generation (Claude -> OpenAI -> Kimi -> GLM fallback)
         |
   Data Layer (PostgreSQL + pgvector, 1536-dim HNSW index)
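The retrieval stage against the data layer can be sketched as a single filtered pgvector query. This is an illustrative sketch only: the `document_chunks` table and snake_case column names are assumptions (the doc's `visibleToAI` field would presumably map to a `visible_to_ai` column), and `<=>` is pgvector's cosine-distance operator served by the HNSW index.

```typescript
// Hypothetical retrieval query: semantic search scoped to the tenant's org,
// returning property-specific plus org-wide chunks. $1 is the 1536-dim query
// embedding, $2 the org_id, $3 the property_id.
const RETRIEVAL_SQL = `
  SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
  FROM document_chunks
  WHERE org_id = $2
    AND (property_id = $3 OR property_id IS NULL)  -- property + org-wide docs
    AND visible_to_ai = true
  ORDER BY embedding <=> $1::vector                -- cosine distance, HNSW index
  LIMIT 5`;
```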

Document Ingestion

Upload -> Extract Text -> Chunk (500 words, 50 overlap) -> Embed (OpenAI) -> Store (pgvector)
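The chunking step can be sketched as a sliding word window. The 500-word size and 50-word overlap come from the pipeline above; the function name and everything else here are assumptions.

```typescript
// Hypothetical sketch of the chunking step: fixed-size word windows with
// overlap, so context at a chunk boundary appears in both neighbours.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // each chunk starts 450 words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final chunk reached the end
  }
  return chunks;
}
```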

What Gets Embedded

| Source | Examples | Priority |
| --- | --- | --- |
| Property documents | House rules, appliance manuals | High |
| Lease documents | Tenancy agreements | High |
| Organisation knowledge | FAQs, policies | Medium |

Query Pipeline

  1. Emergency check — UK-specific keyword matching (gas, fire, flood, etc.)
  2. Intent classification — A fast keyword classifier runs first; the LLM fallback is invoked only when the message might be a greeting, the only intent that short-circuits the pipeline. Issue and question intents skip the LLM classification call entirely; the orchestrator handles them via tool selection.
  3. Retrieval — Vector search filtered by org_id, property_id, and visibleToAI. Retrieval is conditional: it is skipped on the first message (IDLE state, ≤2 messages), where the LLM only needs to ask follow-up questions, and performed on later turns and whenever a question is being answered.
  4. Generation — Model routing: Haiku for information-gathering turns (GATHERING_DETAILS, AWAITING_PHOTO), Sonnet for the first turn and for issue creation. Q&A generation uses the multi-LLM fallback chain (Claude -> OpenAI -> Kimi -> GLM).
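The provider fallback in the generation step can be sketched as a simple ordered chain. The type and function names are hypothetical; a real implementation would also distinguish retryable failures (timeouts, rate limits) from hard errors.

```typescript
// Hypothetical sketch of the multi-LLM fallback chain: try each provider in
// order (e.g. Claude -> OpenAI -> Kimi -> GLM) and return the first success.
type Generate = (prompt: string) => Promise<string>;

async function generateWithFallback(
  prompt: string,
  providers: { name: string; generate: Generate }[],
): Promise<string> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return await p.generate(prompt); // first provider to succeed wins
    } catch (err) {
      lastError = err; // e.g. rate limit or timeout; fall through to the next
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```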

Response Limits

| Channel | Max Chars |
| --- | --- |
| Voice | 300 |
| SMS | 160 |
| Chat | 800 |
| WhatsApp | 1000 |
| Email | 2000 |
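Enforcing these limits can be sketched as a lookup plus truncation. The map values come from the table above; the function name, the fallback default, and the ellipsis behaviour are assumptions.

```typescript
// Hypothetical sketch of per-channel response clamping using the limits above.
const CHANNEL_LIMITS: Record<string, number> = {
  voice: 300,
  sms: 160,
  chat: 800,
  whatsapp: 1000,
  email: 2000,
};

function clampResponse(text: string, channel: string): string {
  const limit = CHANNEL_LIMITS[channel] ?? 160; // unknown channel: assume strictest
  if (text.length <= limit) return text;
  return text.slice(0, limit - 1).trimEnd() + "…"; // truncate with ellipsis
}
```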

Security

| Layer | Protection |
| --- | --- |
| Rate limiting | 30 req/min per phone, 100/min per IP |
| Webhook verification | HMAC-SHA256 |
| Input sanitisation | Prompt injection detection |
| Deduplication | externalMessageId check |
| Access scoping | Tenants only see their property + org-wide docs |
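The HMAC-SHA256 webhook verification row can be sketched as follows; a minimal sketch assuming a hex-encoded signature header, with the function name as an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of webhook verification: recompute HMAC-SHA256 over the
// raw request body and compare against the provider's signature in constant time.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, "hex");
  if (provided.length !== expected.length) return false; // malformed or truncated
  return timingSafeEqual(provided, expected); // avoids timing side channels
}
```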

Conversation Orchestration

The LLM uses tool-use to manage conversation flow:

  • ask_for_details — Gather more info about an issue
  • ask_for_photo — Request photo evidence
  • create_issue — Create an issue (gated behind identity confirmation)
  • respond — Answer a question using RAG context
  • escalate — Hand off to human
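The tools above would be exposed to the model as tool definitions. The sketch below uses the JSON-schema shape common to LLM tool-use APIs; the tool names match the list, but the parameter schemas are assumptions.

```typescript
// Illustrative tool definitions for the orchestrator; only the names are
// taken from the design, the input schemas are hypothetical.
const TOOLS = [
  {
    name: "ask_for_details",
    description: "Ask the tenant for more information about a reported issue.",
    input_schema: {
      type: "object",
      properties: { question: { type: "string" } },
      required: ["question"],
    },
  },
  {
    name: "create_issue",
    description: "Create a maintenance issue. Gated behind identity confirmation.",
    input_schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        severity: { type: "string", enum: ["low", "medium", "high", "emergency"] },
      },
      required: ["title"],
    },
  },
  // ask_for_photo, respond, and escalate would follow the same pattern.
];
```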

Identity Flow

Conversations go through identity states before issue creation: UNIDENTIFIED -> IDENTIFIED -> CONFIRMED -> ACTIVE

Channel-specific behaviour: chat is pre-authenticated via OTP and auto-confirms; WhatsApp and voice require an identity challenge.
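The identity states above can be sketched as a linear state machine; the state names come from the flow, while the function names and transition comments are assumptions.

```typescript
// Hypothetical sketch of the identity state machine gating issue creation.
type IdentityState = "UNIDENTIFIED" | "IDENTIFIED" | "CONFIRMED" | "ACTIVE";

const NEXT: Record<IdentityState, IdentityState | null> = {
  UNIDENTIFIED: "IDENTIFIED", // phone/email matched to a tenant record
  IDENTIFIED: "CONFIRMED",    // identity challenge passed (auto for OTP chat)
  CONFIRMED: "ACTIVE",        // issue creation unlocked
  ACTIVE: null,               // terminal state
};

function advance(state: IdentityState): IdentityState {
  return NEXT[state] ?? state;
}

function canCreateIssue(state: IdentityState): boolean {
  return state === "CONFIRMED" || state === "ACTIVE";
}
```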

Cost Estimates

| Component | Estimated Cost |
| --- | --- |
| Embeddings (one-time) | ~$0.10 per 1000 chunks |
| Embeddings (queries) | ~$0.0001 per query |
| Claude generation | ~$0.01-0.03 per query |

See also: Conversation Orchestration, System Design, Tech Stack