RAG Pipeline

Status: Implemented
Owner: @bilal
Last Updated: 2026-02-15

Design of the AI-powered tenant support system (RAG, Chat & Voice).

Purpose

Enable tenants to ask natural language questions about their property through chat and voice channels. The system retrieves relevant documents via semantic search and generates contextual answers.

Architecture

Tenant Channels (WhatsApp, Voice, Chat)
         |
   Security Layer (rate limiting, webhook verification, sanitisation)
         |
   Tenant Identification (phone/email -> tenant -> property -> org)
         |
   RAG Pipeline
   ├── Emergency Check (UK keyword matching -> hardcoded response)
   ├── Intent Classification (greeting / question / issue)
   ├── Retrieval (pgvector semantic search, org/property filtered)
   └── Generation (Claude -> OpenAI -> Kimi -> GLM fallback)
         |
   Data Layer (PostgreSQL + pgvector, 1536-dim HNSW index)
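
The generation fallback shown in the diagram (Claude -> OpenAI -> Kimi -> GLM) could be sketched roughly as below. This is a minimal illustration, not the actual implementation: the `Provider` type, `generate_with_fallback`, and the stub providers are hypothetical names, and real provider calls would wrap each vendor's SDK.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical provider signature: takes a prompt, returns generated text,
# and raises an exception on failure (timeout, rate limit, outage).
Provider = Callable[[str], str]


def generate_with_fallback(
    prompt: str, providers: Sequence[Tuple[str, Provider]]
) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, answer)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # assumption: any error triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real SDK calls (illustration only).
def claude_stub(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def openai_stub(prompt: str) -> str:
    return f"answer to: {prompt}"


chain = [("claude", claude_stub), ("openai", openai_stub)]
used, answer = generate_with_fallback("When is bin day?", chain)
```

Here the first provider fails, so the second one in the chain answers. Whether every error type should trigger fallback, or only transient ones, is a design choice the source does not specify.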

Document Ingestion

Upload -> Extract Text -> Chunk (500 words, 50 overlap) -> Embed (OpenAI) -> Store (pgvector)
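
The chunking step can be illustrated with a sliding window over words. This is a sketch under the assumption that "50 overlap" means each chunk repeats the last 50 words of the previous one; `chunk_words` is a hypothetical helper name, not the project's actual function.

```python
from typing import List


def chunk_words(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into chunks of `size` words, each sharing `overlap`
    words with the previous chunk."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the window already reached the end of the text
    return chunks


# A 1000-word document yields windows starting at words 0, 450 and 900.
sample = " ".join(f"w{i}" for i in range(1000))
sample_chunks = chunk_words(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.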

What Gets Embedded

Source                  Examples                        Priority
Property documents      House rules, appliance manuals  High
Lease documents         Tenancy agreements              High
Organisation knowledge  FAQs, policies                  Medium

Query Pipeline

  1. Emergency check — UK-specific keyword matching (gas, fire, flood, etc.)
  2. Intent classification — greeting / question / issue
  3. Retrieval — Vector search filtered by org_id, property_id, visibleToAI
  4. Generation — Multi-LLM with channel-aware truncation
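
Step 1 can be sketched as a keyword match that short-circuits the rest of the pipeline. The keyword list below contains only the three examples named above (the real list is longer and not given here), and `EMERGENCY_RESPONSE`, `is_emergency`, and `handle_message` are hypothetical names with placeholder content.

```python
import re
from typing import Callable

# Only the keywords named in this document; the full list is an unknown.
EMERGENCY_KEYWORDS = ("gas", "fire", "flood")
_EMERGENCY_RE = re.compile(
    r"\b(" + "|".join(EMERGENCY_KEYWORDS) + r")\b", re.IGNORECASE
)

# Placeholder text: the actual hardcoded response is not given in the source.
EMERGENCY_RESPONSE = "Placeholder emergency response."


def is_emergency(message: str) -> bool:
    """Return True if the message contains an emergency keyword as a whole word."""
    return bool(_EMERGENCY_RE.search(message))


def handle_message(message: str, rag_answer: Callable[[str], str]) -> str:
    """Run the emergency check first; only non-emergencies reach the RAG steps."""
    if is_emergency(message):
        return EMERGENCY_RESPONSE
    return rag_answer(message)
```

Whole-word matching avoids false positives such as "fireplace", but it also misses inflections such as "flooding", so a production keyword list would need to cover those variants explicitly.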

Response Limits

Channel   Max Chars
Voice     300
SMS       160
Chat      800
WhatsApp  1000
Email     2000
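
Channel-aware truncation against these limits might look like the sketch below. The limits come from the table above; cutting at a word boundary and appending "..." is an assumed strategy, and `truncate_for_channel` is a hypothetical name.

```python
# Limits taken from the Response Limits table.
MAX_CHARS = {"voice": 300, "sms": 160, "chat": 800, "whatsapp": 1000, "email": 2000}


def truncate_for_channel(text: str, channel: str, ellipsis: str = "...") -> str:
    """Trim text to the channel's limit, preferring to cut at a space."""
    limit = MAX_CHARS[channel]
    if len(text) <= limit:
        return text
    cut = text[: limit - len(ellipsis)]  # leave room for the ellipsis
    space = cut.rfind(" ")
    if space > 0:
        cut = cut[:space]  # back up to the last complete word
    return cut.rstrip() + ellipsis


long_answer = "word " * 100  # 500 characters
sms_text = truncate_for_channel(long_answer, "sms")
```

An alternative would be to pass the limit to the LLM in the prompt and use truncation only as a safety net, since hard cuts can drop the most useful part of an answer.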

Security

Layer                 Protection
Rate limiting         30 req/min per phone, 100/min per IP
Webhook verification  HMAC-SHA256
Input sanitisation    Prompt injection detection
Deduplication         externalMessageId check
Access scoping        Tenants only see their property + org-wide docs
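
As an illustration of the webhook verification row, HMAC-SHA256 checking can be done with the Python standard library. This sketch assumes the provider sends a hex-encoded signature of the raw request body; header names and encoding (hex vs. base64) vary by provider and are not specified here. `sign` and `verify_webhook` are hypothetical helper names.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


secret = b"example-secret"  # assumption: shared secret configured per provider
body = b'{"externalMessageId": "abc123", "text": "hello"}'
good_signature = sign(secret, body)
```

The signature must be computed over the exact bytes received, before any JSON parsing or re-serialisation, otherwise valid requests can fail verification.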

Conversation Orchestration

The LLM uses tool-use to manage conversation flow:

  • ask_for_details — Gather more info about an issue
  • ask_for_photo — Request photo evidence
  • create_issue — Create an issue (gated behind identity confirmation)
  • respond — Answer a question using RAG context
  • escalate — Hand off to human

Identity Flow

Conversations go through identity states before issue creation: UNIDENTIFIED -> IDENTIFIED -> CONFIRMED -> ACTIVE

Channel-specific behaviour: Chat sessions are pre-authenticated via OTP and auto-confirm. WhatsApp and Voice require an identity challenge.
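
The identity states and the gate on create_issue can be sketched as a small state machine. Several details below are assumptions rather than documented behaviour: that Chat enters directly at CONFIRMED, that other channels enter at IDENTIFIED once the tenant is matched, and that create_issue is allowed in both CONFIRMED and ACTIVE. All function names are hypothetical.

```python
from enum import Enum


class IdentityState(Enum):
    UNIDENTIFIED = "UNIDENTIFIED"
    IDENTIFIED = "IDENTIFIED"
    CONFIRMED = "CONFIRMED"
    ACTIVE = "ACTIVE"


# States in the order given in the document.
_ORDER = list(IdentityState)


def advance(state: IdentityState) -> IdentityState:
    """Move to the next state; ACTIVE is terminal."""
    index = _ORDER.index(state)
    return _ORDER[min(index + 1, len(_ORDER) - 1)]


def state_after_identification(channel: str) -> IdentityState:
    """Assumption: OTP-authenticated chat skips the identity challenge."""
    if channel == "chat":
        return IdentityState.CONFIRMED
    return IdentityState.IDENTIFIED


def can_create_issue(state: IdentityState) -> bool:
    """Assumption: issue creation needs at least CONFIRMED."""
    return state in (IdentityState.CONFIRMED, IdentityState.ACTIVE)
```

Under these assumptions a WhatsApp conversation that has matched a tenant but not passed the challenge (IDENTIFIED) would have its create_issue tool call rejected, while a chat conversation could create an issue immediately.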

Cost Estimates

Component              Estimated Cost
Embeddings (one-time)  ~$0.10 per 1000 chunks
Embeddings (queries)   ~$0.0001 per query
Claude generation      ~$0.01-0.03 per query
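
To show how these unit costs combine, the sketch below works through a hypothetical workload of 5,000 chunks and 10,000 queries per month. The volumes are illustrative assumptions, not measured figures, and the calculation assumes Claude serves every query (fallback providers are not priced here).

```python
from typing import Tuple

# Unit costs from the table above (USD, approximate).
EMBED_PER_CHUNK = 0.10 / 1000
EMBED_PER_QUERY = 0.0001
GEN_LOW, GEN_HIGH = 0.01, 0.03


def ingestion_cost(chunks: int) -> float:
    """One-time embedding cost for a document corpus."""
    return chunks * EMBED_PER_CHUNK


def query_cost_range(queries: int) -> Tuple[float, float]:
    """(low, high) cost for queries: query embedding plus generation."""
    low = queries * (EMBED_PER_QUERY + GEN_LOW)
    high = queries * (EMBED_PER_QUERY + GEN_HIGH)
    return low, high


# Hypothetical workload, for illustration only.
one_time = ingestion_cost(5_000)                # about $0.50
monthly_low, monthly_high = query_cost_range(10_000)  # about $101 to $301
```

Generation dominates: at these rates the query embedding adds roughly 1% or less to the per-query cost, and the one-time ingestion cost is negligible by comparison.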

See also: Conversation Orchestration, System Design, Tech Stack