RAG Pipeline
Status: Implemented | Owner: @bilal | Last Updated: 2026-02-15
Design of the AI-powered tenant support system (RAG, Chat & Voice).
Purpose
Enable tenants to ask natural language questions about their property through chat and voice channels. The system retrieves relevant documents via semantic search and generates contextual answers.
Architecture
Tenant Channels (WhatsApp, Voice, Chat)
|
Security Layer (rate limiting, webhook verification, sanitisation)
|
Tenant Identification (phone/email -> tenant -> property -> org)
|
RAG Pipeline
├── Emergency Check (UK keyword matching -> hardcoded response)
├── Intent Classification (greeting / question / issue)
├── Retrieval (pgvector semantic search, org/property filtered)
└── Generation (Claude -> OpenAI -> Kimi -> GLM fallback)
|
Data Layer (PostgreSQL + pgvector, 1536-dim HNSW index)
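The generation stage in the diagram falls back through providers in a fixed order (Claude, then OpenAI, then Kimi, then GLM). A minimal sketch of that pattern follows. It assumes each provider is wrapped in a callable that raises on failure; the `Provider` type, `generate_with_fallback`, and the stub functions are hypothetical names for illustration, not the system's actual interfaces.

```python
from typing import Callable, Sequence

# Hypothetical provider signature: takes a prompt, returns generated text,
# and raises an exception on failure (timeout, rate limit, API error).
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def generate_with_fallback(
    prompt: str, providers: Sequence[tuple[str, Provider]]
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, text) from the first success."""
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure moves to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real API clients (assumptions for the example).
def failing_provider(prompt: str) -> str:
    raise TimeoutError("timed out")


def working_provider(prompt: str) -> str:
    return f"answer to: {prompt}"


chain = [
    ("claude", failing_provider),
    ("openai", working_provider),
    ("kimi", working_provider),
    ("glm", working_provider),
]
provider_used, answer = generate_with_fallback("When is bin day?", chain)
```

Returning the provider name alongside the text makes it possible to log which model actually served each answer.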
Document Ingestion
Upload → Extract Text → Chunk (500 words, 50-word overlap) → Embed (OpenAI) → Store (pgvector)
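The chunking step can be sketched as a sliding window over words. This assumes whitespace tokenisation and that the overlap of 50 is measured in words; `chunk_words` is an illustrative name, and the real ingestion code may split differently (for example on sentence or paragraph boundaries).

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size words, each sharing
    `overlap` words with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A 1000-word document yields windows starting at words 0, 450 and 900.
sample = " ".join(f"w{i}" for i in range(1000))
sample_chunks = chunk_words(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some words twice.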
What Gets Embedded
| Source | Examples | Priority |
|---|---|---|
| Property documents | House rules, appliance manuals | High |
| Lease documents | Tenancy agreements | High |
| Organisation knowledge | FAQs, policies | Medium |
Query Pipeline
- Emergency check: UK-specific keyword matching (gas, fire, flood, etc.)
- Intent classification: greeting / question / issue
- Retrieval: vector search filtered by org_id, property_id, visibleToAI
- Generation: multi-LLM with channel-aware truncation
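As an illustration of the retrieval step, the sketch below builds a filtered pgvector query using the cosine-distance operator `<=>`. Several details are assumptions rather than facts from this design: the table name `document_chunks`, the `content` and `embedding` columns, the use of cosine distance, the psycopg-style named placeholders, and the convention that org-wide documents have a NULL `property_id`.

```python
EMBEDDING_DIM = 1536  # matches the 1536-dim index in the data layer

# Hypothetical schema: document_chunks(id, content, embedding, org_id,
# property_id, "visibleToAI"). <=> is pgvector's cosine-distance operator.
RETRIEVAL_SQL = """
SELECT id, content, embedding <=> %(query_embedding)s::vector AS distance
FROM document_chunks
WHERE org_id = %(org_id)s
  AND (property_id = %(property_id)s OR property_id IS NULL)
  AND "visibleToAI" = TRUE
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(top_k)s
"""


def build_retrieval_query(
    query_embedding: list[float], org_id: str, property_id: str, top_k: int = 5
) -> tuple[str, dict]:
    """Return (sql, params) for a scoped nearest-neighbour search."""
    if len(query_embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(query_embedding)}")
    # pgvector accepts vectors in the text form '[x1,x2,...]'.
    vector_literal = "[" + ",".join(repr(float(x)) for x in query_embedding) + "]"
    params = {
        "query_embedding": vector_literal,
        "org_id": org_id,
        "property_id": property_id,
        "top_k": top_k,
    }
    return RETRIEVAL_SQL, params


sql, params = build_retrieval_query([0.0] * EMBEDDING_DIM, "org_1", "prop_1")
```

Applying the org, property, and visibility filters inside the query, rather than after it, keeps out-of-scope chunks from ever reaching the prompt.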
Response Limits
| Channel | Max Chars |
|---|---|
| Voice | 300 |
| SMS | 160 |
| Chat | 800 |
| | 1000 |
| | 2000 |
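A minimal sketch of channel-aware truncation, using only the limits for the channels named in the table. Cutting at a word boundary and appending an ellipsis are assumptions; the design only states the maximum lengths. `truncate_for_channel` is an illustrative name.

```python
# Limits taken from the table above, for the channels it names.
MAX_CHARS = {"voice": 300, "sms": 160, "chat": 800}


def truncate_for_channel(text: str, channel: str) -> str:
    """Shorten text so it fits the channel's character limit."""
    limit = MAX_CHARS[channel]  # raises KeyError for an unknown channel
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis, then cut back to the last
    # space so the result never ends in the middle of a word.
    cut = text[: limit - 1]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + "…"


long_answer = "word " * 100  # 500 characters
sms_reply = truncate_for_channel(long_answer, "sms")
```

In practice it is usually better to also tell the model the target length in the prompt, so that truncation acts as a safety net rather than the main mechanism.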
Security
| Layer | Protection |
|---|---|
| Rate limiting | 30 req/min per phone, 100/min per IP |
| Webhook verification | HMAC-SHA256 |
| Input sanitisation | Prompt injection detection |
| Deduplication | externalMessageId check |
| Access scoping | Tenants only see their property + org-wide docs |
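Webhook verification with HMAC-SHA256 can be sketched with the Python standard library. The sketch assumes the sender signs the raw request body with a shared secret and transmits the signature as a hex string; the header that carries it and its exact encoding depend on the provider and are not specified here.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Return True if signature_hex is the HMAC-SHA256 of body under secret."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature were correct.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


# Example with made-up values.
example_secret = b"shared-secret"
example_body = b'{"message": "hello"}'
example_signature = hmac.new(example_secret, example_body, hashlib.sha256).hexdigest()
```

The check must run against the raw bytes of the request, before any JSON parsing or re-serialisation, because even a whitespace change alters the digest.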
Conversation Orchestration
The LLM uses tool-use to manage conversation flow:
- ask_for_details: Gather more info about an issue
- ask_for_photo: Request photo evidence
- create_issue: Create an issue (gated behind identity confirmation)
- respond: Answer a question using RAG context
- escalate: Hand off to human
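One way to route the model's tool calls is a small dispatcher that enforces the identity gate on create_issue before any handler runs. The handler signature, the `dispatch_tool` name, and the choice to raise an exception when the gate fails are assumptions; the design does not say how a blocked call is handled.

```python
from typing import Any, Callable, Mapping

TOOL_NAMES = frozenset(
    {"ask_for_details", "ask_for_photo", "create_issue", "respond", "escalate"}
)

# Hypothetical handler signature: receives the tool arguments, returns a reply.
Handler = Callable[[Mapping[str, Any]], str]


class IdentityNotConfirmed(Exception):
    """Raised when create_issue is requested before identity confirmation."""


def dispatch_tool(
    name: str,
    args: Mapping[str, Any],
    handlers: Mapping[str, Handler],
    identity_confirmed: bool,
) -> str:
    """Validate a tool call from the model and run the matching handler."""
    if name not in TOOL_NAMES:
        raise ValueError(f"unknown tool: {name}")
    if name == "create_issue" and not identity_confirmed:
        raise IdentityNotConfirmed("confirm the tenant's identity before creating an issue")
    return handlers[name](args)


# Stub handlers for the example; real ones would call the issue tracker, etc.
stub_handlers = {tool: (lambda args, tool=tool: f"{tool} ok") for tool in TOOL_NAMES}
```

Enforcing the gate in the dispatcher, rather than relying on the prompt alone, means a model that calls create_issue too early cannot bypass the check.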
Identity Flow
Conversations go through identity states before issue creation:
UNIDENTIFIED -> IDENTIFIED -> CONFIRMED -> ACTIVE
Confirmation is channel-specific: Chat sessions are pre-authenticated via OTP and auto-confirm, while WhatsApp and Voice require an identity challenge.
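The identity states can be modelled as an ordered enum. In the sketch below, the function names and the lowercase channel strings are hypothetical, and treating both CONFIRMED and ACTIVE as sufficient for issue creation is an assumption; what triggers the move from CONFIRMED to ACTIVE is not stated in this document.

```python
from enum import IntEnum


class IdentityState(IntEnum):
    UNIDENTIFIED = 0
    IDENTIFIED = 1
    CONFIRMED = 2
    ACTIVE = 3


def next_state(state: IdentityState) -> IdentityState:
    """Advance one step along the flow; ACTIVE is terminal."""
    if state is IdentityState.ACTIVE:
        return state
    return IdentityState(state + 1)


def state_after_identification(channel: str) -> IdentityState:
    """Chat is pre-authenticated via OTP, so it skips the identity challenge."""
    if channel == "chat":
        return IdentityState.CONFIRMED
    return IdentityState.IDENTIFIED  # WhatsApp / Voice still need a challenge


def can_create_issue(state: IdentityState) -> bool:
    """Issue creation is gated behind identity confirmation."""
    return state >= IdentityState.CONFIRMED
```

For example, `state_after_identification("whatsapp")` leaves the conversation in IDENTIFIED, so `can_create_issue` stays false until the challenge succeeds.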
Cost Estimates
| Component | Estimated Cost |
|---|---|
| Embeddings (one-time) | ~$0.10 per 1000 chunks |
| Embeddings (queries) | ~$0.0001 per query |
| Claude generation | ~$0.01-0.03 per query |
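The figures above combine into a rough total as follows. This is a back-of-envelope sketch that assumes every query is answered by Claude (no fallback provider) and that each query needs exactly one embedding call; `estimate_cost` and the workload numbers are illustrative only.

```python
# Unit costs in USD, taken from the table above.
EMBED_COST_PER_1000_CHUNKS = 0.10
EMBED_COST_PER_QUERY = 0.0001
GENERATION_COST_PER_QUERY = (0.01, 0.03)  # (low, high)


def estimate_cost(num_chunks: int, num_queries: int) -> tuple[float, float]:
    """Return (low, high) total cost for ingesting chunks and serving queries."""
    ingestion = num_chunks / 1000 * EMBED_COST_PER_1000_CHUNKS
    low = ingestion + num_queries * (EMBED_COST_PER_QUERY + GENERATION_COST_PER_QUERY[0])
    high = ingestion + num_queries * (EMBED_COST_PER_QUERY + GENERATION_COST_PER_QUERY[1])
    return low, high


# Hypothetical workload: 5,000 chunks ingested once, 10,000 queries served.
low_total, high_total = estimate_cost(5_000, 10_000)
```

For that workload the estimate comes to roughly $101.50 to $301.50, almost all of it generation; the embedding costs are negligible by comparison.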
See also: Conversation Orchestration, System Design, Tech Stack