RAG Pipeline

Status: Implemented · Owner: @bilal · Last Updated: 2026-02-15

Design of the AI-powered tenant support system (RAG, Chat & Voice).

Purpose

Enable tenants to ask natural language questions about their property through chat and voice channels. The system retrieves relevant documents via semantic search and generates contextual answers.

Architecture

Tenant Channels (WhatsApp, Voice, Chat)
         |
   Security Layer (rate limiting, webhook verification, sanitisation)
         |
   Tenant Identification (phone/email -> tenant -> property -> org)
         |
   RAG Pipeline
   ├── Emergency Check (UK keyword matching -> hardcoded response)
   ├── Intent Classification (greeting / question / issue)
   ├── Retrieval (pgvector semantic search, org/property filtered)
   └── Generation (Claude -> OpenAI -> Kimi -> GLM fallback)
         |
   Data Layer (PostgreSQL + pgvector, 1536-dim HNSW index)
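The retrieval stage against the data layer can be sketched as a single filtered pgvector query. This is an illustrative sketch only: the `document_chunks` table and snake_case column names are assumptions (the doc's `visibleToAI` field would presumably map to a `visible_to_ai` column), and `<=>` is pgvector's cosine-distance operator served by the HNSW index.

```typescript
// Hypothetical retrieval query: semantic search scoped to the tenant's org,
// returning property-specific plus org-wide chunks. $1 is the 1536-dim query
// embedding, $2 the org_id, $3 the property_id.
const RETRIEVAL_SQL = `
  SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
  FROM document_chunks
  WHERE org_id = $2
    AND (property_id = $3 OR property_id IS NULL)  -- property + org-wide docs
    AND visible_to_ai = true
  ORDER BY embedding <=> $1::vector                -- cosine distance, HNSW index
  LIMIT 5`;
```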

Document Ingestion

Upload -> Extract Text -> Chunk (500 words, 50 overlap) -> Embed (OpenAI) -> Store (pgvector)
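The chunking step can be sketched as a sliding word window. The 500-word size and 50-word overlap come from the pipeline above; the function name and everything else here are assumptions.

```typescript
// Hypothetical sketch of the chunking step: fixed-size word windows with
// overlap, so context at a chunk boundary appears in both neighbours.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // each chunk starts 450 words after the last
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // final chunk reached the end
  }
  return chunks;
}
```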

What Gets Embedded

| Source | Examples | Priority |
| --- | --- | --- |
| Property documents | House rules, appliance manuals | High |
| Lease documents | Tenancy agreements | High |
| Organisation knowledge | FAQs, policies | Medium |

Query Pipeline

  1. Emergency check — UK-specific keyword matching (gas, fire, flood, etc.)
  2. Intent classification — A fast keyword classifier runs first; the LLM fallback is invoked only when the message might be a greeting, the only intent that short-circuits the pipeline. Issue and question intents skip the LLM classification call entirely; the orchestrator handles them via tool selection.
  3. Retrieval — Vector search filtered by org_id, property_id, and visibleToAI. Retrieval is conditional: it is skipped on the first message (IDLE state, ≤2 messages), where the LLM only needs to ask follow-up questions, and performed on later turns and whenever a question is being answered.
  4. Generation — Model routing: Haiku for information-gathering turns (GATHERING_DETAILS, AWAITING_PHOTO), Sonnet for the first turn and for issue creation. Q&A generation uses the multi-LLM fallback chain (Claude -> OpenAI -> Kimi -> GLM).
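The provider fallback in the generation step can be sketched as a simple ordered chain. The type and function names are hypothetical; a real implementation would also distinguish retryable failures (timeouts, rate limits) from hard errors.

```typescript
// Hypothetical sketch of the multi-LLM fallback chain: try each provider in
// order (e.g. Claude -> OpenAI -> Kimi -> GLM) and return the first success.
type Generate = (prompt: string) => Promise<string>;

async function generateWithFallback(
  prompt: string,
  providers: { name: string; generate: Generate }[],
): Promise<string> {
  let lastError: unknown;
  for (const p of providers) {
    try {
      return await p.generate(prompt); // first provider to succeed wins
    } catch (err) {
      lastError = err; // e.g. rate limit or timeout; fall through to the next
    }
  }
  throw new Error(`All providers failed: ${String(lastError)}`);
}
```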

Response Limits

| Channel | Max Chars |
| --- | --- |
| Voice | 300 |
| SMS | 160 |
| Chat | 800 |
| WhatsApp | 1000 |
| Email | 2000 |
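Enforcing these limits can be sketched as a lookup plus truncation. The map values come from the table above; the function name, the fallback default, and the ellipsis behaviour are assumptions.

```typescript
// Hypothetical sketch of per-channel response clamping using the limits above.
const CHANNEL_LIMITS: Record<string, number> = {
  voice: 300,
  sms: 160,
  chat: 800,
  whatsapp: 1000,
  email: 2000,
};

function clampResponse(text: string, channel: string): string {
  const limit = CHANNEL_LIMITS[channel] ?? 160; // unknown channel: assume strictest
  if (text.length <= limit) return text;
  return text.slice(0, limit - 1).trimEnd() + "…"; // truncate with ellipsis
}
```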

Security

| Layer | Protection |
| --- | --- |
| Rate limiting | 30 req/min per phone, 100/min per IP |
| Webhook verification | HMAC-SHA256 |
| Input sanitisation | Prompt injection detection |
| Deduplication | externalMessageId check |
| Access scoping | Tenants only see their property + org-wide docs |
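The HMAC-SHA256 webhook verification row can be sketched as follows; a minimal sketch assuming a hex-encoded signature header, with the function name as an assumption.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical sketch of webhook verification: recompute HMAC-SHA256 over the
// raw request body and compare against the provider's signature in constant time.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, "hex");
  if (provided.length !== expected.length) return false; // malformed or truncated
  return timingSafeEqual(provided, expected); // avoids timing side channels
}
```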

Conversation Orchestration

The LLM uses tool-use to manage conversation flow:

  • ask_for_details — Gather more info about an issue
  • ask_for_photo — Request photo evidence
  • create_issue — Create an issue (gated behind identity confirmation)
  • respond — Answer a question using RAG context
  • escalate — Hand off to human
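The tools above would be exposed to the model as tool definitions. The sketch below uses the JSON-schema shape common to LLM tool-use APIs; the tool names match the list, but the parameter schemas are assumptions.

```typescript
// Illustrative tool definitions for the orchestrator; only the names are
// taken from the design, the input schemas are hypothetical.
const TOOLS = [
  {
    name: "ask_for_details",
    description: "Ask the tenant for more information about a reported issue.",
    input_schema: {
      type: "object",
      properties: { question: { type: "string" } },
      required: ["question"],
    },
  },
  {
    name: "create_issue",
    description: "Create a maintenance issue. Gated behind identity confirmation.",
    input_schema: {
      type: "object",
      properties: {
        title: { type: "string" },
        severity: { type: "string", enum: ["low", "medium", "high", "emergency"] },
      },
      required: ["title"],
    },
  },
  // ask_for_photo, respond, and escalate would follow the same pattern.
];
```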

Identity Flow

Conversations go through identity states before issue creation: UNIDENTIFIED -> IDENTIFIED -> CONFIRMED -> ACTIVE

Channel-specific behaviour: chat is pre-authenticated via OTP and auto-confirms; WhatsApp and voice require an identity challenge.
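The identity states above can be sketched as a linear state machine; the state names come from the flow, while the function names and transition comments are assumptions.

```typescript
// Hypothetical sketch of the identity state machine gating issue creation.
type IdentityState = "UNIDENTIFIED" | "IDENTIFIED" | "CONFIRMED" | "ACTIVE";

const NEXT: Record<IdentityState, IdentityState | null> = {
  UNIDENTIFIED: "IDENTIFIED", // phone/email matched to a tenant record
  IDENTIFIED: "CONFIRMED",    // identity challenge passed (auto for OTP chat)
  CONFIRMED: "ACTIVE",        // issue creation unlocked
  ACTIVE: null,               // terminal state
};

function advance(state: IdentityState): IdentityState {
  return NEXT[state] ?? state;
}

function canCreateIssue(state: IdentityState): boolean {
  return state === "CONFIRMED" || state === "ACTIVE";
}
```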

Cost Estimates

| Component | Estimated Cost |
| --- | --- |
| Embeddings (one-time) | ~$0.10 per 1000 chunks |
| Embeddings (queries) | ~$0.0001 per query |
| Claude generation | ~$0.01-0.03 per query |

See also: Conversation Orchestration, System Design, Tech Stack