ADR-018: RAG Context Management — Scaling, Compression & Landlord Configurability
Status: Proposed
Owner: @bilal @deen
Date: 2026-02-15
Extends: ADR-017 RAG Pipeline v2 Phase 3
Context
The Scaling Problem
The current RAG pipeline injects the top-5 chunks directly into the system prompt. This works at small scale (2-3 docs per property) but breaks down as document volume grows:
| Property Profile | Documents | Estimated Chunks |
|---|---|---|
| Single-let (now) | 2-3 | 8-12 |
| HMO (near-term) | 5-10 | 25-50 |
| Managed portfolio (target) | 10-20 | 50-100 |
| Enterprise (future) | 20-50 | 100-250 |
Problems at scale: top-5 selection quality degrades, the context gets polluted with marginally relevant chunks, nothing guarantees source diversity, and token cost scales linearly with injected context.
The Configurability Gap
There is no mechanism for landlords to control what the AI knows, what it tells tenants, how the AI behaves per property, or where response boundaries lie. The `visibleToAI` flag is binary, with no middle ground.
Decision
1. Retrieve Many, Inject Few
Replace direct top-K injection with a retrieve-rerank-compress pipeline:
Query → Retrieve top-20 (vector) → Rerank to top-5 (Haiku LLM) → Compress → Inject
- Reranker: Claude Haiku scores each chunk 0-10 for relevance; the top-5 survive (see the sketch after this list)
- Compression: strip irrelevant sentences when the context exceeds 800 words (40-60% token reduction)
- Source diversity: ensure the top-5 includes chunks from multiple documents, not just one long document
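A minimal sketch of the reranking step, assuming the official @anthropic-ai/sdk; the prompt wording, model alias, `Chunk` shape, and the `maxPerDoc` diversity cap are illustrative assumptions, not settled implementation:

```typescript
import Anthropic from "@anthropic-ai/sdk";

interface Chunk {
  id: string;
  documentTitle: string;
  text: string;
}

const anthropic = new Anthropic();

// Ask Haiku to score one candidate chunk 0-10 for relevance to the query.
async function scoreChunk(query: string, chunk: Chunk): Promise<number> {
  const res = await anthropic.messages.create({
    model: "claude-3-5-haiku-latest", // placeholder alias, not final
    max_tokens: 4,
    messages: [{
      role: "user",
      content:
        `Rate 0-10 how relevant this passage is to the question. ` +
        `Reply with only the number.\n\nQuestion: ${query}\n\nPassage: ${chunk.text}`,
    }],
  });
  const block = res.content[0];
  const score = block.type === "text" ? parseInt(block.text, 10) : 0;
  return Number.isNaN(score) ? 0 : score;
}

// Rerank ~20 vector-search candidates down to top-k, capping how many
// chunks any single document may contribute (source diversity).
async function rerank(
  query: string,
  candidates: Chunk[],
  k = 5,
  maxPerDoc = 2,
): Promise<Chunk[]> {
  const scored = await Promise.all(
    candidates.map(async (c) => ({ c, score: await scoreChunk(query, c) })),
  );
  scored.sort((a, b) => b.score - a.score);

  const picked: Chunk[] = [];
  const perDoc = new Map<string, number>();
  for (const { c } of scored) {
    const used = perDoc.get(c.documentTitle) ?? 0;
    if (used >= maxPerDoc) continue; // enforce source diversity
    picked.push(c);
    perDoc.set(c.documentTitle, used + 1);
    if (picked.length === k) break;
  }
  return picked;
}
```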
2. Metadata-Enriched Chunks
Add `document_title` and `section_heading` to each chunk at ingestion, and format retrieved chunks with source attribution:
From "Gas Safety Certificate" — Boiler Specifications:
The property has a Vaillant ecoTEC Plus 832 combi boiler...
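As a small illustration, a chunk renderer matching this format (the `EnrichedChunk` shape and helper name are hypothetical):

```typescript
interface EnrichedChunk {
  documentTitle: string;   // e.g. "Gas Safety Certificate"
  sectionHeading: string;  // e.g. "Boiler Specifications"
  text: string;
}

// Render a retrieved chunk with the source attribution shown above.
function formatChunk(chunk: EnrichedChunk): string {
  return `From "${chunk.documentTitle}" — ${chunk.sectionHeading}:\n${chunk.text}`;
}
```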
3. Landlord AI Configuration
Organisation-level settings:
- `tone`: professional | friendly | formal
- `deflectTopics`: topics the AI redirects to the landlord (rent, deposit, eviction, legal)
- `diyInstructionPolicy`: allow | basic_only | never
- `emergencyKeywords`: override the defaults
- Custom AI greeting
All of these are injected into the system prompt dynamically.
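One plausible TypeScript shape for the `ai_settings` JSONB, with a sketch of how it could be rendered into the system prompt; exact field names and the `renderSettings` helper are assumptions pending Phase C:

```typescript
interface AiSettings {
  tone: "professional" | "friendly" | "formal";
  deflectTopics: string[];        // e.g. ["rent", "deposit", "eviction", "legal"]
  diyInstructionPolicy: "allow" | "basic_only" | "never";
  emergencyKeywords?: string[];   // overrides the built-in defaults when set
  customGreeting?: string;        // custom AI greeting, if configured
}

// Turn org-level settings into system-prompt lines at request time.
function renderSettings(s: AiSettings): string {
  const lines = [
    `Tone: ${s.tone}.`,
    `Redirect these topics to the landlord: ${s.deflectTopics.join(", ")}.`,
    `DIY instruction policy: ${s.diyInstructionPolicy}.`,
  ];
  if (s.customGreeting) lines.push(`Open with: "${s.customGreeting}"`);
  return lines.join("\n");
}
```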
4. Document Visibility Modes
Replace the binary `visibleToAI` flag with three states:
| Mode | AI Reads | AI Cites to Tenant |
|---|---|---|
| `hidden` | No | No |
| `context_only` | Yes (for triage) | No |
| `full` | Yes | Yes |
The system prompt separates internal context from shareable information:
INTERNAL CONTEXT (do NOT share):
- Landlord's preferred plumber: ABC Plumbing, 07700 900001
PROPERTY INFORMATION (you may reference):
- House rules, boiler manual, fire safety...
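A sketch of assembling the two prompt sections from retrieval output; the `RetrievedChunk` shape and helper name are assumptions (`hidden` documents are excluded at retrieval time and never reach this step):

```typescript
type VisibilityMode = "hidden" | "context_only" | "full";

interface RetrievedChunk {
  text: string;
  visibility: VisibilityMode;
}

// Split retrieved chunks into internal-only and tenant-shareable sections.
function buildContextSections(chunks: RetrievedChunk[]): string {
  const internal = chunks.filter((c) => c.visibility === "context_only");
  const shareable = chunks.filter((c) => c.visibility === "full");
  return [
    "INTERNAL CONTEXT (do NOT share):",
    ...internal.map((c) => `- ${c.text}`),
    "",
    "PROPERTY INFORMATION (you may reference):",
    ...shareable.map((c) => `- ${c.text}`),
  ].join("\n");
}
```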
5. Context Budget
Per-channel token limits prevent runaway prompt sizes:
| Channel | Budget |
|---|---|
| | 500 tokens |
| SMS | 300 tokens |
| Voice | 400 tokens |
| Chat | 1,000 tokens |
| | 1,500 tokens |
The pipeline respects the budget: retrieve → rerank → compress → drop the lowest-ranked chunks if still over (a sketch follows).
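A sketch of budget enforcement over reranked chunks; the word-based `countTokens` stands in for a real tokenizer, and the map only includes the channels whose names survive in the table above:

```typescript
// Per-channel budgets from the table above (token counts approximate).
const CHANNEL_BUDGETS: Record<string, number> = {
  sms: 300,
  voice: 400,
  chat: 1000,
};

// Rough token estimate: assumes ~0.75 words per token; a real
// implementation would use the model tokenizer instead.
function countTokens(text: string): number {
  return Math.ceil(text.split(/\s+/).length / 0.75);
}

// Drop the lowest-ranked chunks until the context fits the channel budget.
// `chunks` must already be sorted best-first by the reranker; at least one
// chunk is always kept.
function enforceBudget(chunks: string[], channel: string): string[] {
  const budget = CHANNEL_BUDGETS[channel] ?? 1000;
  const kept = [...chunks];
  while (kept.length > 1 && countTokens(kept.join("\n\n")) > budget) {
    kept.pop(); // lowest-scored chunk is last
  }
  return kept;
}
```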
Implementation Phases
- A. Reranking + Compression — increase K to 20, add the Haiku reranker, compression, source diversity, and the context budget
- B. Metadata Enrichment — document_title + section_heading on chunks
- C. Organisation AI Settings — ai_settings JSONB on organisations, settings UI
- D. Document Visibility Modes — migrate boolean → enum, update UI (see the migration sketch after this list)
- E. Per-Property Overrides (Future) — ai_settings_override JSONB on properties
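One possible shape for the Phase D migration, using node-postgres; the `documents` table, `visible_to_ai` column, and enum name are assumptions about the current schema, not final DDL:

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Replace the visible_to_ai boolean with the three-state enum, preserving
// current behaviour: true -> full, false -> hidden.
async function migrateVisibility(): Promise<void> {
  await pool.query(`
    CREATE TYPE document_visibility AS ENUM ('hidden', 'context_only', 'full');

    ALTER TABLE documents ADD COLUMN visibility document_visibility;

    UPDATE documents
      SET visibility = CASE WHEN visible_to_ai THEN 'full'::document_visibility
                            ELSE 'hidden'::document_visibility END;

    ALTER TABLE documents
      ALTER COLUMN visibility SET NOT NULL,
      DROP COLUMN visible_to_ai;
  `);
}
```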
Consequences
Positive
- Scales to 50+ docs/property without quality degradation
- Landlords get meaningful AI control without technical knowledge
- 40-60% token cost reduction via compression
- `context_only` mode enables better triage without leaking private information
Negative
- 1-2 additional Haiku calls per message (~$0.002, +200-300ms)
- Migration complexity (boolean → enum)
- Prompt engineering becomes more complex