ADR-018: RAG Context Management — Scaling, Compression & Landlord Configurability

Status: Proposed
Owners: @bilal, @deen
Date: 2026-02-15
Extends: ADR-017 RAG Pipeline v2 Phase 3

Context

The Scaling Problem

The current RAG pipeline injects the top-5 retrieved chunks directly into the system prompt. This works at small scale (2-3 docs per property) but breaks down as document volume grows:

Property Profile             Documents   Estimated Chunks
Single-let (now)             2-3         8-12
HMO (near-term)              5-10        25-50
Managed portfolio (target)   10-20       50-100
Enterprise (future)          20-50       100-250

Problems at scale: top-5 selection degrades as relevant chunks get crowded out, irrelevant chunks pollute the context, a single document can dominate the selection with no source diversity, and prompt cost grows linearly with injected context.

The Configurability Gap

There is no mechanism for landlords to control what the AI knows, what it tells tenants, how the AI behaves per property, or where its responses should stop. The visibleToAI flag is binary; there is no middle ground.

Decision

1. Retrieve Many, Inject Few

Replace direct top-K injection with a retrieve-rerank-compress pipeline (sketched after the list below):

Query → Retrieve top-20 (vector) → Rerank to top-5 (Haiku LLM) → Compress → Inject
  • Reranker: Claude Haiku scores each chunk 0-10 for relevance; the top-5 are kept
  • Compression: Strip irrelevant sentences when the context exceeds 800 words (40-60% token reduction)
  • Source diversity: Ensure the top-5 includes chunks from more than one document
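
A minimal sketch of the selection step (retrieval, reranking, source diversity), assuming the Anthropic TypeScript SDK; retrieveChunks, the Chunk shape, and the Haiku model id are illustrative stand-ins, not production names. Compression is omitted for brevity.

  // Sketch only: retrieveChunks, Chunk, and the model id are assumptions.
  import Anthropic from '@anthropic-ai/sdk';

  interface Chunk {
    id: string;
    documentId: string;
    text: string;
  }

  declare function retrieveChunks(query: string, k: number): Promise<Chunk[]>;

  const anthropic = new Anthropic();

  // One Haiku call scores all candidates 0-10; the reply is a JSON array.
  // Production code would validate the reply instead of parsing blindly.
  async function rerank(query: string, chunks: Chunk[]): Promise<Chunk[]> {
    const listing = chunks.map((c, i) => `[${i}] ${c.text}`).join('\n\n');
    const res = await anthropic.messages.create({
      model: 'claude-3-5-haiku-latest', // assumed model id
      max_tokens: 256,
      messages: [{
        role: 'user',
        content: `Score each passage 0-10 for relevance to the question.\n` +
          `Question: ${query}\n\n${listing}\n\n` +
          `Reply with a JSON array of numbers only, one per passage.`,
      }],
    });
    const first = res.content[0];
    const scores: number[] = first.type === 'text' ? JSON.parse(first.text) : [];
    return chunks
      .map((chunk, i) => ({ chunk, score: scores[i] ?? 0 }))
      .sort((a, b) => b.score - a.score)
      .map((s) => s.chunk);
  }

  // Retrieve wide, rerank, then pick 5 while preferring unseen documents.
  async function selectContext(query: string): Promise<Chunk[]> {
    const ranked = await rerank(query, await retrieveChunks(query, 20));
    const picked: Chunk[] = [];
    const seenDocs = new Set<string>();
    for (const chunk of ranked) {           // first pass: one chunk per document
      if (picked.length === 5) break;
      if (!seenDocs.has(chunk.documentId)) {
        picked.push(chunk);
        seenDocs.add(chunk.documentId);
      }
    }
    for (const chunk of ranked) {           // second pass: fill by score alone
      if (picked.length === 5) break;
      if (!picked.includes(chunk)) picked.push(chunk);
    }
    return picked;
  }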

2. Metadata-Enriched Chunks

Add document_title and section_heading to chunks at ingestion. Format with source attribution:

From "Gas Safety Certificate" — Boiler Specifications:
  The property has a Vaillant ecoTEC Plus 832 combi boiler...
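
For illustration, a formatter producing that attribution layout; the EnrichedChunk field names are assumptions about the chunk schema, not the actual one:

  // Illustrative only: field names are assumed, not the real schema.
  interface EnrichedChunk {
    documentTitle: string;   // e.g. "Gas Safety Certificate"
    sectionHeading: string;  // e.g. "Boiler Specifications"
    text: string;
  }

  // Prefix each injected chunk with the attribution line shown above.
  function formatChunk(chunk: EnrichedChunk): string {
    return `From "${chunk.documentTitle}" — ${chunk.sectionHeading}:\n  ${chunk.text}`;
  }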

3. Landlord AI Configuration

Organisation-level settings:

  • tone: professional | friendly | formal
  • deflectTopics: topics AI redirects to landlord (rent, deposit, eviction, legal)
  • diyInstructionPolicy: allow | basic_only | never
  • emergencyKeywords: override defaults
  • Custom AI greeting

These settings are injected into the system prompt dynamically; a sketch of the shape follows.
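
A possible shape for the ai_settings JSONB and one way to render it into prompt instructions; field names mirror the list above, while the generated wording and defaults are assumptions:

  // Field names follow the settings list above; the rest is a sketch.
  type Tone = 'professional' | 'friendly' | 'formal';
  type DiyPolicy = 'allow' | 'basic_only' | 'never';

  interface AiSettings {
    tone: Tone;
    deflectTopics: string[];      // e.g. ['rent', 'deposit', 'eviction', 'legal']
    diyInstructionPolicy: DiyPolicy;
    emergencyKeywords?: string[]; // replaces the default list when set
    greeting?: string;            // custom AI greeting
  }

  // Render the settings as system-prompt instructions.
  function settingsPromptSection(s: AiSettings): string {
    const lines = [
      `Respond in a ${s.tone} tone.`,
      `Do not discuss: ${s.deflectTopics.join(', ')}. Redirect these to the landlord.`,
    ];
    if (s.diyInstructionPolicy === 'never') {
      lines.push('Never give DIY repair instructions.');
    } else if (s.diyInstructionPolicy === 'basic_only') {
      lines.push('Only suggest basic, safe checks; defer repairs to a professional.');
    }
    if (s.greeting) lines.push(`Greet new conversations with: "${s.greeting}"`);
    return lines.join('\n');
  }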

4. Document Visibility Modes

Replace binary visibleToAI with three states:

Mode          AI Reads           AI Cites to Tenant
hidden        No                 No
context_only  Yes (for triage)   No
full          Yes                Yes

The system prompt separates internal context from shareable information:

INTERNAL CONTEXT (do NOT share):
- Landlord's preferred plumber: ABC Plumbing, 07700 900001

PROPERTY INFORMATION (you may reference):
- House rules, boiler manual, fire safety...
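
A sketch of how the three-state mode could drive prompt assembly; the section labels follow the example above, but the Doc shape and names are assumptions:

  // Sketch: hidden docs never reach the model; context_only feeds the
  // internal block; full feeds the shareable block.
  type Visibility = 'hidden' | 'context_only' | 'full';

  interface Doc {
    title: string;
    visibility: Visibility;
    content: string;
  }

  function buildContextSections(docs: Doc[]): string {
    const internal = docs.filter((d) => d.visibility === 'context_only');
    const shareable = docs.filter((d) => d.visibility === 'full');
    return [
      'INTERNAL CONTEXT (do NOT share):',
      ...internal.map((d) => `- ${d.title}: ${d.content}`),
      '',
      'PROPERTY INFORMATION (you may reference):',
      ...shareable.map((d) => `- ${d.title}: ${d.content}`),
    ].join('\n');
  }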

5. Context Budget

Per-channel token limits prevent runaway prompt sizes:

Channel    Budget
WhatsApp   500 tokens
SMS        300 tokens
Voice      400 tokens
Chat       1,000 tokens
Email      1,500 tokens

The pipeline respects the budget: retrieve → rerank → compress → drop the lowest-scoring chunks if still over. A sketch follows.
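
A sketch of the final budget check; countTokens is a stand-in (a real implementation would use a proper tokenizer), and the channel keys are assumed identifiers:

  // Budgets from the table above; keys are assumed channel identifiers.
  const CHANNEL_BUDGETS: Record<string, number> = {
    whatsapp: 500,
    sms: 300,
    voice: 400,
    chat: 1000,
    email: 1500,
  };

  // Rough ~4 characters/token heuristic; swap in a real tokenizer.
  function countTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  // Chunks arrive best-first from the reranker; drop from the tail until
  // the assembled context fits the channel budget.
  function enforceBudget(chunks: string[], channel: string): string[] {
    const budget = CHANNEL_BUDGETS[channel] ?? 500;
    const kept = [...chunks];
    while (kept.length > 1 && countTokens(kept.join('\n\n')) > budget) {
      kept.pop();
    }
    return kept;
  }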

Implementation Phases

  • A. Reranking + Compression: increase K to 20, add the Haiku reranker, compression, source diversity, and the context budget
  • B. Metadata Enrichment: add document_title and section_heading to chunks
  • C. Organisation AI Settings: ai_settings JSONB on organisations, plus a settings UI
  • D. Document Visibility Modes: migrate the boolean to an enum and update the UI (migration sketched below)
  • E. Per-Property Overrides (future): ai_settings_override JSONB on properties
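
For Phase D, one possible shape of the boolean-to-enum migration using node-postgres; the documents table and column names are assumptions:

  // Assumed table/column names; preserves current behaviour by mapping
  // visible_to_ai = true → 'full' and false → 'hidden'.
  import { Client } from 'pg';

  async function migrateVisibility(client: Client): Promise<void> {
    await client.query('BEGIN');
    try {
      await client.query(
        `CREATE TYPE doc_visibility AS ENUM ('hidden', 'context_only', 'full')`,
      );
      await client.query(
        `ALTER TABLE documents
           ADD COLUMN visibility doc_visibility NOT NULL DEFAULT 'full'`,
      );
      await client.query(
        `UPDATE documents
            SET visibility = CASE WHEN visible_to_ai THEN 'full'::doc_visibility
                                  ELSE 'hidden'::doc_visibility END`,
      );
      await client.query(`ALTER TABLE documents DROP COLUMN visible_to_ai`);
      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    }
  }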

Consequences

Positive

  • Scales to 50+ docs/property without quality degradation
  • Landlords get meaningful AI control without technical knowledge
  • 40-60% token cost reduction via compression
  • context_only mode enables better triage without leaking info

Negative

  • 1-2 additional Haiku calls per message (~$0.002, +200-300ms)
  • Migration complexity (boolean → enum)
  • Prompt engineering becomes more complex