ADR-018: RAG Context Management — Scaling, Compression & Landlord Configurability

Status: Proposed
Owners: @bilal, @deen
Date: 2026-02-15
Extends: ADR-017 RAG Pipeline v2 Phase 3

Context

The Scaling Problem

The current RAG pipeline injects the top-5 retrieved chunks directly into the system prompt. This works at small scale (2-3 docs per property) but breaks down as document volume grows:

Property Profile             Documents   Estimated Chunks
Single-let (now)             2-3         8-12
HMO (near-term)              5-10        25-50
Managed portfolio (target)   10-20       50-100
Enterprise (future)          20-50       100-250

Problems at scale: top-5 selection degrades as relevant chunks get crowded out, irrelevant chunks pollute the context, a single document can dominate the selection with no source diversity, and prompt cost grows linearly with injected context.

The Configurability Gap

There is no mechanism for landlords to control what the AI knows, what it tells tenants, how the AI behaves per property, or where its responses should stop. The visibleToAI flag is binary; there is no middle ground.

Decision

1. Retrieve Many, Inject Few

Replace direct top-K injection with a retrieve-rerank-compress pipeline (sketched after the list below):

Query → Retrieve top-20 (vector) → Rerank to top-5 (Haiku LLM) → Compress → Inject
  • Reranker: Claude Haiku scores each chunk 0-10 for relevance; the top-5 are kept
  • Compression: Strip irrelevant sentences when the context exceeds 800 words (40-60% token reduction)
  • Source diversity: Ensure the top-5 includes chunks from more than one document
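
A minimal sketch of the selection step (retrieval, reranking, source diversity), assuming the Anthropic TypeScript SDK; retrieveChunks, the Chunk shape, and the Haiku model id are illustrative stand-ins, not production names. Compression is omitted for brevity.

  // Sketch only: retrieveChunks, Chunk, and the model id are assumptions.
  import Anthropic from '@anthropic-ai/sdk';

  interface Chunk {
    id: string;
    documentId: string;
    text: string;
  }

  declare function retrieveChunks(query: string, k: number): Promise<Chunk[]>;

  const anthropic = new Anthropic();

  // One Haiku call scores all candidates 0-10; the reply is a JSON array.
  // Production code would validate the reply instead of parsing blindly.
  async function rerank(query: string, chunks: Chunk[]): Promise<Chunk[]> {
    const listing = chunks.map((c, i) => `[${i}] ${c.text}`).join('\n\n');
    const res = await anthropic.messages.create({
      model: 'claude-3-5-haiku-latest', // assumed model id
      max_tokens: 256,
      messages: [{
        role: 'user',
        content: `Score each passage 0-10 for relevance to the question.\n` +
          `Question: ${query}\n\n${listing}\n\n` +
          `Reply with a JSON array of numbers only, one per passage.`,
      }],
    });
    const first = res.content[0];
    const scores: number[] = first.type === 'text' ? JSON.parse(first.text) : [];
    return chunks
      .map((chunk, i) => ({ chunk, score: scores[i] ?? 0 }))
      .sort((a, b) => b.score - a.score)
      .map((s) => s.chunk);
  }

  // Retrieve wide, rerank, then pick 5 while preferring unseen documents.
  async function selectContext(query: string): Promise<Chunk[]> {
    const ranked = await rerank(query, await retrieveChunks(query, 20));
    const picked: Chunk[] = [];
    const seenDocs = new Set<string>();
    for (const chunk of ranked) {           // first pass: one chunk per document
      if (picked.length === 5) break;
      if (!seenDocs.has(chunk.documentId)) {
        picked.push(chunk);
        seenDocs.add(chunk.documentId);
      }
    }
    for (const chunk of ranked) {           // second pass: fill by score alone
      if (picked.length === 5) break;
      if (!picked.includes(chunk)) picked.push(chunk);
    }
    return picked;
  }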

2. Metadata-Enriched Chunks

Add document_title and section_heading to chunks at ingestion. Format with source attribution:

From "Gas Safety Certificate" — Boiler Specifications:
  The property has a Vaillant ecoTEC Plus 832 combi boiler...
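
For illustration, a formatter producing that attribution layout; the EnrichedChunk field names are assumptions about the chunk schema, not the actual one:

  // Illustrative only: field names are assumed, not the real schema.
  interface EnrichedChunk {
    documentTitle: string;   // e.g. "Gas Safety Certificate"
    sectionHeading: string;  // e.g. "Boiler Specifications"
    text: string;
  }

  // Prefix each injected chunk with the attribution line shown above.
  function formatChunk(chunk: EnrichedChunk): string {
    return `From "${chunk.documentTitle}" — ${chunk.sectionHeading}:\n  ${chunk.text}`;
  }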

3. Landlord AI Configuration

Organisation-level settings:

  • tone: professional | friendly | formal
  • deflectTopics: topics AI redirects to landlord (rent, deposit, eviction, legal)
  • diyInstructionPolicy: allow | basic_only | never
  • emergencyKeywords: override defaults
  • Custom AI greeting

These settings are injected into the system prompt dynamically; a sketch of the shape follows.
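
A possible shape for the ai_settings JSONB and one way to render it into prompt instructions; field names mirror the list above, while the generated wording and defaults are assumptions:

  // Field names follow the settings list above; the rest is a sketch.
  type Tone = 'professional' | 'friendly' | 'formal';
  type DiyPolicy = 'allow' | 'basic_only' | 'never';

  interface AiSettings {
    tone: Tone;
    deflectTopics: string[];      // e.g. ['rent', 'deposit', 'eviction', 'legal']
    diyInstructionPolicy: DiyPolicy;
    emergencyKeywords?: string[]; // replaces the default list when set
    greeting?: string;            // custom AI greeting
  }

  // Render the settings as system-prompt instructions.
  function settingsPromptSection(s: AiSettings): string {
    const lines = [
      `Respond in a ${s.tone} tone.`,
      `Do not discuss: ${s.deflectTopics.join(', ')}. Redirect these to the landlord.`,
    ];
    if (s.diyInstructionPolicy === 'never') {
      lines.push('Never give DIY repair instructions.');
    } else if (s.diyInstructionPolicy === 'basic_only') {
      lines.push('Only suggest basic, safe checks; defer repairs to a professional.');
    }
    if (s.greeting) lines.push(`Greet new conversations with: "${s.greeting}"`);
    return lines.join('\n');
  }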

4. Document Visibility Modes

Replace binary visibleToAI with three states:

Mode          AI Reads           AI Cites to Tenant
hidden        No                 No
context_only  Yes (for triage)   No
full          Yes                Yes

The system prompt separates internal context from shareable information:

INTERNAL CONTEXT (do NOT share):
- Landlord's preferred plumber: ABC Plumbing, 07700 900001

PROPERTY INFORMATION (you may reference):
- House rules, boiler manual, fire safety...
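
A sketch of how the three-state mode could drive prompt assembly; the section labels follow the example above, but the Doc shape and names are assumptions:

  // Sketch: hidden docs never reach the model; context_only feeds the
  // internal block; full feeds the shareable block.
  type Visibility = 'hidden' | 'context_only' | 'full';

  interface Doc {
    title: string;
    visibility: Visibility;
    content: string;
  }

  function buildContextSections(docs: Doc[]): string {
    const internal = docs.filter((d) => d.visibility === 'context_only');
    const shareable = docs.filter((d) => d.visibility === 'full');
    return [
      'INTERNAL CONTEXT (do NOT share):',
      ...internal.map((d) => `- ${d.title}: ${d.content}`),
      '',
      'PROPERTY INFORMATION (you may reference):',
      ...shareable.map((d) => `- ${d.title}: ${d.content}`),
    ].join('\n');
  }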

5. Context Budget

Per-channel token limits prevent runaway prompt sizes:

Channel    Budget
WhatsApp   500 tokens
SMS        300 tokens
Voice      400 tokens
Chat       1,000 tokens
Email      1,500 tokens

The pipeline respects the budget: retrieve → rerank → compress → drop the lowest-scoring chunks if still over. A sketch follows.
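
A sketch of the final budget check; countTokens is a stand-in (a real implementation would use a proper tokenizer), and the channel keys are assumed identifiers:

  // Budgets from the table above; keys are assumed channel identifiers.
  const CHANNEL_BUDGETS: Record<string, number> = {
    whatsapp: 500,
    sms: 300,
    voice: 400,
    chat: 1000,
    email: 1500,
  };

  // Rough ~4 characters/token heuristic; swap in a real tokenizer.
  function countTokens(text: string): number {
    return Math.ceil(text.length / 4);
  }

  // Chunks arrive best-first from the reranker; drop from the tail until
  // the assembled context fits the channel budget.
  function enforceBudget(chunks: string[], channel: string): string[] {
    const budget = CHANNEL_BUDGETS[channel] ?? 500;
    const kept = [...chunks];
    while (kept.length > 1 && countTokens(kept.join('\n\n')) > budget) {
      kept.pop();
    }
    return kept;
  }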

Implementation Phases

  • A. Reranking + Compression: increase K to 20, add the Haiku reranker, compression, source diversity, and the context budget
  • B. Metadata Enrichment: add document_title and section_heading to chunks
  • C. Organisation AI Settings: ai_settings JSONB on organisations, plus a settings UI
  • D. Document Visibility Modes: migrate the boolean to an enum and update the UI (migration sketched below)
  • E. Per-Property Overrides (future): ai_settings_override JSONB on properties
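
For Phase D, one possible shape of the boolean-to-enum migration using node-postgres; the documents table and column names are assumptions:

  // Assumed table/column names; preserves current behaviour by mapping
  // visible_to_ai = true → 'full' and false → 'hidden'.
  import { Client } from 'pg';

  async function migrateVisibility(client: Client): Promise<void> {
    await client.query('BEGIN');
    try {
      await client.query(
        `CREATE TYPE doc_visibility AS ENUM ('hidden', 'context_only', 'full')`,
      );
      await client.query(
        `ALTER TABLE documents
           ADD COLUMN visibility doc_visibility NOT NULL DEFAULT 'full'`,
      );
      await client.query(
        `UPDATE documents
            SET visibility = CASE WHEN visible_to_ai THEN 'full'::doc_visibility
                                  ELSE 'hidden'::doc_visibility END`,
      );
      await client.query(`ALTER TABLE documents DROP COLUMN visible_to_ai`);
      await client.query('COMMIT');
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    }
  }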

Consequences

Positive

  • Scales to 50+ docs/property without quality degradation
  • Landlords get meaningful AI control without technical knowledge
  • 40-60% token cost reduction via compression
  • context_only mode enables better triage without leaking info

Negative

  • 1-2 additional Haiku calls per message (~$0.002, +200-300ms)
  • Migration complexity (boolean → enum)
  • Prompt engineering becomes more complex