E-010: Retrieval Improvements

Status: Planned
Owner: @bilal
Priority: P5 (after docs are flowing)

Objective

Improve document retrieval quality using hybrid search (vector + keyword) and better chunking.

Tasks

Task                | ID      | Description                                                         | Status
pg_trgm extension   | RET-001 | Add keyword search index on document_chunks                         | Planned
Hybrid search (RRF) | RET-002 | Reciprocal Rank Fusion combining vector + keyword results           | Planned
Chunk metadata      | RET-003 | Enrich with document title, section heading, chunk index            | Planned
Improved chunking   | RET-004 | Semantic paragraph-level, 300-500 tokens, sentence-boundary overlap | Planned
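
To make RET-003 and RET-004 concrete, here is a minimal Python sketch of sentence-boundary chunking with metadata attached to each chunk. It is an illustration under stated assumptions, not the planned implementation: the names `Chunk`, `chunk_section`, `split_sentences` and `count_tokens` are hypothetical, whitespace-separated words stand in for real tokenizer tokens, the sentence splitter is a naive regex, and only the upper token bound is enforced (the 300-token lower bound is not).

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    # Metadata fields named in RET-003.
    document_title: str
    section_heading: str
    chunk_index: int
    text: str


def split_sentences(paragraph):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def count_tokens(text):
    # Assumption: whitespace words approximate tokenizer tokens.
    return len(text.split())


def chunk_section(title, heading, text, max_tokens=500, overlap_sentences=1):
    # Split paragraph by paragraph so a sentence never spans a paragraph break.
    sentences = []
    for paragraph in text.split("\n\n"):
        sentences.extend(split_sentences(paragraph))

    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and count_tokens(candidate) > max_tokens:
            chunks.append(Chunk(title, heading, len(chunks), " ".join(current)))
            # Carry the trailing sentence(s) forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(Chunk(title, heading, len(chunks), " ".join(current)))
    return chunks


# Tiny example with a deliberately small token budget.
example = chunk_section(
    "Doc", "Intro",
    "One two three. Four five six. Seven eight nine.",
    max_tokens=6,
)
```

With a budget of 6 words, the example yields two chunks that share the middle sentence. A real implementation would swap `count_tokens` for the embedding model's tokenizer and tune the overlap against retrieval quality.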

Approach

See ADR-017 RAG Pipeline v2 and ADR-018 RAG Context Management for the full design.

  1. Vector search — pgvector cosine similarity (current)
  2. Keyword search — pg_trgm trigram matching (new)
  3. Fusion — RRF to combine rankings
  4. Reranking — LLM-based reranker for top-K results (optional, cost-dependent)
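
Step 3 can be sketched in a few lines of Python. This is a hedged illustration rather than the design in ADR-017: the function name `rrf_fuse` and the chunk IDs are hypothetical, and the constant `k=60` is the commonly used default for Reciprocal Rank Fusion, not a value taken from this plan. Each input is a list of chunk IDs ordered best first, as the vector and keyword searches would return them.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every ID it contains.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Because RRF uses only ranks, the cosine similarity and trigram similarity scores never need to be normalized against each other. Chunks that appear in both lists (here `c1` and `c3`) rise above chunks found by only one retriever.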

Dependencies

  • Document AI pipeline (E-005 Document AI & RAG Context) — documents must be flowing into chunks first
  • Enough document volume to meaningfully test retrieval quality