E-010: Retrieval Improvements
Status: Planned | Owner: @bilal | Priority: P5 (after docs are flowing)
Objective
Improve document retrieval quality using hybrid search (vector + keyword) and better chunking.
Tasks
| Task | ID | Description | Status |
|---|---|---|---|
| pg_trgm extension | RET-001 | Enable pg_trgm and add a trigram index on document_chunks for keyword search | Planned |
| Hybrid search (RRF) | RET-002 | Reciprocal Rank Fusion combining vector + keyword results | Planned |
| Chunk metadata | RET-003 | Enrich with document title, section heading, chunk index | Planned |
| Improved chunking | RET-004 | Semantic, paragraph-level chunks of 300-500 tokens, with overlap aligned to sentence boundaries | Planned |
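To make RET-003 and RET-004 concrete, here is a rough Python sketch of sentence-aligned chunking with metadata attached to each chunk. It is an illustration under stated assumptions, not the planned implementation: the names `Chunk`, `count_tokens`, `split_sentences`, and `chunk_section` are hypothetical, tokens are approximated by whitespace-separated words, the sentence splitter is a naive regex, and only the upper token bound is enforced.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    # Metadata fields mirror RET-003; the field names are assumptions.
    document_title: str
    section_heading: str
    chunk_index: int
    text: str


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for model tokens.
    return len(text.split())


def split_sentences(paragraph: str) -> list[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", paragraph)
    return [p.strip() for p in parts if p.strip()]


def chunk_section(
    title: str,
    heading: str,
    body: str,
    max_tokens: int = 500,
    overlap_sentences: int = 1,
) -> list[Chunk]:
    # Split into paragraphs on blank lines, then into sentences.
    sentences: list[str] = []
    for paragraph in re.split(r"\n\s*\n", body):
        sentences.extend(split_sentences(paragraph))

    chunks: list[Chunk] = []
    current: list[str] = []
    for sentence in sentences:
        candidate = current + [sentence]
        if current and count_tokens(" ".join(candidate)) > max_tokens:
            # Close the current chunk, then start the next one with the
            # trailing sentence(s) repeated as overlap.
            chunks.append(Chunk(title, heading, len(chunks), " ".join(current)))
            overlap = current[-overlap_sentences:] if overlap_sentences else []
            current = overlap + [sentence]
        else:
            current = candidate
    if current:
        chunks.append(Chunk(title, heading, len(chunks), " ".join(current)))
    return chunks


# Tiny example with an artificially small budget so the overlap is visible.
body = "One two three. Four five six.\n\nSeven eight nine. Ten eleven twelve."
example_chunks = chunk_section("Sample doc", "Intro", body, max_tokens=6)
for c in example_chunks:
    print(c.chunk_index, c.text)
```

Because chunks always end on a sentence boundary, the overlap is whole sentences rather than a fixed token window. A real implementation would likely swap in the embedding model's tokenizer for `count_tokens` and also enforce the 300-token lower bound.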
Approach
See ADR-017 RAG Pipeline v2 and ADR-018 RAG Context Management for the full design.
- Vector search — pgvector cosine similarity (current)
- Keyword search — pg_trgm trigram matching (new)
- Fusion — RRF to combine rankings
- Reranking — LLM-based reranker for top-K results (optional, cost-dependent)
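The fusion step in the list above can be sketched in a few lines of Python. This is a minimal illustration of Reciprocal Rank Fusion, not the design from ADR-017: the function name `rrf_fuse`, the sample chunk IDs, and the constant `k = 60` (a commonly used default for RRF) are assumptions. Each input is a list of chunk IDs ordered best-first, as the vector and keyword searches would return them.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=None):
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Returns (chunk_id, score) pairs, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    fused = sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result lists: IDs only, ordered best-first.
vector_hits = ["c3", "c1", "c7"]   # e.g. from pgvector cosine similarity
keyword_hits = ["c1", "c9", "c3"]  # e.g. from pg_trgm trigram matching

fused = rrf_fuse([vector_hits, keyword_hits])
for chunk_id, score in fused:
    print(chunk_id, round(score, 5))
```

RRF uses only rank positions, so cosine distances and trigram similarity scores never need to be normalized against each other. Chunks found by both searches ("c1" and "c3" here) accumulate two contributions and rise above chunks found by only one.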
Dependencies
- Document AI pipeline (E-005 Document AI & RAG Context) — documents must be flowing into chunks first
- Enough document volume to meaningfully test retrieval quality