What hybrid retrieval is
Hybrid retrieval fans out a single query across three signals in parallel and fuses their ranks:
| Signal | Backend | Best for |
|---|---|---|
| Lexical | SQLite FTS5 + BM25 | Exact API names, flags, config keys, error strings, code identifiers |
| Dense | Qdrant + FastEmbed bge-base-en-v1.5 | Semantic neighbours ("login" matches "sign in") |
| Sparse | Qdrant + SPLADE | Splits the difference: term-aware but expansion-friendly |
The retrieval dispatcher runs all three in a thread pool, fuses ranks with Reciprocal Rank Fusion, and returns the top-K sections.
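The fan-out step can be pictured with a minimal sketch. The three search functions below are hypothetical stand-ins that return hard-coded section ids, not docmancer's real backends; only the thread-pool pattern is the point.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the three backends.
# Each returns section ids ordered best first.
def search_lexical(query):
    return ["auth/oauth", "auth/api-keys", "cli/login"]

def search_dense(query):
    return ["cli/login", "auth/sso", "auth/oauth"]

def search_sparse(query):
    return ["auth/sso", "auth/oauth"]

SIGNALS = {
    "lexical": search_lexical,
    "dense": search_dense,
    "sparse": search_sparse,
}

def fan_out(query, signals=SIGNALS):
    """Run every signal concurrently and collect one ranked list per signal."""
    with ThreadPoolExecutor(max_workers=len(signals)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in signals.items()}
        # result() blocks until that signal has finished.
        return {name: future.result() for name, future in futures.items()}

rankings = fan_out("How do I authenticate?")
```

The resulting dictionary of per-signal rankings is the input that a rank-fusion step would consume.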
Running a hybrid query
docmancer query "How do I authenticate?" --mode hybrid
--mode defaults to lexical for bare configs and auto-flips to hybrid when your YAML includes a non-empty vector_store: block.
You can also force a single signal:
docmancer query "OAuth scopes" --mode dense
docmancer query "OAuth scopes" --mode sparse
docmancer query "OAuth scopes" --mode lexical
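The mode selection described in this section can be sketched as a small function. This is an illustration of the stated rule only: the function name, the dictionary representation of the YAML config, and the `provider` key in the usage lines are assumptions.

```python
def resolve_mode(config, cli_mode=None):
    """Pick the retrieval mode for a query.

    config is the parsed YAML as a dict (assumed shape).
    cli_mode is the value passed to --mode, or None when the flag is omitted.
    """
    if cli_mode is not None:
        return cli_mode  # an explicit --mode always wins
    # A missing or empty vector_store block counts as a bare config.
    return "hybrid" if config.get("vector_store") else "lexical"

bare = resolve_mode({})
with_vectors = resolve_mode({"vector_store": {"provider": "qdrant"}})  # hypothetical key
forced = resolve_mode({"vector_store": {"provider": "qdrant"}}, cli_mode="dense")
```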
Reciprocal Rank Fusion
RRF combines per-signal rankings without needing comparable scores. The formula per result r across signals s:
RRF(r) = sum( 1 / (k + rank_s(r)) ) for each signal s where r appears
k defaults to 60 (see retrieval.fusion.rrf_k). The default fusion method is plain rrf; switch to weighted_rrf in YAML to bias toward one signal.
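A minimal sketch of the formula above, for illustration. The `weights` argument is an assumption about how a weighted variant might bias one signal (a per-signal multiplier on each term); the source does not spell out how weighted_rrf is defined.

```python
from collections import defaultdict

def rrf(rankings, k=60, weights=None):
    """Fuse ranked lists with Reciprocal Rank Fusion.

    rankings maps signal name -> ids ordered best first (rank 1 first).
    weights is a hypothetical per-signal multiplier; None means plain RRF.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for signal, ranked_ids in rankings.items():
        w = weights.get(signal, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

rankings = {
    "lexical": ["a", "b", "c"],
    "dense": ["c", "a"],
    "sparse": ["a", "c"],
}
fused = rrf(rankings)
biased = rrf(rankings, weights={"dense": 5.0})
```

With plain RRF, `a` comes first because it sits at or near the top of all three lists. Boosting the dense signal moves `c` ahead, since dense ranks it first.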
Explain mode
docmancer query "auth setup" --mode hybrid --explain
Each result is annotated with the per-source rank that contributed:
[lexical#1, dense#3, sparse#2] Authentication > OAuth 2.0 > Scopes
Useful when:
- A result you expected ranks lower than something else: --explain shows which signal placed each hit.
- You're tuning a router rule and want to confirm the dispatcher (not the lexical fast-path) is being used.
- One backend is misbehaving (e.g. Qdrant unavailable) and dense / sparse columns silently return empty.
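To make the annotation format concrete, here is a rough sketch of how such a label could be derived from per-signal rankings. The function and the section ids are hypothetical; only the `signal#rank` output format comes from the example above.

```python
def explain_label(section_id, rankings):
    """Build a '[signal#rank, ...]' label for one result.

    rankings maps signal name -> ids ordered best first.
    Signals that did not return the section are left out of the label.
    """
    parts = [
        f"{signal}#{ranked.index(section_id) + 1}"
        for signal, ranked in rankings.items()
        if section_id in ranked
    ]
    return "[" + ", ".join(parts) + "]"

healthy = {
    "lexical": ["scopes", "tokens"],
    "dense": ["tokens", "sso", "scopes"],
    "sparse": ["tokens", "scopes"],
}
label = explain_label("scopes", healthy)

# If one backend returns nothing, its entry simply disappears from the label.
degraded = dict(healthy, dense=[])
degraded_label = explain_label("scopes", degraded)
```

A signal that is missing from every label is a hint that its backend returned no hits.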
When to skip vectors
Pure lexical queries bypass the dispatcher and read SQLite directly:
docmancer query "exact error message" --mode lexical
For ingests, skip the vector pass entirely:
docmancer ingest ./docs --no-vectors
DOCMANCER_AUTO_VECTORS=0 docmancer ingest ./docs
You still get a fully functional FTS5 index. Re-run ingest later without --no-vectors to backfill vectors.
Hierarchical retrieval
For large corpora, hybrid retrieval pairs with a two-stage pass:
- Stage 1. Wide-net retrieval (candidate_pool = 200 by default) across all signals. Scores are aggregated per document_title_hash. Top-N documents (documents_limit = 5) survive.
- Stage 2. Re-retrieve sections filtered to those documents. Up to sections_per_document (10) per surviving document. Fuse and return.
Hierarchical retrieval is auto-enabled per index once the corpus has at least retrieval.hierarchical.auto_min_documents (default 10) distinct documents. Force it on with retrieval.hierarchical.enabled: true, or off with auto: false.
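The two stages can be sketched as follows. This is an illustration under several assumptions: `search` is a hypothetical callable that returns already-fused hits as dictionaries, document scores are aggregated by summing, and the toy corpus exists only to make the example runnable. The parameter names and defaults are the ones listed above.

```python
from collections import defaultdict

def hierarchical_search(query, search, candidate_pool=200,
                        documents_limit=5, sections_per_document=10):
    # Stage 1: wide net, then aggregate scores per document (sum is an assumption).
    doc_scores = defaultdict(float)
    for hit in search(query, limit=candidate_pool):
        doc_scores[hit["document_title_hash"]] += hit["score"]
    survivors = sorted(doc_scores, key=doc_scores.get, reverse=True)[:documents_limit]

    # Stage 2: re-retrieve within the survivors, capping sections per document.
    kept, per_doc = [], defaultdict(int)
    for hit in search(query, limit=candidate_pool, documents=set(survivors)):
        doc = hit["document_title_hash"]
        if per_doc[doc] < sections_per_document:
            per_doc[doc] += 1
            kept.append(hit)
    return kept

# Toy corpus and search function, ordered best first (hypothetical data).
CORPUS = [
    {"section": "A/1", "document_title_hash": "A", "score": 0.9},
    {"section": "B/1", "document_title_hash": "B", "score": 0.8},
    {"section": "A/2", "document_title_hash": "A", "score": 0.5},
    {"section": "C/1", "document_title_hash": "C", "score": 0.4},
    {"section": "B/2", "document_title_hash": "B", "score": 0.1},
]

def toy_search(query, limit, documents=None):
    hits = [h for h in CORPUS
            if documents is None or h["document_title_hash"] in documents]
    return hits[:limit]

result = hierarchical_search("auth", toy_search,
                             documents_limit=2, sections_per_document=1)
```

In this toy run, documents A and B survive stage 1, and stage 2 keeps one section from each.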
Router rules
Query routers narrow the dispatcher call before fusion. The first rule whose regex matches the query merges its declared filters (e.g. docset_root, sdk, international_classes) into the call. See Router recipes for concrete patterns.
Routers only fire under dispatcher modes (dense, sparse, hybrid). Pure lexical queries bypass the dispatcher and ignore routers.
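The first-match behaviour can be sketched like this. The rule list, the patterns, and the filter values are made up for illustration. Whether a rule's filters override filters already supplied by the caller is also an assumption here.

```python
import re

# Hypothetical rules: (pattern, filters to merge). Order matters.
ROUTER_RULES = [
    (re.compile(r"\bpython sdk\b", re.IGNORECASE), {"sdk": "python"}),
    (re.compile(r"\bsdk\b", re.IGNORECASE), {"docset_root": "docs/sdk"}),
]

DISPATCHER_MODES = {"dense", "sparse", "hybrid"}

def route(query, mode, filters=None, rules=ROUTER_RULES):
    """Return the filters for a query after applying the first matching rule."""
    merged = dict(filters or {})
    if mode not in DISPATCHER_MODES:
        return merged  # lexical bypasses the dispatcher, so no rule fires
    for pattern, declared in rules:
        if pattern.search(query):
            merged.update(declared)  # assumption: rule filters take precedence
            break  # first match wins; later rules are not consulted
    return merged

hybrid_filters = route("Python SDK auth", "hybrid")
lexical_filters = route("Python SDK auth", "lexical")
```

Both patterns match the query above, but only the first rule contributes filters. The same query in lexical mode gets none.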
Performance notes
- Dense and sparse signals share the same FastEmbed model load, so a hybrid query is not 3x the cost of a lexical one.
- A content-hash-keyed embeddings cache under ~/.docmancer/embeddings-cache/ skips re-embedding unchanged chunks on re-ingest.
- Bulk upsert into Qdrant uses gRPC for throughput.
- Re-running ingest without --recreate reuses cached embeddings for any section whose content hash matches.
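The idea behind a content-hash-keyed cache can be sketched in a few lines. The hash algorithm (SHA-256), the one-JSON-file-per-key layout, and the function names are all assumptions for illustration; the source only states that the cache is keyed by content hash.

```python
import hashlib
import json
from pathlib import Path

def cached_embed(text, embed, cache_dir):
    """Return the embedding for text, computing it only on a cache miss.

    embed is any callable mapping text -> list of floats.
    """
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # unchanged content: reuse
    vector = embed(text)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vector))
    return vector
```

Because the key depends only on the content, editing a section changes its hash and forces a fresh embedding, while untouched sections hit the cache.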