Why SQLite FTS5 replaced our vector database

We moved from Qdrant embeddings to SQLite full-text search. Here is what we learned and why retrieval quality held up.

The problem with embedding-based RAG

Most RAG implementations reach for cosine similarity on dense embeddings and call it done. It works surprisingly well, until it doesn't.

The failure mode is predictable: keyword-specific queries get diluted by semantic neighbors that feel related but miss the exact term. Ask about --recreate flag behavior and you'll get chunks about "recreating environments" before you get the actual CLI docs.

We hit this repeatedly in Docmancer's early versions, which used Qdrant with FastEmbed for dense embeddings and BM25 sparse vectors. Hybrid retrieval (dense + sparse with reciprocal rank fusion) helped, but the operational cost was high: users had to download embedding models on first run, Qdrant's embedded mode added startup latency, and the dependency footprint was substantial for what should be a lightweight CLI tool.
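For readers unfamiliar with reciprocal rank fusion, here is a minimal sketch of the idea, not the code Docmancer shipped. The function name, the document ids, and the constant k=60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a dense and a sparse retriever.
dense = ["env-recreate", "cli-flags", "venv-intro"]
sparse = ["cli-flags", "config-file", "env-recreate"]
fused = reciprocal_rank_fusion([dense, sparse])
print(fused)
```

In this toy input the CLI docs win after fusion because the sparse list ranks them first, even though the dense list prefers the semantic neighbor.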

The shift to SQLite FTS5

SQLite's FTS5 extension provides full-text search with BM25 ranking built in. It requires no model downloads, no server, and no binary wheels that break across platforms. The database is a single file on disk.
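As a rough illustration of how little setup this needs, the sketch below uses Python's standard sqlite3 module. It assumes the bundled SQLite was compiled with FTS5, and the table name, columns, and rows are made up for the example, not Docmancer's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a file path works the same way

# An FTS5 virtual table; both columns are tokenized and searchable.
conn.execute("CREATE VIRTUAL TABLE sections USING fts5(title, body)")
conn.executemany(
    "INSERT INTO sections (title, body) VALUES (?, ?)",
    [
        ("CLI flags", "Pass --recreate to force the environment to be rebuilt."),
        ("Environments", "Recreating environments from scratch takes time."),
        ("Caching", "Cached wheels speed up installs."),
    ],
)

# Quote the term so the leading dashes are not parsed as query syntax.
# bm25() returns lower values for better matches, so sort ascending.
rows = conn.execute(
    "SELECT title, bm25(sections) AS score "
    "FROM sections WHERE sections MATCH ? ORDER BY score",
    ('"--recreate"',),
).fetchall()

for title, score in rows:
    print(title, score)
```

With the default tokenizer there is no stemming, so the query matches the token "recreate" in the CLI section and skips the "Recreating environments" section entirely.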

We restructured how content is stored: instead of embedding arbitrary chunks, Docmancer now extracts sections from documentation pages and indexes them as discrete units. Each section preserves its heading hierarchy and source attribution, so retrieval results carry context about where they came from.

docmancer add https://docs.pytest.org
# Fetched 847 pages, indexed 2,341 sections
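To make section-level indexing concrete, here is one way it could be sketched, assuming the page has already been converted to Markdown. The function name and the dictionary layout are hypothetical and do not describe Docmancer's internals.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown, source_url):
    """Split Markdown into sections that remember their heading path."""
    sections = []
    path = []  # stack of (level, title) for the current heading hierarchy
    body = []  # lines collected since the last heading

    def flush():
        text = "\n".join(body).strip()
        if text:
            sections.append(
                {
                    "headings": [title for _, title in path],
                    "source": source_url,
                    "text": text,
                }
            )
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headings at the same or a deeper level before descending.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return sections


doc = "# Fixtures\nIntro text.\n## Scope\nScope text.\n# Markers\nMarker text."
sections = split_sections(doc, "https://docs.pytest.org")
for section in sections:
    print(" > ".join(section["headings"]), "|", section["text"])
```

A real extractor would also need to ignore "#" lines inside code fences, which this sketch does not handle.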

Token-budgeted context packs

The biggest improvement was not in ranking but in output. Docmancer now returns context packs with a configurable token budget (default: 2400 tokens). Instead of dumping raw chunks, it selects the most relevant sections and packs them into a compact response with source attribution and estimated token savings.

docmancer query "forecast response fields"
# Returns ~280 tokens with 87% savings vs. raw page content
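The packing step can be pictured as a greedy fill over the ranked sections. The sketch below is an assumption about the general shape, not Docmancer's implementation: the function names are hypothetical, and the four-characters-per-token estimate is a rough heuristic standing in for a real tokenizer.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about four characters per token.
    return max(1, len(text) // 4)


def pack_context(ranked_sections, budget=2400):
    """Greedily keep the best-ranked sections that still fit the budget."""
    picked, used = [], 0
    for section in ranked_sections:
        cost = estimate_tokens(section["text"])
        if used + cost > budget:
            continue  # too large; a smaller section further down may fit
        picked.append(section)
        used += cost
    return picked, used


def savings(used_tokens, raw_page_tokens):
    """Fraction of tokens saved compared with sending the raw pages."""
    return 1 - used_tokens / raw_page_tokens


# Hypothetical ranked results, best match first.
ranked = [
    {"id": "a", "text": "x" * 400},   # ~100 tokens
    {"id": "b", "text": "x" * 8000},  # ~2000 tokens
    {"id": "c", "text": "x" * 2000},  # ~500 tokens, will not fit
    {"id": "d", "text": "x" * 1200},  # ~300 tokens
]
picked, used = pack_context(ranked)
print([s["id"] for s in picked], used)
```

Skipping an oversized section instead of stopping lets lower-ranked but smaller sections use the remaining budget. Whether that is preferable to a strict cutoff is a design choice.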

The --expand flag includes adjacent sections when you need more context. Passing --expand page returns the full page content, still within the token budget.
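Adjacent-section expansion can be thought of as widening a window around each hit in page order. This is only an illustrative sketch: the function name and the window parameter are assumptions, and the actual flag may behave differently.

```python
def expand_adjacent(hit_index, page_sections, window=1):
    """Return the hit plus up to `window` neighbors on each side."""
    start = max(0, hit_index - window)
    end = min(len(page_sections), hit_index + window + 1)
    return page_sections[start:end]


# Hypothetical sections of one page, in document order.
page = ["Install", "Quickstart", "Configuration", "Troubleshooting"]
print(expand_adjacent(2, page))
```

The expanded list would then go through the same token-budget packing as any other result set.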

What held up, what didn't

In our testing against pytest, GitHub, and Next.js docs:

The tradeoff we accepted: FTS5 does not understand synonyms or paraphrases the way embeddings do. A query for "login" will not match a section that only says "sign in." In practice, documentation authors tend to be consistent with their terminology, so this rarely matters. When it does, querying with a few alternative terms covers the gap.
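Querying with alternative terms maps directly onto FTS5's OR operator. The sketch below assumes Python's sqlite3 was built with FTS5. The helper name, table, and rows are invented for the example.

```python
import sqlite3


def any_of(terms):
    """Build an FTS5 query matching any of the given terms or phrases."""
    # Wrap each term in double quotes so multi-word terms become phrases;
    # embedded double quotes are escaped by doubling them.
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms]
    return " OR ".join(quoted)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE sections USING fts5(title, body)")
conn.executemany(
    "INSERT INTO sections (title, body) VALUES (?, ?)",
    [
        ("Authentication", "Users sign in with an API token."),
        ("Sessions", "A login session expires after one hour."),
        ("Billing", "Invoices are issued monthly."),
    ],
)


def search(query):
    rows = conn.execute(
        "SELECT title FROM sections WHERE sections MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


only_login = search(any_of(["login"]))
both = search(any_of(["login", "sign in"]))
print(only_login, both)
```

The single-term query misses the "sign in" section, while the OR query finds both.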

What this means for your agents

When Claude Code, Cursor, or Codex calls docmancer query, it gets section-level results ranked by BM25 relevance and packed into a token budget. The agent does not need to know any of this. It just gets better answers with fewer wasted tokens, from a tool that installs in seconds and runs entirely on your machine.