Two cooperating local pipelines
Docmancer runs two pipelines on your machine, sharing the same Docmancer home (~/.docmancer/):
- Docs RAG. docmancer add (URLs) and docmancer ingest (local files) normalize content into sections, index them in SQLite FTS5 and a managed local Qdrant for dense + sparse vectors, and serve compact context packs through docmancer query with hybrid Reciprocal Rank Fusion.
- MCP runtime. docmancer install-pack <pkg>@<version> installs a version-pinned API tool pack from the registry. docmancer mcp serve exposes every installed pack to your agent through a single stdio MCP server using the Tool Search pattern.
No hosted query API. The only background process is the Docmancer-owned Qdrant binary, and even that is optional (set DOCMANCER_AUTO_VECTORS=0 or pass --no-vectors to stay on FTS5 only).
Docs RAG: ingest
When you run docmancer add <url> or docmancer ingest <path>, Docmancer:
- Fetches content. URLs go through provider detection (GitBook llms-full.txt, Mintlify llms.txt / sitemap, GitHub README + docs dir, generic sitemap or nav crawl). Local paths are read directly.
- Loads by format. Markdown, plain text, PDF, DOCX, RTF, and HTML are supported. PDF, DOCX, RTF, and HTML need pip install 'docmancer[local]' for the optional parser dependencies.
- Normalizes into sections, splitting on heading structure. Each section carries its full heading path ("Authentication > OAuth 2.0 > Scopes") and source URL.
- Indexes into SQLite FTS5 (BM25 ranking) at ~/.docmancer/docmancer.db.
- Embeds and upserts dense + sparse vectors into the managed local Qdrant. FastEmbed produces a bge-base-en-v1.5 dense vector and a SPLADE sparse vector per section. A content-hash-keyed cache under ~/.docmancer/embeddings-cache/ skips chunks whose text has not changed since the last ingest.
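The normalization step can be sketched in Python, assuming Markdown input with ATX (#) headings. Section and split_sections are hypothetical names for illustration, not Docmancer's internal API, and the sketch does not special-case # lines inside fenced code blocks, which a real splitter would need to skip.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# ATX heading: 1-6 '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


@dataclass
class Section:
    heading_path: str  # e.g. "Authentication > OAuth 2.0 > Scopes"
    source_url: str
    text: str


def split_sections(markdown: str, source_url: str) -> List[Section]:
    """Split Markdown on headings; each section keeps its full heading path."""
    sections: List[Section] = []
    stack: List[Tuple[int, str]] = []  # (level, title) of the enclosing headings
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:  # skip headings that have no body of their own
            path = " > ".join(title for _, title in stack)
            sections.append(Section(path, source_url, text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        flush()
        level, title = len(match.group(1)), match.group(2)
        # A new heading closes every open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, title))
    flush()
    return sections
```

Splitting a page with "# Authentication", "## OAuth 2.0", and "### Scopes" headings yields a section whose heading_path is "Authentication > OAuth 2.0 > Scopes".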
Extracted Markdown and JSON for every section are written to ~/.docmancer/extracted/ so the indexed content is inspectable on disk.
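The content-hash-keyed embedding cache can be sketched as below, assuming one JSON file per entry, keyed by the SHA-256 of the model name plus the section text. The file layout, the cached_embed helper, and the embed callback (standing in for a FastEmbed call) are hypothetical; Docmancer's real cache format under ~/.docmancer/embeddings-cache/ is not described here.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def cached_embed(text: str, model: str, cache_dir: Path,
                 embed: Callable[[str], List[float]]) -> List[float]:
    # Key on model + text: unchanged text hits the cache, and a model change
    # never returns vectors produced by the old model.
    key = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
    entry = cache_dir / f"{key}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    vector = embed(text)  # stand-in for the real embedding call
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry.write_text(json.dumps(vector), encoding="utf-8")
    return vector
```

Re-ingesting unchanged text never calls embed; editing a section changes its hash and triggers a fresh embedding.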
Docs RAG: managed Qdrant
The local Qdrant runs as a Docmancer-owned process:
- Pinned binary. Acquired from the v1.14.1 GitHub release with a platform-aware download. Falls back to SqliteVecStore (sqlite-vec) when the platform has no matching binary.
- Port selection under filelock, so parallel CLI invocations don't fight over ports.
- Telemetry disabled.
- Ownership sentinel. QdrantStore.ensure_collection refuses to claim a pre-existing collection that lacks Docmancer's ownership marker; delete_collection only operates on owned collections. Collections Docmancer did not create are protected.
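The ownership check can be illustrated with an in-memory stand-in for a Qdrant server. OwnedStore, ForeignCollectionError, and the docmancer_owner marker key are hypothetical; how Docmancer actually records its marker on a collection is not described here.

```python
from typing import Dict

OWNER_KEY = "docmancer_owner"  # hypothetical marker stored with each owned collection


class ForeignCollectionError(RuntimeError):
    """The collection exists but carries no Docmancer ownership marker."""


class OwnedStore:
    def __init__(self) -> None:
        # collection name -> metadata; stands in for a real Qdrant server
        self.collections: Dict[str, Dict[str, bool]] = {}

    def ensure_collection(self, name: str) -> None:
        meta = self.collections.get(name)
        if meta is None:
            self.collections[name] = {OWNER_KEY: True}  # create and mark as ours
        elif not meta.get(OWNER_KEY):
            raise ForeignCollectionError(f"refusing to claim foreign collection {name!r}")

    def delete_collection(self, name: str) -> bool:
        meta = self.collections.get(name)
        if meta is None or not meta.get(OWNER_KEY):
            return False  # unknown or foreign: leave it alone
        del self.collections[name]
        return True
```

Creating a collection and claiming an owned one are the same call, so repeated ingests are idempotent, while a foreign collection with the same name fails loudly instead of being overwritten.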
Lifecycle is controlled by docmancer qdrant {up,down,status,upgrade,logs}.
Docs RAG: hybrid retrieval
docmancer query --mode {lexical,dense,sparse,hybrid} --explain runs through the retrieval dispatcher:
- Lexical queries the FTS5 index with BM25. It is best suited to queries dominated by exact API names, option flags, config keys, error strings, and code identifiers.
- Dense queries the Qdrant collection with FastEmbed bge-base-en-v1.5 embeddings, catching semantic neighbours ("login" matches "sign in").
- Sparse queries the SPLADE column in Qdrant, splitting the difference between lexical exactness and dense recall.
- Hybrid fans out across all three in a thread pool and fuses ranks with Reciprocal Rank Fusion (vanilla or weighted).
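A sketch of the lexical leg using Python's built-in sqlite3 module, assuming the linked SQLite library was compiled with FTS5 (most CPython builds are). The sections table, its columns, and the OR-of-quoted-tokens query rewrite are assumptions for illustration, not Docmancer's schema in ~/.docmancer/docmancer.db.

```python
import sqlite3
from typing import List, Tuple


def to_fts_query(text: str) -> str:
    """Quote each token so flags like --no-vectors are not parsed as FTS5 syntax."""
    return " OR ".join('"' + tok.replace('"', '""') + '"' for tok in text.split())


def lexical_search(conn: sqlite3.Connection, query: str,
                   limit: int = 5) -> List[Tuple[str, float]]:
    match = to_fts_query(query)
    if not match:
        return []  # nothing to search for
    # bm25() is "lower is better" (more negative = more relevant),
    # so ascending order puts the best match first.
    rows = conn.execute(
        "SELECT heading_path, bm25(sections) AS score FROM sections "
        "WHERE sections MATCH ? ORDER BY score LIMIT ?",
        (match, limit),
    )
    return [(path, score) for path, score in rows]


def build_index(rows: List[Tuple[str, str]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one FTS5 row per section.
    conn.execute("CREATE VIRTUAL TABLE sections USING fts5(heading_path, body)")
    conn.executemany("INSERT INTO sections (heading_path, body) VALUES (?, ?)", rows)
    return conn
```

Quoting "--no-vectors" turns it into the phrase "no vectors", so the flag matches exactly where it appears in the docs rather than raising an FTS5 syntax error.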
--explain annotates each result with the per-source rank that contributed (e.g. lexical#1, dense#3, sparse#2).
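Hybrid fan-out and fusion can be sketched as follows. Each section's fused score is the sum over sources of weight / (k + rank); vanilla RRF uses a weight of 1 for every source. k = 60 is the constant from the original RRF paper and a common default, but Docmancer's actual k and weights are assumptions here, as are the fan_out and rrf_fuse names. The explain string mirrors the lexical#1, dense#3 style shown above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple


def fan_out(query: str,
            retrievers: Dict[str, Callable[[str], List[str]]]) -> Dict[str, List[str]]:
    """Run every retriever concurrently; each returns section ids, best first."""
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in retrievers.items()}
        return {name: fut.result() for name, fut in futures.items()}


def rrf_fuse(ranked: Dict[str, List[str]], k: int = 60,
             weights: Optional[Dict[str, float]] = None) -> List[Tuple[str, float, str]]:
    """Reciprocal Rank Fusion: score(d) = sum over sources of w / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    explain: Dict[str, List[str]] = defaultdict(list)
    for source, ids in ranked.items():
        weight = 1.0 if weights is None else weights.get(source, 1.0)
        for rank, section_id in enumerate(ids, start=1):
            scores[section_id] += weight / (k + rank)
            explain[section_id].append(f"{source}#{rank}")  # e.g. "dense#3"
    order = sorted(scores, key=scores.__getitem__, reverse=True)
    return [(sid, scores[sid], ", ".join(explain[sid])) for sid in order]
```

Because RRF only looks at ranks, BM25 scores, cosine similarities, and SPLADE dot products never need to be normalized onto a common scale before fusion.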
Two extensions ride this path:
- Hierarchical retrieval runs a wide-net first pass, aggregates by document, picks the top-N documents, then re-retrieves sections filtered to those documents before fusion. It is auto-enabled per index once the corpus has at least retrieval.hierarchical.auto_min_documents (default 10) distinct documents.
- Query routers walk an ordered list of regex matchers. The first match merges its declared filters into the dispatcher call (e.g. route "billing" queries to a specific docset). See Router recipes.
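A sketch of the two-pass hierarchical flow for a single retriever. The search callback signature, the top_docs and wide parameters, and summing section scores per document are assumptions; only the auto_min_documents default of 10 comes from the text. In hybrid mode the filtered second pass would feed each source before fusion.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple

Hit = Tuple[str, str, float]  # (section_id, doc_id, score); higher score = better
Search = Callable[[str, int, Optional[Set[str]]], List[Hit]]


def hierarchical_search(search: Search, query: str, distinct_docs: int,
                        top_docs: int = 3, wide: int = 50, limit: int = 10,
                        auto_min_documents: int = 10) -> List[Hit]:
    if distinct_docs < auto_min_documents:
        # Small corpus: a single flat pass is enough.
        return search(query, limit, None)
    # Pass 1: cast a wide net and aggregate section scores per document.
    doc_scores: Dict[str, float] = defaultdict(float)
    for _section_id, doc_id, score in search(query, wide, None):
        doc_scores[doc_id] += score
    best = set(sorted(doc_scores, key=doc_scores.__getitem__, reverse=True)[:top_docs])
    # Pass 2: re-retrieve sections, restricted to the top-N documents.
    return search(query, limit, best)
```

Restricting the second pass keeps one strongly matching page from being crowded out by scattered single-section hits from many unrelated pages.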
The default mode is lexical for bare configs and auto-flips to hybrid when the YAML explicitly opts in by including a non-empty vector_store: block.
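The router and the default-mode rule can be sketched together, assuming routes are (regex, filters) pairs and that a route's filters override matching keys already on the call. Route, route_filters, default_mode, and the docset filter key are hypothetical; the real router configuration format is covered in Router recipes.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Route:
    pattern: str  # regex tested against the raw query
    filters: Dict[str, Any] = field(default_factory=dict)


def route_filters(query: str, routes: List[Route], base: Dict[str, Any]) -> Dict[str, Any]:
    for route in routes:  # ordered: the first match wins
        if re.search(route.pattern, query, flags=re.IGNORECASE):
            return {**base, **route.filters}  # route filters override the base call
    return dict(base)


def default_mode(config: Dict[str, Any]) -> str:
    # Bare configs stay lexical; a non-empty vector_store block opts into hybrid.
    return "hybrid" if config.get("vector_store") else "lexical"
```

Because evaluation stops at the first match, a query mentioning both billing and webhooks goes to whichever route is listed first.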
Docs RAG: context packs
The output of docmancer query is a compact context pack: the top sections, their heading paths, source URLs, a token estimate, and a savings line:
Context pack: ~900 tokens vs ~4800 raw docs tokens
(81.2% less docs overhead, 5.33x agentic runway)
--expand includes adjacent sections; --expand page returns full page content within the token budget.
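The savings line is plain arithmetic on two token counts: the overhead reduction is 1 - pack/raw and the runway multiplier is raw/pack. The sketch below reproduces it; estimate_tokens uses a rough four-characters-per-token heuristic, which is an assumption rather than Docmancer's estimator.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (an assumption): about four characters per token.
    return max(1, round(len(text) / 4))


def savings_line(pack_tokens: int, raw_tokens: int) -> str:
    reduction = (1 - pack_tokens / raw_tokens) * 100  # % of raw docs tokens avoided
    runway = raw_tokens / pack_tokens  # how much further the context budget stretches
    return (f"Context pack: ~{pack_tokens} tokens vs ~{raw_tokens} raw docs tokens\n"
            f"({reduction:.1f}% less docs overhead, {runway:.2f}x agentic runway)")
```

savings_line(900, 4800) returns the two lines in the example above: 1 - 900/4800 = 0.8125, shown to one decimal as 81.2%, and 4800/900 ≈ 5.33.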
MCP runtime: Tool Search
docmancer install-pack <pkg>@<version> resolves pack artifacts (in order):
- Local cache.
- Hosted Docmancer artifact API.
- Built-in known-source fallback (compiles supported packs like open-meteo from their public OpenAPI specs when prebuilt artifacts are missing).
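The resolution order amounts to a first-success chain. resolve_pack, the resolver callables, and treating an OSError (for example, an unreachable hosted API) as "try the next source" are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Artifacts = Dict[str, bytes]
Resolver = Callable[[str, str], Optional[Artifacts]]


class PackNotFound(LookupError):
    pass


def resolve_pack(pkg: str, version: str,
                 resolvers: List[Tuple[str, Resolver]]) -> Tuple[str, Artifacts]:
    """Try each source in order; the first one that returns artifacts wins."""
    for source, resolver in resolvers:
        try:
            artifacts = resolver(pkg, version)
        except OSError:
            continue  # assumption: a network or disk error falls through to the next source
        if artifacts is not None:
            return source, artifacts
    raise PackNotFound(f"{pkg}@{version}: no source produced pack artifacts")
```

Ordering the cheap local cache first means repeat installs never touch the network.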
Pack files (contract.json, tools.curated.json, tools.full.json, auth.schema.json, provenance.json) plus a SHA-256 manifest.json are verified and written under ~/.docmancer/servers/<pkg>@<version>/. The package is registered in ~/.docmancer/mcp/manifest.json with its mode (curated / expanded), destructive-call permission, executor permission, and enabled state.
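A sketch of verify-then-write, assuming manifest.json maps each pack file name to its hex SHA-256 under a files key (the real manifest layout is not shown here). Nothing is written to disk unless every digest matches.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Dict

PACK_FILES = ("contract.json", "tools.curated.json", "tools.full.json",
              "auth.schema.json", "provenance.json")


class IntegrityError(ValueError):
    """A pack file's SHA-256 does not match manifest.json."""


def verify_and_write(files: Dict[str, bytes], manifest: Dict[str, Any], dest: Path) -> None:
    expected = manifest["files"]  # assumed shape: {"files": {name: sha256_hex}}
    for name in PACK_FILES:
        digest = hashlib.sha256(files[name]).hexdigest()
        if expected.get(name) != digest:
            raise IntegrityError(f"{name}: SHA-256 mismatch")
    # Every file verified: only now touch the install directory.
    dest.mkdir(parents=True, exist_ok=True)
    for name in PACK_FILES:
        (dest / name).write_bytes(files[name])
    (dest / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

Checking every file before creating the directory means a tampered or truncated download never leaves a half-installed pack under ~/.docmancer/servers/.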
When an agent launches docmancer mcp serve, the server exposes exactly two meta-tools regardless of how many packs are installed:
- docmancer_search_tools(query, package?, limit): token-overlap search across the curated tool surfaces of every enabled pack.
- docmancer_call_tool(name, args): dispatches the resolved tool through the matching executor.
This Tool Search pattern keeps the tool list the agent loads at startup to two entries, even when installed packs contain hundreds of operations.
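Token-overlap search can be sketched as counting the lowercase word tokens a query shares with each tool's name and description. The tool dict shape, the tokenizer, and alphabetical tie-breaking are assumptions; docmancer_search_tools may score differently.

```python
import re
from typing import Dict, List, Optional, Set


def tokens(text: str) -> Set[str]:
    # Lowercase alphanumeric runs; "__", "_", "." and "-" all act as separators.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(query: str, tools: List[Dict[str, str]],
                 package: Optional[str] = None, limit: int = 5) -> List[str]:
    wanted = tokens(query)
    scored = []
    for tool in tools:
        if package is not None and tool["package"] != package:
            continue  # optional package filter
        overlap = len(wanted & tokens(tool["name"] + " " + tool["description"]))
        if overlap:
            scored.append((overlap, tool["name"]))
    # Most shared tokens first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]
```

The agent calls search first, then passes the returned slug to docmancer_call_tool, so full tool schemas only enter the context for the handful of tools it actually needs.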
MCP runtime: dispatch gate chain
Every docmancer_call_tool invocation passes through:
- Resolve the slug package__version__operation against the manifest.
- Validate args against the operation's inputSchema with jsonschema.
- Auth. Resolve credentials by precedence: per-call override → process env → agent-config env → per-package fallback.
- Safety gate. Destructive ops are refused unless the pack was installed with --allow-destructive; python_import and shell executors are refused unless the pack was installed with --allow-execute.
- Idempotency. For non-idempotent operations on sources that declare an idempotency header, generate a UUID4 Idempotency-Key. A 24-hour SQLite fingerprint cache reuses the same key on retry.
- Execute through the http, noop_doc, or python_import executors.
- Log a redacted entry to ~/.docmancer/mcp/calls.jsonl (arg keys only, never values).
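Most gates in the chain can be sketched with the standard library. Schema validation is left out; with the jsonschema package it would be a single jsonschema.validate(args, schema) call. Everything else is hypothetical: the dict shapes, the helper names, fingerprinting a call as the SHA-256 of its slug plus sorted-key JSON args, and an in-memory dict standing in for the SQLite cache. key_for would only be called for non-idempotent operations on sources that declare an idempotency header.

```python
import hashlib
import json
import time
import uuid
from typing import Any, Dict, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


class GateError(PermissionError):
    """Raised when the safety gate refuses a call."""


def parse_slug(slug: str) -> Tuple[str, str, str]:
    # "package__version__operation" -> its three parts.
    package, version, operation = slug.split("__", 2)
    return package, version, operation


def resolve_credential(key: str, *sources: Optional[Dict[str, str]]) -> Optional[str]:
    # Pass sources in precedence order: per-call override, process env,
    # agent-config env, per-package fallback. The first non-empty value wins.
    for source in sources:
        if source and source.get(key):
            return source[key]
    return None


def safety_gate(operation: Dict[str, Any], install: Dict[str, bool]) -> None:
    if operation.get("destructive") and not install.get("allow_destructive"):
        raise GateError("destructive operation; reinstall with --allow-destructive")
    if operation.get("executor") in ("python_import", "shell") and not install.get("allow_execute"):
        raise GateError("code-running executor; reinstall with --allow-execute")


class IdempotencyCache:
    """In-memory stand-in for the 24-hour SQLite fingerprint cache."""

    def __init__(self, ttl: float = DAY_SECONDS) -> None:
        self.ttl = ttl
        self.entries: Dict[str, Tuple[str, float]] = {}  # fingerprint -> (key, created)

    def key_for(self, slug: str, args: Dict[str, Any], now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        fingerprint = hashlib.sha256(
            json.dumps([slug, args], sort_keys=True).encode("utf-8")).hexdigest()
        cached = self.entries.get(fingerprint)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]  # a retry of the same call reuses its key
        key = str(uuid.uuid4())
        self.entries[fingerprint] = (key, now)
        return key


def redacted_log_line(slug: str, args: Dict[str, Any], ok: bool) -> str:
    # Argument names only; values can hold secrets or personal data.
    return json.dumps({"tool": slug, "arg_keys": sorted(args), "ok": ok})
```

Reusing the key on retry lets the upstream API deduplicate a request that timed out on the client but succeeded on the server.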
Concurrency
Multiple CLI calls from parallel agents or terminals are safe. SQLite handles concurrent reads natively, writes serialize through SQLite's built-in locking, and the Qdrant binary uses filelock for ports and PID files.
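The SQLite behaviour this relies on can be demonstrated with threads standing in for parallel CLI processes, each opening its own connection to one database file. The busy timeout and BEGIN IMMEDIATE are choices made for this sketch; the text does not say how Docmancer configures its connections.

```python
import sqlite3
import threading
from contextlib import closing
from pathlib import Path


def writer(db_path: Path, worker: int, rows: int) -> None:
    # One connection per writer, as separate CLI processes would have.
    # timeout=30 makes SQLite's busy handler wait for the lock instead of
    # raising "database is locked" immediately.
    with closing(sqlite3.connect(db_path, timeout=30, isolation_level=None)) as conn:
        for i in range(rows):
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            conn.execute("INSERT INTO calls (worker, n) VALUES (?, ?)", (worker, i))
            conn.execute("COMMIT")


def run_parallel_writers(db_path: Path, workers: int = 4, rows: int = 25) -> int:
    with closing(sqlite3.connect(db_path, isolation_level=None)) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS calls (worker INTEGER, n INTEGER)")
    threads = [threading.Thread(target=writer, args=(db_path, w, rows))
               for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM calls").fetchone()[0]
```

Every insert lands: writers queue on the database lock rather than failing, which is what lets parallel agents share one docmancer.db.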
Flow
Docs:
GitBook / Mintlify / web / GitHub / local files (md, pdf, docx, rtf, html)
-> SQLite FTS5 + Qdrant (dense + sparse)
-> docmancer query --mode hybrid
-> context pack + token savings
Agents:
docmancer setup
-> skill files for Claude Code, Cursor, Codex, Cline, Gemini, OpenCode,
Claude Desktop, GitHub Copilot
MCP packs:
docmancer install-pack <pkg>@<version>
-> contract.json, tools.curated.json, tools.full.json, auth.schema.json,
provenance.json, manifest.json (SHA-256s)
-> docmancer mcp serve
-> docmancer_search_tools + docmancer_call_tool