How Docmancer Works

Two cooperating local pipelines

Docmancer runs two pipelines on your machine, sharing the same Docmancer home (~/.docmancer/):

Docs RAG. docmancer add (URLs) and docmancer ingest (local files) normalize content into sections, index them in SQLite FTS5 and a managed local Qdrant for dense + sparse vectors, and serve compact context packs through docmancer query with hybrid Reciprocal Rank Fusion.
MCP runtime. docmancer install-pack <pkg>@<version> installs a version-pinned API tool pack from the registry. docmancer mcp serve exposes every installed pack to your agent through a single stdio MCP server using the Tool Search pattern.

There is no hosted query API. The only background process is the Docmancer-owned Qdrant binary, and even that is optional: set DOCMANCER_AUTO_VECTORS=0 or pass --no-vectors to stay on FTS5 only.

Docs RAG: ingest

When you run docmancer add <url> or docmancer ingest <path>, Docmancer:

Fetches content. URLs go through provider detection (GitBook llms-full.txt, Mintlify llms.txt / sitemap, GitHub README + docs dir, generic sitemap or nav crawl). Local paths read files directly.
Loads by format. Markdown, plain text, PDF, DOCX, RTF, and HTML are supported. PDF/DOCX/RTF/HTML need pip install 'docmancer[local]' for the optional parser dependencies.
Normalizes into sections, splitting on heading structure. Each section carries its full heading path ("Authentication > OAuth 2.0 > Scopes") and source URL.
Indexes into SQLite FTS5 (BM25 ranking) at ~/.docmancer/docmancer.db.
Embeds and upserts dense + sparse vectors into the managed local Qdrant. FastEmbed produces a bge-base-en-v1.5 dense vector and a SPLADE sparse vector per section. A content-hash-keyed cache under ~/.docmancer/embeddings-cache/ skips chunks whose text has not changed since the last ingest.
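
The content-hash-keyed cache in the last step can be pictured with a minimal sketch. It rests on assumptions that the text above does not confirm: SHA-256 as the hash, an in-memory dict standing in for the on-disk cache directory, and a stand-in embed callable in place of FastEmbed. EmbeddingCache is a hypothetical name, not Docmancer's API.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Hypothetical cache keyed by a hash of the section text (assumed SHA-256)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # counts how often the embedder actually ran

    def get_or_compute(self, text: str, embed: Callable[[str], List[float]]) -> List[float]:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            # Unseen text: pay the embedding cost once and remember the result.
            self.misses += 1
            self._store[key] = embed(text)
        return self._store[key]


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model; returns a one-dimensional "vector".
    return [float(len(text))]
```

Re-ingesting an unchanged section hits the cache and skips the embedder; any edit to the text changes the hash and forces a fresh embedding.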

Extracted Markdown and JSON for every section are written to ~/.docmancer/extracted/ so the indexed content is inspectable on disk.
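
To make the notion of a section concrete, here is a rough sketch of heading-based splitting that produces the kind of heading path shown above. It assumes Markdown input with # style headings; Section and split_sections are hypothetical names for illustration, and the sketch ignores edge cases such as # characters inside code samples.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Section:
    heading_path: str
    source_url: str
    text: str


def split_sections(markdown: str, source_url: str) -> List[Section]:
    sections: List[Section] = []
    stack: List[Tuple[int, str]] = []  # (heading level, title) from root to current
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        # Text before the first heading, and empty sections, are dropped here.
        if stack and text:
            path = " > ".join(title for _, title in stack)
            sections.append(Section(path, source_url, text))
        body.clear()

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Pop siblings and deeper headings so the stack stays a root-to-leaf path.
            while stack and stack[-1][0] >= level:
                stack.pop()
            stack.append((level, match.group(2)))
        else:
            body.append(line)
    flush()
    return sections
```

Each Section could then be serialized to Markdown or JSON on disk, which is what makes the indexed content inspectable.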

Docs RAG: managed Qdrant

The local Qdrant runs as a Docmancer-owned process:

Pinned binary. Downloaded from the Qdrant v1.14.1 GitHub release for the current platform. Falls back to SqliteVecStore (sqlite-vec) when the platform has no matching binary.
Port selection under filelock so parallel CLI invocations don't fight for ports.
Telemetry disabled.
Ownership sentinel. QdrantStore.ensure_collection refuses to claim a pre-existing collection that lacks Docmancer's ownership marker; delete_collection will only operate on owned collections. As a result, data in Qdrant instances that Docmancer did not create is never claimed or deleted.
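
The ownership check can be illustrated with a small in-memory model. This is only a sketch: the dict stands in for a Qdrant server, the "owner" metadata key and OWNER_MARKER value are invented for the example, and the real QdrantStore methods may differ in signature and behavior.

```python
from typing import Dict

OWNER_MARKER = "docmancer"  # hypothetical marker value


class NotOwnedError(RuntimeError):
    """Raised when a collection exists but does not carry the ownership marker."""


def ensure_collection(collections: Dict[str, dict], name: str) -> dict:
    existing = collections.get(name)
    if existing is None:
        # Nothing there yet: create the collection and stamp it as ours.
        collections[name] = {"owner": OWNER_MARKER}
        return collections[name]
    if existing.get("owner") != OWNER_MARKER:
        raise NotOwnedError(f"collection {name!r} was not created by this tool")
    return existing


def delete_collection(collections: Dict[str, dict], name: str) -> bool:
    existing = collections.get(name)
    if existing is None:
        return False
    if existing.get("owner") != OWNER_MARKER:
        raise NotOwnedError(f"refusing to delete foreign collection {name!r}")
    del collections[name]
    return True
```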

Lifecycle is controlled by docmancer qdrant {up,down,status,upgrade,logs}.

Docs RAG: hybrid retrieval

docmancer query --mode {lexical,dense,sparse,hybrid} --explain runs through the retrieval dispatcher:

Lexical queries the FTS5 index with BM25 ranking and works best for exact API names, option flags, config keys, error strings, and code identifiers.
Dense queries the Qdrant collection by FastEmbed bge-base-en-v1.5 embeddings, catching semantic neighbours ("login" matches "sign in").
Sparse queries the SPLADE sparse vectors in Qdrant, sitting between lexical exactness and dense recall.
Hybrid fans out across all three in a thread pool and fuses ranks with Reciprocal Rank Fusion (vanilla or weighted).

--explain annotates each result with the per-source rank that contributed (e.g. lexical#1, dense#3, sparse#2).
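
For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula: each source contributes weight / (k + rank) to a section's score. The constant k = 60 is the value commonly used in the literature and is an assumption here, as are the function name rrf_fuse and the weight handling; Docmancer's own parameters may differ.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def rrf_fuse(
    rankings: Dict[str, List[str]],
    k: int = 60,
    weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, float, List[str]]]:
    """Fuse per-source ranked id lists; returns (id, score, contributions)."""
    scores: Dict[str, float] = defaultdict(float)
    contributions: Dict[str, List[str]] = defaultdict(list)
    for source, ids in rankings.items():
        weight = 1.0 if weights is None else weights.get(source, 1.0)
        for rank, section_id in enumerate(ids, start=1):
            scores[section_id] += weight / (k + rank)
            # Same shape as the --explain annotations, e.g. "lexical#1".
            contributions[section_id].append(f"{source}#{rank}")
    ordered = sorted(scores, key=lambda s: (-scores[s], s))
    return [(s, scores[s], contributions[s]) for s in ordered]
```

Because only ranks enter the formula, BM25 scores and vector similarities never need to be normalized against each other. A weighted call such as rrf_fuse(rankings, weights={"lexical": 5.0}) lets one source count for more.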

Two extensions ride this path:

Hierarchical retrieval runs a wide-net first pass, aggregates by document, picks the top-N documents, then re-retrieves sections filtered to those documents before fusion. Auto-enabled per index once the corpus has at least retrieval.hierarchical.auto_min_documents (default 10) distinct documents.
Query routers walk an ordered list of regex matchers. The first match merges its declared filters into the dispatcher call (e.g. route "billing" queries to a specific docset). See Router recipes.
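
The hierarchical pass can be sketched as two calls to the same search function. Everything named here is hypothetical: the search callable and its (query, limit, doc_ids) signature, the choice to aggregate document scores by summing, and the toy corpus. Only the default threshold of 10 documents comes from the text above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set, Tuple

Hit = Tuple[str, str, float]  # (section_id, document_id, score)
SearchFn = Callable[[str, int, Optional[Set[str]]], List[Hit]]


def should_use_hierarchical(distinct_documents: int, auto_min_documents: int = 10) -> bool:
    return distinct_documents >= auto_min_documents


def hierarchical_retrieve(search: SearchFn, query: str, top_docs: int = 2,
                          wide_k: int = 50, final_k: int = 5) -> List[Hit]:
    wide = search(query, wide_k, None)            # 1. wide-net first pass
    doc_scores: Dict[str, float] = defaultdict(float)
    for _, doc_id, score in wide:                 # 2. aggregate by document (sum is assumed)
        doc_scores[doc_id] += score
    best = sorted(doc_scores, key=lambda d: (-doc_scores[d], d))[:top_docs]
    return search(query, final_k, set(best))      # 3. re-retrieve within the top documents


# Toy corpus and search function, for illustration only.
CORPUS: List[Hit] = [
    ("s1", "auth", 0.9), ("s2", "auth", 0.8), ("s3", "billing", 0.95),
    ("s4", "faq", 0.3), ("s5", "auth", 0.1),
]


def toy_search(query: str, limit: int, doc_ids: Optional[Set[str]] = None) -> List[Hit]:
    hits = [h for h in CORPUS if doc_ids is None or h[1] in doc_ids]
    return sorted(hits, key=lambda h: -h[2])[:limit]
```

In the toy corpus the single best section belongs to "billing", but "auth" wins once scores are aggregated by document, so the second pass returns only "auth" sections.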

The default mode is lexical for bare configs. It switches to hybrid automatically when the YAML config opts in with a non-empty vector_store: block.

Docs RAG: context packs

The output of docmancer query is a compact context pack: the top sections, their heading paths, source URLs, a token estimate, and a savings line:

Context pack: ~900 tokens vs ~4800 raw docs tokens
(81.2% less docs overhead, 5.33x agentic runway)
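
The two figures in the savings line follow from simple arithmetic on the token counts. The formulas below are inferred from the example numbers rather than taken from Docmancer's source, and savings_line is a hypothetical helper name.

```python
def savings_line(pack_tokens: int, raw_tokens: int) -> str:
    # Share of raw docs tokens that the context pack avoids sending.
    reduction = (1 - pack_tokens / raw_tokens) * 100
    # How many packs of this size fit in the budget one raw read would use.
    runway = raw_tokens / pack_tokens
    return (
        f"Context pack: ~{pack_tokens} tokens vs ~{raw_tokens} raw docs tokens\n"
        f"({reduction:.1f}% less docs overhead, {runway:.2f}x agentic runway)"
    )
```

With 900 and 4800 tokens, the reduction is 1 - 900/4800 = 81.25% and the runway is 4800/900, roughly 5.33, which matches the sample output once rounded for display.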

--expand includes adjacent sections; --expand page returns full page content within the token budget.

MCP runtime: Tool Search

docmancer install-pack <pkg>@<version> resolves pack artifacts from the following sources, in order:

Local cache.
Hosted Docmancer artifact API.
Built-in known-source fallback (compiles supported packs like open-meteo from their public OpenAPI when prebuilt artifacts are missing).

Pack files (contract.json, tools.curated.json, tools.full.json, auth.schema.json, provenance.json) plus a SHA-256 manifest.json are verified and written under ~/.docmancer/servers/<pkg>@<version>/. The package is registered in ~/.docmancer/mcp/manifest.json with its mode (curated / expanded), destructive-call permission, executor permission, and enabled state.
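
Checksum verification against the manifest could look roughly like the sketch below. The manifest layout, a "files" object mapping file names to hex digests, is an assumption made for the example; the real manifest.json schema is not described here, and verify_pack is a hypothetical name.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_file(path: Path) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pack(pack_dir: Path) -> List[str]:
    """Return the names of files that are missing or whose digest does not match."""
    manifest = json.loads((pack_dir / "manifest.json").read_text(encoding="utf-8"))
    failures = []
    for name, expected in manifest["files"].items():  # assumed layout
        target = pack_dir / name
        if not target.is_file() or sha256_file(target) != expected:
            failures.append(name)
    return failures
```

An empty return value means every listed file matched its recorded digest; an installer would refuse to register the pack otherwise.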

When an agent launches docmancer mcp serve, the server exposes exactly two meta-tools regardless of how many packs are installed:

docmancer_search_tools(query, package?, limit) — token-overlap search across the curated tool surfaces of every enabled pack.
docmancer_call_tool(name, args) — dispatches the resolved tool through the matching executor.

This Tool Search pattern keeps startup small even when packs contain hundreds of operations.
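
A token-overlap search of the kind docmancer_search_tools performs can be approximated in a few lines. The tokenizer, the scoring by raw overlap count, and the tool records (including the acme pack) are illustrative assumptions, not the actual implementation.

```python
import re
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(tools: List[Dict[str, str]], query: str,
                 package: Optional[str] = None, limit: int = 5) -> List[str]:
    wanted = tokenize(query)
    scored = []
    for tool in tools:
        if package is not None and tool["package"] != package:
            continue
        overlap = len(wanted & tokenize(tool["name"] + " " + tool["description"]))
        if overlap:
            scored.append((overlap, tool["name"]))
    # Highest overlap first; ties broken by name for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Hypothetical curated tool surfaces from two installed packs.
TOOLS = [
    {"package": "open-meteo", "name": "open-meteo__v1__forecast",
     "description": "Get the weather forecast for a location"},
    {"package": "open-meteo", "name": "open-meteo__v1__archive",
     "description": "Historical weather data"},
    {"package": "acme", "name": "acme__v2__create_invoice",
     "description": "Create a billing invoice"},
]
```

The agent only ever sees the two meta-tools plus the handful of names a search returns, which is why the tool list stays small regardless of pack size.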

MCP runtime: dispatch gate chain

Every docmancer_call_tool invocation passes through:

Resolve the slug package__version__operation against the manifest.
Validate args against the operation's inputSchema with jsonschema.
Auth. Resolve credentials by precedence: per-call override → process env → agent-config env → per-package fallback.
Safety gate. Destructive operations are refused unless the pack was installed with --allow-destructive; python_import and shell executors are refused unless it was installed with --allow-execute.
Idempotency. For non-idempotent operations on sources that declare an idempotency header, generate a UUID4 Idempotency-Key. A 24-hour SQLite fingerprint cache reuses the same key on retry.
Execute through http, noop_doc, or python_import executors.
Log a redacted entry to ~/.docmancer/mcp/calls.jsonl (arg keys only, never values).
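
Several of these gates are easy to picture in code. The sketch below is an approximation under stated assumptions: all function names, the dict-based operation and pack records, the in-memory fingerprint cache, and the choice to expire keys 24 hours after creation are hypothetical. Only the slug shape, the precedence order, the two permission flags, the UUID4 key, and the keys-only logging come from the steps above.

```python
import uuid
from typing import Dict, List, Optional, Tuple

DAY_SECONDS = 24 * 60 * 60


def parse_slug(slug: str) -> Tuple[str, str, str]:
    # Raises ValueError if the slug does not have three "__"-separated parts.
    package, version, operation = slug.split("__", 2)
    return package, version, operation


def resolve_credential(name: str, per_call: dict, process_env: dict,
                       agent_env: dict, package_fallback: dict) -> Optional[str]:
    # First source that holds a non-empty value wins, in precedence order.
    for source in (per_call, process_env, agent_env, package_fallback):
        value = source.get(name)
        if value:
            return value
    return None


def safety_gate(operation: dict, pack: dict) -> None:
    if operation.get("destructive") and not pack.get("allow_destructive"):
        raise PermissionError("pack was not installed with --allow-destructive")
    if operation.get("executor") in {"python_import", "shell"} and not pack.get("allow_execute"):
        raise PermissionError("pack was not installed with --allow-execute")


def idempotency_key(cache: Dict[str, Tuple[str, float]], fingerprint: str, now: float) -> str:
    entry = cache.get(fingerprint)
    if entry is not None and now - entry[1] < DAY_SECONDS:
        return entry[0]  # retry within the window reuses the same key
    key = str(uuid.uuid4())
    cache[fingerprint] = (key, now)
    return key


def redacted_entry(slug: str, args: dict) -> Dict[str, object]:
    keys: List[str] = sorted(args)
    return {"tool": slug, "arg_keys": keys}  # argument values are never recorded
```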

Concurrency

Multiple CLI calls from parallel agents or terminals are safe. SQLite handles concurrent reads natively, writes serialize through SQLite's built-in locking, and the managed Qdrant process is guarded by filelock for port selection and PID files.
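
The SQLite half of this claim can be demonstrated with the standard library alone. The sketch below is a generic illustration, not Docmancer code: the table name, the 30-second busy timeout, and the use of BEGIN IMMEDIATE are choices made for the example. Several threads, each with its own connection, write to one database file and no rows are lost.

```python
import os
import sqlite3
import tempfile
import threading


def write_rows(db_path: str, worker: int, rows: int) -> None:
    # Each writer opens its own connection; the timeout makes it wait for the lock.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
    try:
        for seq in range(rows):
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            conn.execute("INSERT INTO calls(worker, seq) VALUES (?, ?)", (worker, seq))
            conn.execute("COMMIT")
    finally:
        conn.close()


def run_demo(workers: int = 4, rows: int = 25) -> int:
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        setup = sqlite3.connect(db_path)
        setup.execute("CREATE TABLE calls(worker INTEGER, seq INTEGER)")
        setup.commit()
        setup.close()

        threads = [threading.Thread(target=write_rows, args=(db_path, w, rows))
                   for w in range(workers)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()

        check = sqlite3.connect(db_path)
        count = check.execute("SELECT COUNT(*) FROM calls").fetchone()[0]
        check.close()
        return count
```

Writers queue behind SQLite's file lock rather than interleaving, so run_demo returns workers * rows once every thread has finished.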

Flow

Docs:
  GitBook / Mintlify / web / GitHub / local files (md, pdf, docx, rtf, html)
    -> SQLite FTS5 + Qdrant (dense + sparse)
    -> docmancer query --mode hybrid
    -> context pack + token savings

Agents:
  docmancer setup
    -> skill files for Claude Code, Cursor, Codex, Cline, Gemini, OpenCode,
       Claude Desktop, GitHub Copilot

MCP packs:
  docmancer install-pack <pkg>@<version>
    -> contract.json, tools.curated.json, tools.full.json, auth.schema.json,
       provenance.json, manifest.json (SHA-256s)
    -> docmancer mcp serve
    -> docmancer_search_tools + docmancer_call_tool