Config file locations
Docmancer uses YAML config files. It checks in this order:
1. --config <path> on any command (explicit path).
2. ./docmancer.yaml in the current directory (project-local).
3. ~/.docmancer/docmancer.yaml (global, auto-created on first use).
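The lookup order can be sketched in Python as follows. This is an illustration of the documented precedence, not Docmancer's code. The function name `resolve_config_path` and its `cwd`/`home` parameters are hypothetical, and they exist only so the sketch is easy to test.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(explicit: Optional[str] = None,
                        cwd: Optional[Path] = None,
                        home: Optional[Path] = None) -> Path:
    """Return the config file that the documented lookup order would select.

    Hypothetical helper: the name and signature are illustrative only.
    """
    cwd = cwd or Path.cwd()
    home = home or Path.home()

    # 1. An explicit --config <path> always wins.
    if explicit is not None:
        return Path(explicit).expanduser()

    # 2. A project-local docmancer.yaml in the current directory.
    local = cwd / "docmancer.yaml"
    if local.is_file():
        return local

    # 3. Otherwise fall back to the global config (auto-created on first use).
    return home / ".docmancer" / "docmancer.yaml"
```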
Global config
Created automatically when you first run docmancer setup, docmancer add, or docmancer ingest. Located at ~/.docmancer/docmancer.yaml.
Minimal default:
index:
  db_path: ~/.docmancer/docmancer.db
  extracted_dir: ~/.docmancer/extracted/
A config like this one stays lexical-only. retrieval.default_mode auto-flips to hybrid only when a non-empty vector_store: block is present.
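The auto-flip rule can be expressed as a small function over the config after YAML parsing, with the config represented as a Python dict. One assumption here is not stated above: an explicitly set retrieval.default_mode is used as-is. The helper name `effective_default_mode` is also hypothetical.

```python
from typing import Any, Mapping


def effective_default_mode(config: Mapping[str, Any]) -> str:
    """Pick the retrieval mode for a parsed config dict (hypothetical helper)."""
    retrieval = config.get("retrieval") or {}
    explicit = retrieval.get("default_mode")
    if explicit:
        # Assumption: an explicitly configured default_mode is used unchanged.
        return explicit
    # Only a present, non-empty vector_store: block flips the default to hybrid.
    # An empty "vector_store:" line parses to None, which stays lexical.
    return "hybrid" if config.get("vector_store") else "lexical"
```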
Project-local config
Create a project-specific config with:
docmancer init
This writes docmancer.yaml in the current directory. When present, it takes precedence over the global config for any command run from that directory. Relative index.db_path values resolve against the config file's directory, not the shell's current directory.
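A minimal sketch of the path rule follows. It assumes that ~ is expanded before the relative check, and the helper name `resolve_db_path` is hypothetical.

```python
from pathlib import Path


def resolve_db_path(db_path: str, config_file: Path) -> Path:
    """Resolve index.db_path against the config file's directory (illustrative)."""
    path = Path(db_path).expanduser()  # assumption: "~" is expanded first
    if path.is_absolute():
        return path
    # Relative values anchor at the directory holding the config, not Path.cwd().
    return config_file.parent / path
```

With this rule, db_path: data/docs.db in /work/proj/docmancer.yaml points at /work/proj/data/docs.db, regardless of where the shell is when you run the command.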
index
Controls the SQLite FTS5 index.
| Key | Default | Description |
|---|---|---|
| index.provider | sqlite | Index backend (only sqlite is supported) |
| index.db_path | ~/.docmancer/docmancer.db | Path to the SQLite database |
| index.extracted_dir | ~/.docmancer/extracted | Directory for extracted Markdown / JSON inspection files |
vector_store
Controls the dense + sparse vector backend. Include this block to opt into hybrid retrieval; the dispatcher auto-flips to hybrid mode when it is present and non-empty.
| Key | Default | Description |
|---|---|---|
| vector_store.provider | qdrant | qdrant (managed local binary) or sqlite_vec (pure SQLite fallback) |
| vector_store.url | auto | Override for an existing Qdrant URL (e.g. http://localhost:6333). Honours DOCMANCER_QDRANT_URL. |
| vector_store.collection | docmancer | Qdrant collection name. Refuses to claim a pre-existing collection without the Docmancer ownership sentinel. |
| vector_store.managed | true | When true, Docmancer manages the Qdrant binary lifecycle (docmancer qdrant up/down/status/upgrade/logs). |
embeddings
Selects the embeddings provider for dense + sparse vectors.
| Key | Default | Description |
|---|---|---|
| embeddings.provider | fastembed | fastembed (local), openai, voyage, or cohere |
| embeddings.dense_model | BAAI/bge-base-en-v1.5 | Dense model identifier |
| embeddings.sparse_model | prithivida/Splade_PP_en_v1 | Sparse (SPLADE) model identifier |
A content-hash-keyed cache at ~/.docmancer/embeddings-cache/ skips re-embedding unchanged chunks. Cloud providers retry on 429/5xx responses with bounded exponential backoff. If a configured cloud provider has no API key in the environment, ingest falls back to FTS5-only with a warning rather than aborting.
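Both mechanisms can be sketched generically. The cache key shown here combines the model name and chunk text with SHA-256. The backoff shown here doubles a base delay up to a cap. Everything in this sketch is an assumption for illustration: the function names, the key format, the base delay of 0.5 s, the cap of 8 s, and the retry count. None of these are Docmancer's documented values. `embed_fn` stands in for whatever provider call is configured.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def cache_key(model: str, text: str) -> str:
    # Key on model + chunk text so switching models never reuses stale vectors.
    return hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()


def embed_with_cache(chunks: Sequence[str], model: str,
                     embed_fn: Callable[[List[str]], List[Vector]],
                     cache: Dict[str, Vector]) -> List[Vector]:
    """Embed only chunks whose content hash is not cached yet."""
    keys = [cache_key(model, c) for c in chunks]
    # Unique uncached chunks in first-seen order, so each is embedded at most once.
    missing = list(dict.fromkeys(c for c, k in zip(chunks, keys) if k not in cache))
    if missing:
        for chunk, vec in zip(missing, embed_fn(missing)):
            cache[cache_key(model, chunk)] = vec
    return [cache[k] for k in keys]


def is_retryable(status: int) -> bool:
    # 429 (rate limited) and any 5xx are worth retrying; other errors are not.
    return status == 429 or 500 <= status <= 599


def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 8.0) -> List[float]:
    # Exponential growth (base * 2**n), bounded by cap.
    return [min(cap, base * 2 ** n) for n in range(max_retries)]
```

On a re-ingest where only one chunk changed, `embed_fn` receives just that chunk. Unchanged chunks are served from the cache.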
retrieval
Controls the retrieval dispatcher: hybrid fusion, hierarchical two-stage retrieval, query routing, and neighbor expansion.
| Key | Default | Description |
|---|---|---|
| retrieval.default_mode | lexical (auto-flips to hybrid when vector_store: is configured) | One of lexical, dense, sparse, hybrid |
| retrieval.expand | null | Override neighbor expansion for the dispatcher path. Falls back to query.default_expand. |
| retrieval.hierarchical.enabled | false | Force the two-stage retrieval pass on |
| retrieval.hierarchical.auto | true | Auto-enable two-stage retrieval per index once the corpus has at least auto_min_documents distinct documents |
| retrieval.hierarchical.auto_min_documents | 10 | Threshold for the auto path |
| retrieval.hierarchical.documents_limit | 5 | Top-N documents kept after stage 1 |
| retrieval.hierarchical.candidate_pool | 200 | Wide-net size for stage 1 |
| retrieval.hierarchical.sections_per_document | 10 | Stage 2 cap on sections per surviving document |
| retrieval.routers | [] | Ordered {match, filters} entries. First regex match merges filters into the dispatcher call. Only fires under dispatcher modes (dense, sparse, hybrid). See Router recipes. |
| retrieval.fusion.method | rrf | rrf or weighted_rrf |
| retrieval.fusion.rrf_k | 60 | RRF constant |
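The Python sketch below models three behaviours from this table.

- **Fusion.** It uses standard Reciprocal Rank Fusion, where a result's score is the sum of 1 / (k + rank) across the ranked lists, with k = rrf_k and ranks starting at 1. Weighted RRF is not covered.
- **Routing.** It implements first-match-wins routing.
- **Two-stage gating.** It checks the enabled / auto_min_documents condition.

The function names and the filter keys in the usage below are hypothetical. Two routing details are also assumptions: the use of `re.search`, and router filters overriding caller filters on conflict.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Optional, Sequence

DISPATCHER_MODES = {"dense", "sparse", "hybrid"}


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def route_filters(query: str, routers: Sequence[Mapping[str, Any]],
                  filters: Optional[Mapping[str, Any]] = None,
                  mode: str = "hybrid") -> Dict[str, Any]:
    """Merge the filters of the first router whose regex matches the query."""
    merged = dict(filters or {})
    if mode not in DISPATCHER_MODES:
        return merged  # routers only fire under dispatcher modes
    for router in routers:
        if re.search(router["match"], query):
            merged.update(router.get("filters", {}))  # assumption: router wins on conflict
            break  # first match only
    return merged


def two_stage_enabled(enabled: bool, auto: bool, distinct_documents: int,
                      auto_min_documents: int = 10) -> bool:
    """Forced on, or auto-enabled once the corpus is large enough."""
    return enabled or (auto and distinct_documents >= auto_min_documents)
```

For example, fusing a dense ranking [a, b, c] with a sparse ranking [b, c, a] at k = 60 puts b first. b ranks first in one list and second in the other, which beats a's first-and-third placement.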
query
Defaults for docmancer query.
| Key | Default | Description |
|---|---|---|
| query.default_budget | 2400 | Default token budget for context packs |
| query.default_limit | 8 | Maximum sections returned per query |
| query.default_expand | adjacent | none, adjacent, or page |
web_fetch
Defaults for URL-based docmancer add.
| Key | Default | Description |
|---|---|---|
| web_fetch.workers | 8 | Parallelism for web page fetching |
| web_fetch.default_page_cap | 500 | Default page limit |
| web_fetch.browser_fallback | false | Enable Playwright browser fallback by default |
Environment variables
Any field in the sections above can be overridden by an environment variable carrying that section's prefix.
| Variable prefix | Scope |
|---|---|
| DOCMANCER_INDEX_* | index.* fields |
| DOCMANCER_VECTOR_STORE_* | vector_store.* fields |
| DOCMANCER_EMBEDDINGS_* | embeddings.* fields |
| DOCMANCER_RETRIEVAL_* | retrieval.* fields |
| DOCMANCER_QUERY_* | query.* fields |
| DOCMANCER_WEB_FETCH_* | web_fetch.* fields |
| DOCMANCER_QDRANT_URL | Override the managed Qdrant URL |
| DOCMANCER_AUTO_VECTORS | 0 to disable auto vector sync; 1 to force it on |
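The prefix table can be read as a mapping from variable names to config fields. The Python sketch below rests on assumptions that are not spelled out above:

- **Field naming.** The part after the prefix is the field name in upper case, so DOCMANCER_QUERY_DEFAULT_LIMIT maps to query.default_limit.
- **Value types.** Values stay strings, with no type coercion.
- **Scope.** Only flat fields are handled; nested keys such as retrieval.hierarchical.* are not.

The helper names are also hypothetical. DOCMANCER_QDRANT_URL and DOCMANCER_AUTO_VECTORS match none of the section prefixes, so this sketch leaves them alone.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Section prefixes from the table above.
PREFIXES = {
    "DOCMANCER_INDEX_": "index",
    "DOCMANCER_VECTOR_STORE_": "vector_store",
    "DOCMANCER_EMBEDDINGS_": "embeddings",
    "DOCMANCER_RETRIEVAL_": "retrieval",
    "DOCMANCER_QUERY_": "query",
    "DOCMANCER_WEB_FETCH_": "web_fetch",
}


def env_overrides(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, str]]:
    """Collect DOCMANCER_<SECTION>_<FIELD> variables into {section: {field: value}}."""
    environ = os.environ if environ is None else environ
    overrides: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        for prefix, section in PREFIXES.items():
            if name.startswith(prefix) and len(name) > len(prefix):
                # Assumption: the suffix is the flat field name in upper case.
                overrides.setdefault(section, {})[name[len(prefix):].lower()] = value
                break
    return overrides


def apply_overrides(config: Mapping[str, Mapping[str, Any]],
                    overrides: Mapping[str, Mapping[str, str]]) -> Dict[str, Dict[str, Any]]:
    """Return a copy of config with environment values layered over YAML values."""
    merged = {section: dict(values or {}) for section, values in config.items()}
    for section, fields in overrides.items():
        merged.setdefault(section, {}).update(fields)
    return merged
```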
API MCP pack credentials are not stored in docmancer.yaml. For keyed packs, export the relevant <PACKAGE>_API_KEY environment variable in your shell, then run docmancer mcp doctor.
Example: full hybrid config
index:
  provider: sqlite
  db_path: ~/.docmancer/docmancer.db
  extracted_dir: ~/.docmancer/extracted
vector_store:
  provider: qdrant
  collection: docmancer
  managed: true
embeddings:
  provider: fastembed
  dense_model: BAAI/bge-base-en-v1.5
retrieval:
  default_mode: hybrid
  hierarchical:
    auto: true
    auto_min_documents: 10
  fusion:
    method: rrf
    rrf_k: 60
query:
  default_budget: 2400
  default_limit: 8
  default_expand: adjacent
Deprecated and removed keys
- registry: is ignored, with a one-time deprecation warning.
- packs: is dropped silently. Use docmancer install-pack <package>@<version> instead.
- bench: was removed in 0.5.0 along with the docmancer bench command. YAML that still contains it loads with a deprecation warning.
- eval: was removed in 0.5.0. It triggers the same warning behaviour as bench:.
Data locations
| Path | Content |
|---|---|
| ~/.docmancer/docmancer.yaml | Global config |
| ~/.docmancer/docmancer.db | SQLite FTS5 index |
| ~/.docmancer/extracted/ | Extracted Markdown / JSON (inspectable) |
| ~/.docmancer/qdrant/ | Pinned Qdrant binary + storage |
| ~/.docmancer/embeddings-cache/ | FastEmbed models + content-hash cache |
| ~/.docmancer/mcp/manifest.json | Installed API MCP pack manifest |
| ~/.docmancer/servers/ | Installed API MCP pack artifacts |
Resetting
To clear the index but keep config:
docmancer remove --all