Configuration

Full docmancer.yaml reference for hybrid retrieval, vector store, embeddings, and web fetch.

Config file locations

Docmancer uses YAML config files. It checks in this order:

  1. --config <path> on any command (explicit path).
  2. ./docmancer.yaml in the current directory (project-local).
  3. ~/.docmancer/docmancer.yaml (global, auto-created on first use).
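The lookup order above can be sketched in a few lines of Python. This is an illustration of the documented order only, not Docmancer's implementation; the function name resolve_config_path and its parameters are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_config_path(explicit: Optional[str], cwd: Path, home: Path) -> Path:
    """Return the config file chosen by the documented lookup order.

    Hypothetical helper: the name and signature are assumptions.
    """
    # 1. An explicit --config path always wins.
    if explicit:
        return Path(explicit)
    # 2. A project-local docmancer.yaml in the current directory.
    local = cwd / "docmancer.yaml"
    if local.is_file():
        return local
    # 3. The global config (auto-created by Docmancer on first use).
    return home / ".docmancer" / "docmancer.yaml"
```

Calling resolve_config_path(None, Path.cwd(), Path.home()) mirrors running a command without --config.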

Global config

Created automatically when you first run docmancer setup, docmancer add, or docmancer ingest. Located at ~/.docmancer/docmancer.yaml.

Minimal default:

index:
  db_path: ~/.docmancer/docmancer.db
  extracted_dir: ~/.docmancer/extracted/

A bare YAML like this stays lexical-only. retrieval.default_mode only auto-flips to hybrid when a non-empty vector_store: block is present.
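As a rough sketch of that rule, the effective default mode could be derived from the parsed config as shown below. The function name is hypothetical, and the assumption that an explicitly set retrieval.default_mode always wins over the auto-flip is not confirmed by this page.

```python
from typing import Any, Dict


def effective_default_mode(config: Dict[str, Any]) -> str:
    """Pick the retrieval mode for a parsed docmancer.yaml (illustrative only)."""
    retrieval = config.get("retrieval") or {}
    explicit = retrieval.get("default_mode")
    # Assumption: an explicit default_mode takes priority over the auto-flip.
    if explicit:
        return explicit
    # A missing or empty vector_store: block keeps retrieval lexical-only.
    return "hybrid" if config.get("vector_store") else "lexical"
```

With the minimal default above (only an index: block), this returns "lexical".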

Project-local config

Create a project-specific config with:

docmancer init

This writes docmancer.yaml in the current directory. When present, it takes precedence over the global config for any command run from that directory. Relative index.db_path values resolve against the config file's directory, not the shell's current directory.
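The path rule in the last sentence can be illustrated as follows. This is a minimal sketch under the assumption that ~ is expanded first and absolute paths are left alone; resolve_db_path is a hypothetical name, not a Docmancer function.

```python
from pathlib import Path


def resolve_db_path(db_path: str, config_file: Path) -> Path:
    """Resolve index.db_path relative to the config file's directory."""
    path = Path(db_path).expanduser()
    # Absolute paths (including expanded ~ paths) are used as-is.
    if path.is_absolute():
        return path
    # Relative paths anchor to the directory holding docmancer.yaml,
    # not to the shell's current working directory.
    return config_file.parent / path
```

For example, db_path: data/index.db in /proj/docmancer.yaml would point at /proj/data/index.db regardless of where the command is run from.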

index

Controls the SQLite FTS5 index.

| Key | Default | Description |
| --- | --- | --- |
| index.provider | sqlite | Index backend (only sqlite is supported) |
| index.db_path | ~/.docmancer/docmancer.db | Path to the SQLite database |
| index.extracted_dir | ~/.docmancer/extracted | Directory for extracted Markdown / JSON inspection files |

vector_store

Controls the dense + sparse vector backend. Include this block to opt into hybrid retrieval; the dispatcher auto-flips to hybrid mode when it is present and non-empty.

| Key | Default | Description |
| --- | --- | --- |
| vector_store.provider | qdrant | qdrant (managed local binary) or sqlite_vec (pure SQLite fallback) |
| vector_store.url | auto | Override for an existing Qdrant URL (e.g. http://localhost:6333). Honours DOCMANCER_QDRANT_URL. |
| vector_store.collection | docmancer | Qdrant collection name. Refuses to claim a pre-existing collection without the Docmancer ownership sentinel. |
| vector_store.managed | true | When true, Docmancer manages the Qdrant binary lifecycle (docmancer qdrant up/down/status/upgrade/logs). |

embeddings

Selects the embeddings provider for dense + sparse vectors.

| Key | Default | Description |
| --- | --- | --- |
| embeddings.provider | fastembed | fastembed (local), openai, voyage, or cohere |
| embeddings.dense_model | BAAI/bge-base-en-v1.5 | Dense model identifier |
| embeddings.sparse_model | prithivida/Splade_PP_en_v1 | Sparse (SPLADE) model identifier |

A content-hash-keyed cache at ~/.docmancer/embeddings-cache/ skips re-embedding unchanged chunks. Cloud providers retry on 429/5xx responses with bounded exponential backoff. When a configured cloud provider has no API key in the environment, ingest falls back to FTS5-only with a warning rather than aborting.
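To illustrate why a content-hash key lets unchanged chunks skip re-embedding, here is a minimal in-memory sketch. The choice of SHA-256, the inclusion of the model name in the key, and the names content_key and embed_with_cache are all assumptions; the on-disk layout of Docmancer's cache is not described on this page.

```python
import hashlib
from typing import Callable, Dict, List


def content_key(model: str, text: str) -> str:
    """Hash the model name and chunk text into a stable cache key."""
    digest = hashlib.sha256()
    digest.update(model.encode("utf-8"))
    digest.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    digest.update(text.encode("utf-8"))
    return digest.hexdigest()


def embed_with_cache(
    chunks: List[str],
    model: str,
    embed: Callable[[str], List[float]],
    cache: Dict[str, List[float]],
) -> List[List[float]]:
    """Embed each chunk, calling `embed` only for content not seen before."""
    vectors = []
    for text in chunks:
        key = content_key(model, text)
        if key not in cache:
            cache[key] = embed(text)  # cache miss: compute and store
        vectors.append(cache[key])
    return vectors
```

Because the key depends only on the model and the chunk's content, re-ingesting a document with a few edited chunks triggers embedding calls for those chunks alone.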

retrieval

Controls the retrieval dispatcher: hybrid fusion, hierarchical two-stage retrieval, query routing, and neighbor expansion.

| Key | Default | Description |
| --- | --- | --- |
| retrieval.default_mode | lexical (auto-flips to hybrid when vector_store: is configured) | One of lexical, dense, sparse, hybrid |
| retrieval.expand | null | Override neighbor expansion for the dispatcher path. Falls back to query.default_expand. |
| retrieval.hierarchical.enabled | false | Force the two-stage retrieval pass on |
| retrieval.hierarchical.auto | true | Auto-enable two-stage retrieval per index once the corpus has at least auto_min_documents distinct documents |
| retrieval.hierarchical.auto_min_documents | 10 | Threshold for the auto path |
| retrieval.hierarchical.documents_limit | 5 | Top-N documents kept after stage 1 |
| retrieval.hierarchical.candidate_pool | 200 | Wide-net size for stage 1 |
| retrieval.hierarchical.sections_per_document | 10 | Stage 2 cap on sections per surviving document |
| retrieval.routers | [] | Ordered {match, filters} entries. First regex match merges filters into the dispatcher call. Only fires under dispatcher modes (dense, sparse, hybrid). See Router recipes. |
| retrieval.fusion.method | rrf | rrf or weighted_rrf |
| retrieval.fusion.rrf_k | 60 | RRF constant |
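For intuition about retrieval.fusion.rrf_k: standard reciprocal rank fusion scores each item as the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below implements that textbook formula in Python. It is not Docmancer's code, and the function name rrf_fuse and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists with reciprocal rank fusion (ranks start at 1)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


lexical = ["a", "b", "c"]
dense = ["b", "c", "a"]
fused = rrf_fuse([lexical, dense])  # ["b", "a", "c"]
```

A larger k flattens the difference between adjacent ranks, so agreement across lists matters more than a single top position.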

query

Defaults for docmancer query.

| Key | Default | Description |
| --- | --- | --- |
| query.default_budget | 2400 | Default token budget for context packs |
| query.default_limit | 8 | Maximum sections returned per query |
| query.default_expand | adjacent | none, adjacent, or page |

web_fetch

Defaults for URL-based docmancer add.

| Key | Default | Description |
| --- | --- | --- |
| web_fetch.workers | 8 | Parallelism for web page fetching |
| web_fetch.default_page_cap | 500 | Default page limit |
| web_fetch.browser_fallback | false | Enable Playwright browser fallback by default |

Environment variables

Any field above can be overridden via its prefixed environment variable.

| Variable prefix | Scope |
| --- | --- |
| DOCMANCER_INDEX_* | index.* fields |
| DOCMANCER_VECTOR_STORE_* | vector_store.* fields |
| DOCMANCER_EMBEDDINGS_* | embeddings.* fields |
| DOCMANCER_RETRIEVAL_* | retrieval.* fields |
| DOCMANCER_QUERY_* | query.* fields |
| DOCMANCER_WEB_FETCH_* | web_fetch.* fields |
| DOCMANCER_QDRANT_URL | Override the managed Qdrant URL |
| DOCMANCER_AUTO_VECTORS | 0 to disable auto vector sync; 1 to force it on |
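The prefix-to-section mapping can be pictured with the Python sketch below. The prefixes come from the table, but the rule used here for deriving a field name (strip the prefix, lowercase the rest, so DOCMANCER_QUERY_DEFAULT_LIMIT maps to query.default_limit) is an assumption, as is the helper name env_overrides. How nested keys such as retrieval.hierarchical.auto are spelled is not covered on this page.

```python
from typing import Dict, Mapping

# Prefixes taken from the table above; the two standalone variables
# (DOCMANCER_QDRANT_URL, DOCMANCER_AUTO_VECTORS) match none of them.
SECTION_PREFIXES = {
    "DOCMANCER_INDEX_": "index",
    "DOCMANCER_VECTOR_STORE_": "vector_store",
    "DOCMANCER_EMBEDDINGS_": "embeddings",
    "DOCMANCER_RETRIEVAL_": "retrieval",
    "DOCMANCER_QUERY_": "query",
    "DOCMANCER_WEB_FETCH_": "web_fetch",
}


def env_overrides(environ: Mapping[str, str]) -> Dict[str, Dict[str, str]]:
    """Group prefixed variables by config section (values stay raw strings)."""
    overrides: Dict[str, Dict[str, str]] = {}
    for name, value in environ.items():
        for prefix, section in SECTION_PREFIXES.items():
            if name.startswith(prefix) and len(name) > len(prefix):
                # Assumed naming rule: the remainder, lowercased, is the field.
                field = name[len(prefix):].lower()
                overrides.setdefault(section, {})[field] = value
                break
    return overrides
```

Passing os.environ to env_overrides would list which sections a shell session is overriding under this assumed naming rule.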

API MCP pack credentials are not stored in docmancer.yaml. For keyed packs, export the relevant <PACKAGE>_API_KEY env var in your shell, then run docmancer mcp doctor.

Example: full hybrid config

index:
  provider: sqlite
  db_path: ~/.docmancer/docmancer.db
  extracted_dir: ~/.docmancer/extracted

vector_store:
  provider: qdrant
  collection: docmancer
  managed: true

embeddings:
  provider: fastembed
  dense_model: BAAI/bge-base-en-v1.5

retrieval:
  default_mode: hybrid
  hierarchical:
    auto: true
    auto_min_documents: 10
  fusion:
    method: rrf
    rrf_k: 60

query:
  default_budget: 2400
  default_limit: 8
  default_expand: adjacent

Deprecated and removed keys

  • registry: — ignored with a one-time deprecation warning.
  • packs: — dropped silently. Use docmancer install-pack <package>@<version> instead.
  • bench: — removed in 0.5.0 along with the docmancer bench command. YAML that still contains it loads with a deprecation warning.
  • eval: — removed in 0.5.0. Same warning behaviour as bench:.

Data locations

| Path | Content |
| --- | --- |
| ~/.docmancer/docmancer.yaml | Global config |
| ~/.docmancer/docmancer.db | SQLite FTS5 index |
| ~/.docmancer/extracted/ | Extracted Markdown / JSON (inspectable) |
| ~/.docmancer/qdrant/ | Pinned Qdrant binary + storage |
| ~/.docmancer/embeddings-cache/ | FastEmbed models + content-hash cache |
| ~/.docmancer/mcp/manifest.json | Installed API MCP pack manifest |
| ~/.docmancer/servers/ | Installed API MCP pack artifacts |

Resetting

To clear the index but keep config:

docmancer remove --all