Give your AI coding agent a brain for your repo.
basemind is a code-map MCP server: it indexes your codebase into a queryable map so AI coding agents — Claude Code, Cursor, Continue, anything that speaks MCP — get instant semantic answers about your code. Where is this defined? Who calls it? When did it change? What's churning?
Sub-millisecond queries. 300+ languages out of the box. Local-only. Built in Rust.
Today, agents read code by grepping blind. Ask Claude "who calls parseQuery?"
and it ripgreps the string — you get hits in docs, tests, comments, and 14 unrelated
files. The agent burns context filtering noise, then guesses.
LSPs are the semantic answer, but they're single-language, slow to start, and useless across a polyglot monorepo.
basemind is the missing layer. One index, every language, semantic-quality answers
at grep speed — exposed to the agent over MCP as concrete tools (find_callers,
find_references, outline, symbol_history, blame_symbol, hot_files, …)
instead of "go grep again."
Install (pick one):
brew install Goldziher/tap/basemind # macOS, Linux
npm install -g basemind # any Node 14+ platform
pip install basemind # any Python 3.8+ platform
cargo install basemind --locked # build from source

Opt-in intelligence build (PDF/Office ingestion, semantic doc search, shared agent memory backed by LanceDB):

cargo install basemind --locked --features full

full is the meta-feature that turns on both documents (PDF / Office / HTML
ingestion + OCR + layout) and memory (shared agent memory + vector search).
Pulls in kreuzberg (Elastic-2.0; document parsing + bundled ONNX embeddings)
and lancedb (embedded vector store). First scan after enabling downloads the
embedding model into the kreuzberg cache; subsequent scans are warm.
Index your repo:
cd /path/to/your/repo
basemind scan

Wire it into Claude Code by installing it as a plugin:
/plugin marketplace add Goldziher/basemind
/plugin install basemind@basemind
This registers basemind as an MCP server plus a basemind skill that tells
the model when to reach for code-map tools (instead of grepping or reading
files one by one). Restart the session and the agent has all the tools listed
below.
Codex — install via Codex's plugin / marketplace UI from the same repo
(the .codex-plugin/plugin.json manifest is shipped alongside Claude's), then
add the MCP server entry to ~/.codex/config.toml:
[mcp_servers.basemind]
command = "basemind"
args = ["serve"]

Other MCP clients (Cursor, Continue, Cline, …): drop the standard
mcpServers entry into the client's MCP config:

{
  "mcpServers": {
    "basemind": {
      "command": "basemind",
      "args": ["serve"]
    }
  }
}

| Tool | What the agent can finally do |
|---|---|
| outline | "Give me this file's structure": symbols, line/col, signatures, imports. One call replaces five Reads. |
| search_symbols | "Find anything named useAuth": substring match across every indexed symbol, kind-filterable. |
| find_references | "Where is parseQuery called?": indexed call-site lookup. No regex noise. |
| find_callers | "Who calls User.save()?": resolves the definition first, then scans. |
| dependents | "What imports this module?": reverse import lookup. |
| list_files | "What files are in src/auth/?": indexed path + language filters. |
| status | "What languages does this repo use?": file count + language breakdown. |
| repo_info | Branch, HEAD, workdir at a glance. |


| Tool | What the agent can finally do |
|---|---|
| symbol_history | "When did validateToken actually change?": tree-sitter × git, comment/format-stable diffs. |
| blame_file / blame_symbol | "Who wrote this and why?": line-range or symbol-scoped blame. |
| hot_files | "What's been churning?": top-K most-changed files in the last N commits. |
| recent_changes | "What changed recently on this branch?" |
| commits_touching | "Show me every commit that touched auth.rs." |
| diff_outline | "What symbols differ between main and HEAD?": structural diff. |
| diff_file | "Give me the unified diff for auth.rs across these revs." |
| working_tree_status | "What's staged / unstaged / untracked right now?" |

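To make the hot_files row concrete, here is a minimal Python sketch of "top-K most-changed files in the last N commits". It illustrates the idea only and is not basemind's implementation (which is gix-backed Rust); the function name hot_files_sketch and the input shape (a newest-first list of per-commit path lists) are assumptions made for the example.

```python
from collections import Counter


def hot_files_sketch(commits, last_n, top_k):
    """Rank files by how many of the newest `last_n` commits touched them.

    `commits` is a newest-first list; each item is the list of paths changed
    by that commit (a hypothetical input shape, not basemind's own).
    """
    counts = Counter()
    for changed_paths in commits[:last_n]:
        # Count each file once per commit, even if it is listed twice.
        counts.update(set(changed_paths))
    # Sort by count (descending), then by path, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


history = [
    ["src/auth.rs", "src/lib.rs"],
    ["src/auth.rs"],
    ["README.md", "src/auth.rs"],
    ["src/lib.rs"],
]
print(hot_files_sketch(history, last_n=3, top_k=2))
```

Breaking ties by path keeps the ranking stable between runs, which matters when an agent compares two answers.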

| Tool | What the agent can finally do |
|---|---|
| search_documents | "Find the auth design doc": semantic KNN over PDFs / Office / HTML / emails. |
| memory_put / memory_get / memory_list | Persist scoped notes: exact-key store and prefix / tag scans. |
| memory_search | Semantic recall across stored memory entries: KNN over the LanceDB memory table. |
| memory_delete | Drop an entry from both Fjall and LanceDB. |

Memory is scoped by the repo's normalised origin URL so clones share entries.
A repo with no remote falls back to a workdir-keyed scope (configurable via
[memory].scope_strategy in .basemind/basemind.toml).
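Why clones can share entries is easiest to see with an example. The sketch below maps several spellings of the same remote URL to a single scope key. The normalisation rules and the function name normalise_origin are assumptions for illustration; basemind's actual rules may differ.

```python
import re


def normalise_origin(url):
    """Map common spellings of a git remote URL to one scope key.

    The rules are illustrative assumptions, not basemind's actual ones.
    """
    url = url.strip()
    # scp-style remote: "git@host:owner/repo" -> host + path
    scp = re.match(r"^[\w.-]+@([\w.-]+):(.+)$", url)
    if scp:
        host, path = scp.groups()
    else:
        # URL-style remote: drop the scheme and any "user@" prefix.
        rest = re.sub(r"^[a-z+]+://", "", url)
        rest = rest.split("@", 1)[-1]
        host, _, path = rest.partition("/")
    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path}"


for remote in (
    "git@github.com:Goldziher/basemind.git",
    "https://github.com/Goldziher/basemind",
    "ssh://git@github.com/Goldziher/basemind.git/",
):
    print(normalise_origin(remote))
```

All three remotes print the same key, so an SSH clone and an HTTPS clone of the same repo would land in the same memory scope under these assumed rules.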
Every tool returns JSON. Responses are capped (limit, default 100, max 1000) so
the agent's context doesn't explode.
basemind writes one row per MCP tool call to .basemind/telemetry.jsonl (always on,
best-effort, ~200 bytes per row). Two surfaces consume it:
Live statusline — three lines in ~/.claude/settings.json:
{
  "statusLine": {
    "type": "command",
    "command": "$HOME/.claude/plugins/basemind/.claude-plugin/statusline.sh",
    "refreshInterval": 5
  }
}

Renders bm ~103f · scan 2m ago · 47 calls · ~14k tok saved at the bottom of the
Claude Code terminal. Refreshes every 5 seconds. The script is shipped in the
plugin tree; Claude Code cannot auto-install statusline scripts so the wiring is
manual (one-time).
On-demand dashboard — the new telemetry_summary MCP tool returns the full
breakdown (per-tool histogram, per-baseline savings, last 10 calls). The
/basemind-stats skill renders it as markdown in the conversation.
The est_tokens_saved numbers are heuristics vs a disclosed grep+Read baseline.
Every row carries a saved_baseline label so the model is auditable. Tools without
a realistic baseline (memory_*, search_documents, git wrappers) record their
calls but report zero savings — we don't claim what we can't honestly measure.
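Because the telemetry file is plain JSON Lines, it can also be inspected without the MCP tool. The sketch below aggregates calls and estimated savings per tool. The est_tokens_saved and saved_baseline fields are named above; the "tool" field name and the sample baseline labels are assumptions, so check a real row before relying on them.

```python
import json
from collections import defaultdict


def summarise_telemetry(lines):
    """Aggregate call counts and estimated token savings per tool."""
    summary = defaultdict(lambda: {"calls": 0, "est_tokens_saved": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            # Telemetry is best-effort, so tolerate a truncated row.
            continue
        tool = row.get("tool", "unknown")  # "tool" is an assumed field name
        summary[tool]["calls"] += 1
        summary[tool]["est_tokens_saved"] += row.get("est_tokens_saved", 0)
    return dict(summary)


# Hypothetical rows; the saved_baseline labels are made up for the example.
sample = [
    '{"tool": "outline", "est_tokens_saved": 1200, "saved_baseline": "read_file"}',
    '{"tool": "outline", "est_tokens_saved": 800, "saved_baseline": "read_file"}',
    '{"tool": "memory_get", "est_tokens_saved": 0, "saved_baseline": "none"}',
    '{"tool": "find_refere',  # truncated row, skipped
]
print(summarise_telemetry(sample))
```

To run it against a real repo, pass an open file handle for .basemind/telemetry.jsonl instead of the sample list; iterating a file yields one line at a time.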
Benchmarks on a 39,270-file TypeScript repo (Apple Silicon, release build):
| What | Time |
|---|---|
| Cold scan (full index) | 12.4 s |
| Cached scan (no changes) | 1.6 s |
| MCP server startup | 3.1 s, 77 MB RSS |
| status query | 1.2 ms |
| outline (1571 symbols) | 1.9 ms |
| search_symbols | 1–3 ms |
| find_references("spawn") (tokio) | < 5 ms |

basemind preloads L1 outlines into RAM on serve start, so cross-file queries
are sub-millisecond. The Fjall LSM inverted index handles ref/caller lookups
without scanning blobs.
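The reason an inverted index avoids scanning blobs can be shown with a toy in-memory version: call sites are recorded under the callee's name at scan time, so a query is a single keyed lookup. This is a sketch of the general technique, not of basemind's Fjall keyspace layout; the class and method names are hypothetical.

```python
from collections import defaultdict


class ToyReferenceIndex:
    """In-memory stand-in for an inverted index: name -> call sites."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_file(self, path, calls):
        # `calls` is a list of (callee_name, line) pairs extracted at scan time.
        for name, line in calls:
            self._postings[name].append((path, line))

    def find_references(self, name, limit=100):
        # One dictionary lookup; no file contents are read at query time.
        return sorted(self._postings.get(name, []))[:limit]


index = ToyReferenceIndex()
index.add_file("src/server.rs", [("spawn", 42), ("parse_query", 10)])
index.add_file("src/worker.rs", [("spawn", 7)])
print(index.find_references("spawn"))
```

The cost of a query depends on the number of hits for that name, not on the size of the repository.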
300+ tree-sitter grammars ship via tree-sitter-language-pack. basemind dynamically loads them on first use and caches them locally.
First-class outlines — full signatures, kinds, decorators, calls, imports, docstrings — ship for:
Rust · Python · TypeScript · TSX · JavaScript · Go
Best-effort outlines via the TSLP tags.scm fallback — covers ~100 grammars
including Kotlin, C#, Swift, C++, Scala, Solidity, Lua, Ruby, PHP, Java, …
Languages without an upstream tags.scm (JSON, YAML, TOML) still parse and
appear in list_files; they just don't expose symbols.
- Built for agents, not humans. Every tool exists because an agent needs it, not because it makes a cute terminal demo.
- Semantic quality, grep speed. Tree-sitter parses → content-addressed blobs → Fjall LSM inverted index → sub-millisecond MCP responses.
- Polyglot by default. One index, every language. No LSP-per-language zoo. No "we don't support that yet."
- Local-only. No SaaS. No remote telemetry (usage stats stay in a local file under .basemind/). No cloud round-trip. Your code never leaves the machine.
- Deterministic. Content-addressed blobs (blake3), stable hashes, reproducible across machines.
- Pure Rust. One static binary. No Python runtime, no Node runtime, no JVM. basemind serve adds < 80 MB to your agent's stack.
basemind is also a CLI — useful for piping into shell tools, CI checks, or just inspecting a repo without spinning up an MCP server.
basemind init # write .basemind/basemind.toml with defaults
basemind scan # index the working tree
basemind scan --staged # index what's in git's staging area
basemind scan --rev <REV> # index a commit / branch / sha
basemind watch # long-running watcher; index on file change
basemind serve [--view <name>] # MCP stdio server for agents
basemind query outline <path> [--l2] # symbols, imports (+ docs/calls with --l2)
basemind query symbol <needle> [--kind K] # substring search across symbols
basemind query dependents <module> # reverse-lookup via imports
basemind hook install # install pre-commit hook (--staged scan)
basemind lang {list, install, clean} # manage downloaded tree-sitter grammars
basemind cache clear # drop .basemind/git-cache/
Global flags: -q/--quiet, -v/--verbose, --no-color (NO_COLOR honored).
A short tour. See docs/ARCHITECTURE.md for the long
version.
- Scanner (src/scanner.rs): rayon-parallel walker over the gitignore-aware file set. Extracts L1 (symbols + imports), L2 (calls + docs), L3 (structural hashes) per file.
- Content-addressed blobs (src/store.rs): msgpack at .basemind/blobs/<blake3>.{l1,l2,l3}.msgpack. Two files with identical content share the same blob. Re-scan skips unchanged hashes.
- Inverted index (src/index/): pure-Rust Fjall LSM keyspace at .basemind/views/<view>/index.fjall/. Six keyspaces drive symbol search, reference lookup, dependents.
- MCP surface (src/mcp/): stdio JSON-RPC via rmcp. Tool descriptions are the routing surface for agents; semantics (substring vs prefix, scope-aware vs name-only, capped) are stated honestly.
- Git layer (src/git.rs, src/git_cache.rs): gix-backed blame, log, diff, status. Sha-keyed disk cache (.basemind/git-cache/) makes warm queries free.
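The content-addressed blob idea from the tour can be sketched in a few lines of Python: key the extracted data by a hash of the file's bytes, and skip extraction when the key already exists. This is a toy model under stated assumptions. It uses hashlib.blake2b as a standard-library stand-in for blake3, keeps blobs in a dict rather than msgpack files, and the names ToyBlobStore and count_lines are hypothetical.

```python
import hashlib


class ToyBlobStore:
    """Store extracted data keyed by a hash of the source bytes."""

    def __init__(self):
        self.blobs = {}       # content hash -> extracted outline
        self.extractions = 0  # how many times extraction actually ran

    def put(self, content, extract):
        # blake2b is a stdlib stand-in; basemind itself uses blake3.
        key = hashlib.blake2b(content, digest_size=32).hexdigest()
        if key not in self.blobs:
            # Only unseen content pays the extraction cost.
            self.blobs[key] = extract(content)
            self.extractions += 1
        return key


def count_lines(content):
    # Placeholder for real symbol extraction.
    return {"lines": content.count(b"\n")}


store = ToyBlobStore()
a = store.put(b"fn main() {}\n", count_lines)
b = store.put(b"fn main() {}\n", count_lines)  # identical content, e.g. a copied file
c = store.put(b"fn main() { run() }\n", count_lines)
print(a == b, a != c, store.extractions)
```

The second put returns the same key without re-running extraction, which is the property that lets a re-scan skip unchanged files.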
A view is a code map for a snapshot of the repo. Each view has its own index
under .basemind/views/<view>/; blobs are shared in .basemind/blobs/.
- working (default): the on-disk working tree
- staged: git staging area; what's about to be committed
- rev-<sha7>: whatever you scanned with basemind scan --rev <REV>

They coexist — running one doesn't clobber the others. The pre-commit hook
installed by basemind hook install indexes staged, so the hook reflects
exactly what's being committed.
Run basemind watch in one terminal and basemind serve in another: the server
watches the index, rebuilds its in-RAM map off-thread, and atomically swaps.
Queries reflect filesystem changes within ~150 ms with no serve restart.
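The "rebuild off-thread, then atomically swap" pattern can be illustrated with a small Python sketch. It is a model of the general technique, not basemind's Rust implementation; the class name HotSwappableMap is hypothetical, and the swap relies on attribute assignment being atomic in CPython.

```python
import threading


class HotSwappableMap:
    """Serve reads from one immutable snapshot; replace it in a single step."""

    def __init__(self, initial):
        self._current = initial

    def snapshot(self):
        # Readers take one reference; it stays consistent even if a swap follows.
        return self._current

    def rebuild(self, build):
        def worker():
            fresh = build()        # slow work happens off the query path
            self._current = fresh  # a single reference assignment is the swap

        thread = threading.Thread(target=worker)
        thread.start()
        return thread


code_map = HotSwappableMap({"src/a.rs": ["foo"]})
before = code_map.snapshot()
code_map.rebuild(lambda: {"src/a.rs": ["foo", "bar"]}).join()
after = code_map.snapshot()
print(before, after)
```

Readers never see a half-built map: they hold either the old snapshot or the new one, and no restart is needed to pick up the change.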
basemind ships with a real-OSS hardening harness — 8 upstream repos (ripgrep, tokio, microsoft/TypeScript, facebook/react, django, requests, gin, plus a shallow ripgrep variant) cloned, scanned, and MCP-swept on every release. Canary assertions catch regressions before they ship:
./scripts/harden.sh # ~10 minutes; produces /tmp/basemind-harden/results.ndjson

The harness is #[ignore]-gated from normal cargo test. Invoked nightly and
on-dispatch from CI.
git clone https://github.com/Goldziher/basemind && cd basemind
task setup # cargo fetch + prek install
task check # lint + test
task build # release binary

Pre-commit hooks via prek cover Rust
(cargo fmt/clippy/sort/machete/deny/rustdoc-lint), markdown, shell,
JSON/YAML/TOML, file-safety basics, and commit-message linting via
gitfluff.
Contributing guidelines: see CONTRIBUTING.md.
MIT.