basemind

Give your AI coding agent a brain for your repo.

basemind is a code-map MCP server: it indexes your codebase into a queryable map so AI coding agents — Claude Code, Cursor, Continue, anything that speaks MCP — get instant semantic answers about your code. Where is this defined? Who calls it? When did it change? What's churning?

Millisecond-scale queries. 300+ languages out of the box. Local-only. Built in Rust.

Why your agent needs this

Today, agents read code by grepping blind. Ask Claude "who calls parseQuery?" and it ripgreps the string — you get hits in docs, tests, comments, and 14 unrelated files. The agent burns context filtering noise, then guesses.

LSPs are the semantic answer, but they're single-language, slow to start, and useless across a polyglot monorepo.

basemind is the missing layer. One index, every language, semantic-quality answers at grep speed — exposed to the agent over MCP as concrete tools (find_callers, find_references, outline, symbol_history, blame_symbol, hot_files, …) instead of "go grep again."

30-second setup

Install (pick one):

brew install Goldziher/tap/basemind     # macOS, Linux
npm install -g basemind                 # any Node 14+ platform
pip install basemind                    # any Python 3.8+ platform
cargo install basemind --locked         # build from source

Opt-in intelligence build (PDF/Office ingestion, semantic doc search, shared agent memory backed by LanceDB):

cargo install basemind --locked --features full

full is the meta-feature that turns on both documents (PDF / Office / HTML ingestion + OCR + layout) and memory (shared agent memory + vector search). It pulls in kreuzberg (Elastic-2.0; document parsing + bundled ONNX embeddings) and lancedb (embedded vector store). The first scan after enabling it downloads the embedding model into the kreuzberg cache; subsequent scans are warm.

Index your repo:

cd /path/to/your/repo
basemind scan

Wire it into Claude Code — install as a plugin:

/plugin marketplace add Goldziher/basemind
/plugin install basemind@basemind

This registers basemind as an MCP server plus a basemind skill that tells the model when to reach for code-map tools (instead of grepping or reading files one by one). Restart the session and the agent has all the tools listed below.

Codex — install via Codex's plugin / marketplace UI from the same repo (the .codex-plugin/plugin.json manifest is shipped alongside Claude's), then add the MCP server entry to ~/.codex/config.toml:

[mcp_servers.basemind]
command = "basemind"
args = ["serve"]

Other MCP clients (Cursor, Continue, Cline, …) — drop the standard mcpServers entry into the client's MCP config:

{
  "mcpServers": {
    "basemind": {
      "command": "basemind",
      "args": ["serve"]
    }
  }
}
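If the client already has other MCP servers configured, the entry has to be merged rather than pasted over the file. A minimal sketch of that merge, assuming the client stores its config as a single JSON file with a top-level mcpServers object; the file path is whatever your client uses, and the function name add_basemind_server is illustrative:

```python
import json
from pathlib import Path

# The same entry shown above, as a Python dict.
BASEMIND_ENTRY = {"command": "basemind", "args": ["serve"]}


def add_basemind_server(config_path: Path) -> dict:
    """Merge the basemind entry into an MCP client config, keeping other servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text())
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["basemind"] = dict(BASEMIND_ENTRY)  # add or overwrite only this key
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Run it once per client config file; existing server entries are left untouched.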

What your agent gets

Code-map tools

Tool	What the agent can finally do
`outline`	"Give me this file's structure" — symbols, line/col, signatures, imports. One call replaces five Reads.
`search_symbols`	"Find anything named `useAuth`" — substring match across every indexed symbol, kind-filterable.
`find_references`	"Where is `parseQuery` called?" — indexed call-site lookup. No regex noise.
`find_callers`	"Who calls `User.save()`?" — resolves the definition first, then scans.
`dependents`	"What imports this module?" — reverse import lookup.
`list_files`	"What files are in `src/auth/`?" — indexed path + language filters.
`status`	"What languages does this repo use?" — file count + language breakdown.
`repo_info`	Branch, HEAD, workdir at a glance.
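Under the hood, an MCP client invokes each of these tools with a JSON-RPC 2.0 tools/call request. The sketch below only builds that request; a real client performs the MCP initialize handshake first and writes the message to the stdin of basemind serve. The argument names used here (path, name, limit) are assumptions for illustration, so check each tool's published input schema.

```python
import json
from itertools import count

_next_id = count(1)  # JSON-RPC request ids only need to be unique per session


def tool_call(tool: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as one line of JSON."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument names, for illustration only.
print(tool_call("find_references", {"name": "parseQuery", "limit": 50}))
```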

Git-aware tools

Tool	What the agent can finally do
`symbol_history`	"When did `validateToken` actually change?" — tree-sitter × git, comment/format-stable diffs.
`blame_file` / `blame_symbol`	"Who wrote this and why?" — line-range or symbol-scoped blame.
`hot_files`	"What's been churning?" — top-K most-changed files in the last N commits.
`recent_changes`	"What changed recently on this branch?"
`commits_touching`	"Show me every commit that touched `auth.rs`."
`diff_outline`	"What symbols differ between `main` and `HEAD`?" — structural diff.
`diff_file`	"Give me the unified diff for `auth.rs` across these revs."
`working_tree_status`	"What's staged / unstaged / untracked right now?"
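To make "top-K most-changed files in the last N commits" concrete, here is a small sketch of the counting step. It assumes the commit history has already been read into a newest-first list of touched paths; how basemind actually walks history (via gix) and whether it counts a file once per commit are not stated above, so treat both as assumptions.

```python
from collections import Counter


def hot_files(commits: list[list[str]], last_n: int, top_k: int) -> list[tuple[str, int]]:
    """Rank paths by how many of the last_n commits touched them.

    `commits` is newest-first; each entry lists the paths one commit changed.
    """
    counts = Counter()
    for paths in commits[:last_n]:
        counts.update(set(paths))  # count a file at most once per commit
    # Sort by descending count, then by path so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


history = [["a.rs", "b.rs"], ["a.rs"], ["c.rs", "a.rs"], ["b.rs"]]
print(hot_files(history, last_n=3, top_k=2))
```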

Intelligence tools (opt-in: `--features full`)

Tool	What the agent can finally do
`search_documents`	"Find the auth design doc" — semantic KNN over PDFs / Office / HTML / emails.
`memory_put` / `memory_get` / `memory_list`	Persist scoped notes — exact-key store and prefix / tag scans.
`memory_search`	Semantic recall across stored memory entries — KNN over the LanceDB memory table.
`memory_delete`	Drop an entry from both Fjall and LanceDB.

Memory is scoped by the repo's normalised origin URL so clones share entries. A repo with no remote falls back to a workdir-keyed scope (configurable via [memory].scope_strategy in .basemind/basemind.toml).
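The point of normalising the origin URL is that an SSH clone and an HTTPS clone of the same repo land in the same memory scope. The rules below are an illustrative guess at such a normalisation (strip scheme, user, port and a trailing .git; lowercase the host), not basemind's actual implementation.

```python
import re


def normalise_origin(url: str) -> str:
    """Map common Git remote spellings onto one `host/path` key (illustrative rules)."""
    url = url.strip()
    # scp-style SSH remote: git@github.com:Owner/repo.git
    scp = re.match(r"^[\w.-]+@([\w.-]+):(.+)$", url)
    if scp:
        host, path = scp.groups()
    else:
        # URL-style remote: https://host/path, ssh://git@host:22/path, git://host/path
        m = re.match(r"^[a-z+]+://(?:[^@/]+@)?([^/:]+)(?::\d+)?/(.+)$", url, re.I)
        if not m:
            return url  # unknown shape: leave it untouched
        host, path = m.groups()
    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return f"{host.lower()}/{path}"


print(normalise_origin("git@github.com:Goldziher/basemind.git"))
print(normalise_origin("https://github.com/Goldziher/basemind"))
```

Both calls print the same key, which is what lets two clones share entries.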

Every tool returns JSON. Responses are capped (limit, default 100, max 1000) so the agent's context doesn't explode.
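A minimal sketch of that cap, using the default and maximum stated above. Whether basemind clamps an oversized limit or rejects it is not specified here, so the clamping behaviour and the truncated field are assumptions.

```python
DEFAULT_LIMIT = 100   # from the text above
MAX_LIMIT = 1000      # from the text above


def effective_limit(requested=None) -> int:
    """Resolve the caller's `limit` to a value between 1 and MAX_LIMIT."""
    if requested is None:
        return DEFAULT_LIMIT
    return max(1, min(requested, MAX_LIMIT))  # assumption: clamp rather than error


def cap(results: list, requested=None) -> dict:
    """Trim a result list and flag whether anything was dropped (hypothetical field)."""
    limit = effective_limit(requested)
    return {"results": results[:limit], "truncated": len(results) > limit}
```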

Visual integration: live stats in Claude Code

basemind writes one row per MCP tool call to .basemind/telemetry.jsonl (always on, best-effort, ~200 bytes per row). Two surfaces consume it:

Live statusline — three lines in ~/.claude/settings.json:

{
  "statusLine": {
    "type": "command",
    "command": "$HOME/.claude/plugins/basemind/.claude-plugin/statusline.sh",
    "refreshInterval": 5
  }
}

Renders a line such as `bm ~103f · scan 2m ago · 47 calls · ~14k tok saved` at the bottom of the Claude Code terminal and refreshes it every 5 seconds. The script ships in the plugin tree; Claude Code cannot auto-install statusline scripts, so the wiring is a one-time manual step.

On-demand dashboard — the new telemetry_summary MCP tool returns the full breakdown (per-tool histogram, per-baseline savings, last 10 calls). The /basemind-stats skill renders it as markdown in the conversation.

The est_tokens_saved numbers are heuristic estimates against a disclosed grep+Read baseline. Every row carries a saved_baseline label, so each estimate can be audited. Tools without a realistic baseline (memory_*, search_documents, git wrappers) record their calls but report zero savings: we don't claim what we can't honestly measure.
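Because the log is plain JSON Lines, the per-tool histogram and per-baseline savings are easy to recompute yourself. A rough sketch follows; est_tokens_saved and saved_baseline are named above, while the tool field name is an assumption about the row schema.

```python
import json
from collections import Counter


def summarise(lines) -> dict:
    """Aggregate telemetry rows into call counts per tool and savings per baseline."""
    calls, saved = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # the log is best-effort, so tolerate a torn row
        calls[row.get("tool", "unknown")] += 1  # "tool" is an assumed field name
        baseline = row.get("saved_baseline") or "none"
        saved[baseline] += int(row.get("est_tokens_saved") or 0)
    return {"calls": dict(calls), "saved_by_baseline": dict(saved)}
```

To run it against a real log, pass an open file handle: `summarise(open(".basemind/telemetry.jsonl"))`.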

Performance

Measured on a 39,270-file TypeScript repo (Apple Silicon, release build):

What	Time
Cold scan (full index)	12.4 s
Cached scan (no changes)	1.6 s
MCP server startup	3.1 s, 77 MB RSS
`status` query	1.2 ms
`outline` (1571 symbols)	1.9 ms
`search_symbols`	1–3 ms
`find_references("spawn")` (tokio)	< 5 ms

basemind preloads L1 outlines into RAM on serve start, so cross-file queries are answered from memory in low single-digit milliseconds, as the table shows. The Fjall LSM inverted index handles ref/caller lookups without scanning blobs.
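The idea behind an inverted index is that call sites are filed under the callee's name at scan time, so a lookup is a single key read instead of a pass over every file. The toy version below uses an in-memory dict as a stand-in for the Fjall keyspace; the class name and key layout are illustrative, not basemind's on-disk format.

```python
from collections import defaultdict


class RefIndex:
    """Toy inverted index: symbol name -> list of (path, line) call sites."""

    def __init__(self):
        self._refs = defaultdict(list)

    def add_file(self, path: str, call_sites) -> None:
        """Record (callee_name, line) pairs extracted from one file at scan time."""
        for name, line in call_sites:
            self._refs[name].append((path, line))

    def find_references(self, name: str, limit: int = 100) -> list:
        """Look up call sites by name without touching any file contents."""
        return self._refs.get(name, [])[:limit]


index = RefIndex()
index.add_file("src/api.ts", [("parseQuery", 12), ("fetch", 30)])
index.add_file("src/cli.ts", [("parseQuery", 7)])
print(index.find_references("parseQuery"))
```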

Languages

300+ tree-sitter grammars ship via tree-sitter-language-pack. basemind dynamically loads them on first use and caches them locally.

First-class outlines — full signatures, kinds, decorators, calls, imports, docstrings — ship for:

Rust · Python · TypeScript · TSX · JavaScript · Go

Best-effort outlines via the tree-sitter-language-pack (TSLP) tags.scm fallback cover ~100 grammars, including Kotlin, C#, Swift, C++, Scala, Solidity, Lua, Ruby, PHP, Java, …

Languages without an upstream tags.scm (JSON, YAML, TOML) still parse and appear in list_files; they just don't expose symbols.

Why basemind, specifically

Built for agents, not humans. Every tool exists because an agent needs it, not because it makes a cute terminal demo.
Semantic quality, grep speed. Tree-sitter parses → content-addressed blobs → Fjall LSM inverted index → millisecond-scale MCP responses.
Polyglot by default. One index, every language. No LSP-per-language zoo. No "we don't support that yet."
Local-only. No SaaS. No remote telemetry (the per-call log in .basemind/telemetry.jsonl stays on disk). No cloud round-trip. Your code never leaves the machine.
Deterministic. Content-addressed blobs (blake3), stable hashes, reproducible across machines.
Pure Rust. One static binary. No Python runtime, no Node runtime, no JVM. basemind serve adds < 80 MB to your agent's stack.

CLI

basemind is also a CLI — useful for piping into shell tools, CI checks, or just inspecting a repo without spinning up an MCP server.

basemind init                              # write .basemind/basemind.toml with defaults
basemind scan                              # index the working tree
basemind scan --staged                     # index what's in git's staging area
basemind scan --rev <REV>                  # index a commit / branch / sha
basemind watch                             # long-running watcher; index on file change
basemind serve [--view <name>]             # MCP stdio server for agents
basemind query outline <path> [--l2]       # symbols, imports (+ docs/calls with --l2)
basemind query symbol <needle> [--kind K]  # substring search across symbols
basemind query dependents <module>         # reverse-lookup via imports
basemind hook install                      # install pre-commit hook (--staged scan)
basemind lang {list, install, clean}       # manage downloaded tree-sitter grammars
basemind cache clear                       # drop .basemind/git-cache/

Global flags: -q/--quiet, -v/--verbose, --no-color (NO_COLOR honored).

Architecture

A short tour. See docs/ARCHITECTURE.md for the long version.

Scanner (src/scanner.rs) — rayon-parallel walker over the gitignore-aware file set. Extracts L1 (symbols + imports), L2 (calls + docs), L3 (structural hashes) per file.
Content-addressed blobs (src/store.rs) — msgpack at .basemind/blobs/<blake3>.{l1,l2,l3}.msgpack. Two files with identical content share the same blob. Re-scan skips unchanged hashes.
Inverted index (src/index/) — pure-Rust Fjall LSM keyspace at .basemind/views/<view>/index.fjall/. Six keyspaces drive symbol search, reference lookup, dependents.
MCP surface (src/mcp/) — stdio JSON-RPC via rmcp. Tool descriptions are the routing surface for agents; semantics (substring vs prefix, scope-aware vs name-only, capped) are stated honestly.
Git layer (src/git.rs, src/git_cache.rs) — gix-backed blame, log, diff, status. Sha-keyed disk cache (.basemind/git-cache/) makes warm queries free.
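The content-addressed blob store is what makes re-scans cheap: the hash of a file's bytes is its storage key, so identical content is extracted once and unchanged files are skipped. A simplified sketch of that idea is below. It swaps in hashlib.blake2b because blake3 is not in Python's standard library, and writes JSON text instead of msgpack; the class name and file suffix are illustrative.

```python
import hashlib
from pathlib import Path


class BlobStore:
    """Toy content-addressed store keyed by a hash of the source bytes."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, source: bytes, extract) -> tuple[str, bool]:
        """Store extract(source) under the content hash.

        Returns (digest, extracted), where extracted is False on a cache hit.
        """
        # Stand-in hash: basemind uses blake3; blake2b keeps this stdlib-only.
        digest = hashlib.blake2b(source, digest_size=32).hexdigest()
        blob = self.root / f"{digest}.l1.json"  # the real store writes msgpack
        if blob.exists():
            return digest, False  # same content seen before: skip extraction
        blob.write_text(extract(source))
        return digest, True
```

Two files with identical bytes produce the same digest, so the second put is a cache hit and the extractor never runs.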

Views

A view is a code map for a snapshot of the repo. Each view has its own index under .basemind/views/<view>/; blobs are shared in .basemind/blobs/.

working (default) — the on-disk working tree
staged — git staging area; what's about to be committed
rev-<sha7> — whatever you scanned with basemind scan --rev <REV>

They coexist: scanning one doesn't clobber the others. The pre-commit hook installed by basemind hook install indexes staged, so the hook reflects exactly what's being committed.

Live refresh

Run basemind watch in one terminal and basemind serve in another: the server watches the index, rebuilds its in-RAM map off-thread, and atomically swaps. Queries reflect filesystem changes within ~150 ms with no serve restart.
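The rebuild-then-swap pattern keeps queries fast during a refresh: readers keep using the old map until a complete replacement is ready, then one reference assignment switches them over. The sketch below shows the pattern with Python threads (the reference swap is atomic in CPython); it is an illustration of the idea, not basemind's Rust implementation, and LiveMap is a made-up name.

```python
import threading


class LiveMap:
    """Readers always see a complete map; rebuilds run on a background thread."""

    def __init__(self, build):
        self._build = build             # callable that returns a fresh dict
        self._current = build()
        self._lock = threading.Lock()   # serialises rebuilds, never taken by readers

    def get(self, key, default=None):
        return self._current.get(key, default)  # one reference read, no locking

    def refresh(self) -> threading.Thread:
        """Rebuild off the query path, then swap the whole map in one assignment."""
        def work():
            with self._lock:
                fresh = self._build()   # the slow part happens here
                self._current = fresh   # readers switch to the new map from now on
        worker = threading.Thread(target=work, daemon=True)
        worker.start()
        return worker
```

A watcher would call refresh() whenever the index changes; queries issued during the rebuild are served from the previous map.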

Hardening

basemind ships with a real-OSS hardening harness — 8 upstream repos (ripgrep, tokio, microsoft/TypeScript, facebook/react, django, requests, gin, plus a shallow ripgrep variant) cloned, scanned, and MCP-swept on every release. Canary assertions catch regressions before they ship:

./scripts/harden.sh    # ~10 minutes; produces /tmp/basemind-harden/results.ndjson

The harness is #[ignore]-gated, so a normal cargo test run skips it. CI invokes it nightly and on manual dispatch.

Development

git clone https://github.com/Goldziher/basemind && cd basemind
task setup     # cargo fetch + prek install
task check     # lint + test
task build     # release binary

Pre-commit hooks via prek cover Rust (cargo fmt/clippy/sort/machete/deny/rustdoc-lint), markdown, shell, JSON/YAML/TOML, file-safety basics, and commit-message linting via gitfluff.

Contributing guidelines: see CONTRIBUTING.md.

License

MIT.
