DeCodifier gets code agents to the full behavioral change surface first.
DeCodifier is a local AI coding engine with deterministic method-first retrieval that lets LLMs safely inspect and modify real projects. It provides the file operations, project registry, and tool-calling plumbing an LLM needs to write shippable code without sending your repo to the cloud.
- Local-first, runs on your machine
- Model-agnostic: works with GPT / Claude / other tool-capable LLMs
- Deterministic tools: consistent return structures for reliable parsing
DeCodifier now exposes a deterministic retrieval layer for agent-friendly code lookup:
- `search_symbols(query)` for ranked method/class hits
- `get_context_read_plan(query)` for bounded read planning
- `materialize_context(plan)` for budgeted context rendering
- behavior-surface bundles for entrypoints, callers, implementations, guards, dispatchers, REPLs, simulators, and bridges
- per-hit rationale/debug metadata so agents can inspect why a symbol ranked where it did
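To make the three-step flow concrete, here is a minimal toy sketch of search -> plan -> materialize over an in-memory index. The index contents, scoring, and return shapes are illustrative assumptions, not DeCodifier's actual API:

```python
# Illustrative toy of the search -> plan -> materialize flow.
# The real tools return richer structures; everything here is an assumption.

TOY_INDEX = {
    "auth.validate_token": "def validate_token(token):\n    return jwt_decode(token)",
    "auth.login": "def login(user, pw):\n    token = issue_token(user)\n    return token",
    "billing.charge": "def charge(card, amount):\n    ...",
}

def search_symbols(query, max_symbols=3):
    """Rank symbols by naive token overlap, with a per-hit rationale."""
    q_tokens = set(query.lower().split())
    hits = []
    for name, source in TOY_INDEX.items():
        overlap = q_tokens & set(name.replace(".", " ").replace("_", " ").split())
        if overlap:
            hits.append({
                "symbol": name,
                "score": len(overlap),
                "rationale": f"matched tokens: {sorted(overlap)}",
            })
    hits.sort(key=lambda h: h["score"], reverse=True)
    return hits[:max_symbols]

def get_context_read_plan(query):
    """Turn ranked hits into a bounded read plan."""
    return [{"symbol": h["symbol"], "max_lines": 20} for h in search_symbols(query)]

def materialize_context(plan, token_budget=100):
    """Render planned symbols until the token budget is exhausted."""
    parts, used = [], 0
    for step in plan:
        source = TOY_INDEX[step["symbol"]]
        cost = len(source.split())  # crude stand-in for a real tokenizer
        if used + cost > token_budget:
            break
        parts.append(f"# {step['symbol']}\n{source}")
        used += cost
    return "\n\n".join(parts)

plan = get_context_read_plan("where is token validation enforced")
print([s["symbol"] for s in plan])   # the anchor surfaces selected for reading
print(materialize_context(plan))
```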
DeCodifier is built around a simple idea:
Code agents usually need the full behavioral change surface first, not just the right file.
That means retrieving the set of implementation surfaces that must stay aligned in a real change, such as the entrypoint, the caller, the implementation, the guard, and supporting surfaces like a bridge, simulation, or REPL path.
The current benchmark suite covers three realistic repo archetypes:
- `harbor_api` - clean Python auth service
- `atlas_workspace` - noisy multi-language workspace
- `fastapi_full_stack_backend` - FastAPI backend with dependency-injection auth
| System | Anchor Recall | Surface-Bundle Recall | Full Change-Set Rate | False Positives |
|---|---|---|---|---|
| DeCodifier | 100% | 100% | 100% | 0% |
| Embedding baseline | 85% | 0% | 20% | 28% |
| Lexical baseline | 65% | 0% | 20% | 44% |
| System | Context Precision | Recall | False Positives |
|---|---|---|---|
| DeCodifier | 58% | 100% | 0% |
| Embedding baseline | 36% | 69% | 28% |
| Lexical baseline | 28% | 62% | 44% |
Across the current benchmark repos, DeCodifier also achieves:
- 100% top-1 correctness
- 100% top-k correctness
- 100% caller correctness
- 100% trace correctness
- 100% no-answer correctness
The lexical baseline falls to:
- 39% top-1 correctness
- 33% top-k correctness
- 0% caller correctness
- 0% trace correctness
- 0% no-answer correctness
- **Anchor Recall** - whether retrieval surfaced the main behavioral anchor for the task
- **Surface-Bundle Recall** - whether retrieval returned the bundle of surfaces that should be changed together
- **Full Change-Set Rate** - whether an agent could recover the complete coordinated change surface, not just one relevant snippet
- **False Positives** - how often retrieval confidently returned junk
This matters because code agents often fail by finding one plausible file and missing the rest of the change surface. DeCodifier is designed to return the full behavioral bundle instead.
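Read as set comparisons between the retrieved surfaces and a gold change surface, these metrics are straightforward to compute. A minimal sketch (the symbol names are hypothetical, and the real benchmark harness is more involved):

```python
def bundle_metrics(retrieved, gold_bundle, anchor):
    """Score one task: did we get the anchor, the whole bundle, and how much junk?"""
    retrieved, gold_bundle = set(retrieved), set(gold_bundle)
    return {
        "anchor_recall": anchor in retrieved,
        "surface_bundle_recall": gold_bundle <= retrieved,  # full bundle present
        "recall": len(retrieved & gold_bundle) / len(gold_bundle),
        "false_positive_rate": len(retrieved - gold_bundle) / max(len(retrieved), 1),
    }

# Hypothetical task: anchor found, but one bundle member missed and one junk hit.
m = bundle_metrics(
    retrieved={"login", "validate_token", "rate_limit"},
    gold_bundle={"login", "validate_token", "auth_guard"},
    anchor="validate_token",
)
print(m)
```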
Traditional retrieval usually operates at the file or chunk level.
DeCodifier operates at the behavior level:
- method-first retrieval
- caller anchoring
- framework entrypoint detection
- trace query handling
- grouped surface bundles
- strict no-answer protection
That makes it better suited for questions like:
- `where is token validation enforced`
- `where are permissions checked`
- `trace login -> token validation`
- `what surfaces need to change together`
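A trace query can be viewed as a path search over the call graph, with a strict no-answer when no path exists. A toy sketch, assuming a hand-built call graph rather than DeCodifier's real analysis:

```python
from collections import deque

# Hypothetical call graph: caller -> callees.
CALL_GRAPH = {
    "login": ["issue_token", "validate_token"],
    "issue_token": ["sign_jwt"],
    "validate_token": ["decode_jwt", "check_expiry"],
}

def trace(src, dst):
    """BFS the call graph; return the call path, or None as a strict no-answer."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for callee in CALL_GRAPH.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None  # never fabricate a path that is not in the graph

print(trace("login", "check_expiry"))  # ['login', 'validate_token', 'check_expiry']
print(trace("login", "charge_card"))   # None
```

The `None` branch is the important part: a retrieval layer that refuses to answer is safer for agents than one that returns a plausible-looking wrong path.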
A recent live Codex + MCP run used DeCodifier on a separate local OS project to expand a calculator from basic arithmetic to a broader integer-math feature set.
### What DeCodifier changed
Without DeCodifier, this task looked like a single-file patch in the guest calculator source.
With DeCodifier retrieval first, Codex surfaced the actual behavioral change surface:
- the guest calculator implementation in `user/calc.c`
- the mirrored host-side calculator logic in `model/qwen_chat.py`
- the user-facing capability and help text that needed to stay aligned
That changed the plan from “patch the obvious file” to “patch the full surface that defines calculator behavior across guest and host paths.”
### Why this mattered
The calculator logic existed in more than one place. Updating only the guest implementation would have created drift between:
- the real interactive calculator in the OS
- the host-side mirrored and simulated path used by the bridge
DeCodifier made Codex less likely to stop at the first plausible file and more likely to update the full behavior surface safely.
### What changed
The final patch added a broader integer-math feature set and kept both execution paths aligned, including:
- new unary helpers
- new binary and bitwise helpers
- parser and alias updates
- help text updates
- host-side mirrored behavior updates
The bridge transport itself did not require protocol changes. The important work was keeping the behavior layers aligned above it.
### Verification
Focused verification passed:
- `python3 -m py_compile model/qwen_chat.py`
- `make kernel.elf`
- a targeted calculator harness covering valid, chained, function-style, and invalid or overflow cases
Note: this was a focused verification pass, not a full QEMU or full OS regression run. In that environment, direct import of the host bridge module was blocked by a missing torch dependency, so the calculator verification used a calc-only extracted harness from the mirrored Python logic.
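This kind of `py_compile` check can also be run programmatically. A small sketch using only the standard library (the snippets compiled here are placeholders, not the project's real modules):

```python
import os
import py_compile
import tempfile

def compiles(source: str) -> bool:
    """Byte-compile a snippet the way `python3 -m py_compile <file>` does."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        py_compile.compile(path, doraise=True)  # raises PyCompileError on bad syntax
        return True
    except py_compile.PyCompileError:
        return False
    finally:
        os.unlink(path)

print(compiles("def add(a, b):\n    return a + b\n"))  # True
print(compiles("def broken(:\n"))                       # False: syntax error
```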
### Takeaway
This is the kind of change DeCodifier is built for: not just finding a relevant file, but recovering the full behavioral change surface that a code agent needs to modify together.
See `docs/case_studies.md` for the longer case-study version.
The benchmark runs under fixed token budgets and evaluates the actual materialized context returned
to the model, not just raw search hits. Current results are stable across 2000, 1000, and
500 token budgets.
The goal is not just to retrieve something relevant. The goal is to retrieve the right behavioral surface under tight context limits.
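One way to picture budgeted materialization is greedy packing of ranked surfaces until the budget is spent. A simplified sketch, using whitespace word counts as a crude stand-in for real tokenization:

```python
def pack_context(ranked_snippets, budget):
    """Greedily keep top-ranked snippets whose combined 'token' cost fits the budget."""
    kept, used = [], 0
    for name, text in ranked_snippets:
        cost = len(text.split())  # crude stand-in for a real tokenizer
        if used + cost <= budget:
            kept.append(name)
            used += cost
    return kept, used

# Hypothetical ranked surfaces; 'helpers' is deliberately oversized.
ranked = [
    ("validate_token", "def validate_token(token): return decode(token)"),
    ("login", "def login(user, pw): return issue_token(user)"),
    ("helpers", "misc " * 400),
]
for budget in (2000, 1000, 500, 10):
    print(budget, pack_context(ranked, budget)[0])
```

Because the highest-ranked surfaces are packed first, the anchor survives even at the tightest budget; only lower-ranked bulk gets dropped.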
This is still an early benchmark suite.
Current strengths:
- deterministic structural retrieval
- method / caller / entrypoint ranking
- change-surface recovery
- strict no-answer behavior
What still needs broader validation:
- more external public repos
- larger monorepos
- additional frameworks beyond the current benchmark set
DeCodifier is not just optimized to find the right method. It is optimized to recover the full behavioral change surface a code agent must modify together.
You can also test retrieval locally from the CLI:
```shell
decodifier query "where is token validation enforced" --path /path/to/repo
```

And benchmark the static fixture repos with DeCodifier plus the lexical and embedding baselines across the default 2000, 1000, and 500 token budgets:

```shell
decodifier benchmark
```

The benchmark now tracks change-oriented retrieval quality as well as first-hit accuracy, including anchor-set recall, surface-bundle recall, full change-surface success, and tokens to the full retrieval set.
For Codex, Claude Code, and other MCP-capable agents, you can expose the retrieval tools over stdio MCP:
```shell
decodifier mcp-server
```

Run the agent from the root of the repo you want DeCodifier to index. If you need to target a different repo explicitly, pass `--path /path/to/repo`.
You can also print ready-to-use adapter snippets for Codex and Claude Code:
```shell
decodifier adapter codex
decodifier adapter claude-code
```

The adapter output includes:
- a one-line install command for the target agent
- a publish-safe config snippet for
~/.codex/config.tomlor.mcp.json - a short instruction block you can drop into
AGENTS.mdorCLAUDE.md
Pass `--path /path/to/repo` only when you want to pin the config to a specific local checkout.
For legacy local integrations, the older newline-delimited JSON tool server is still available:
```shell
decodifier tool-server --path /path/to/repo
```

Send newline-delimited requests like:

```json
{"id":1,"tool":"search_symbols","arguments":{"query":"where are permissions checked","max_symbols":3}}
```

```shell
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .[dev]
uvicorn engine.app.main:app --reload
```

DeCodifier stores project registry & conversations in `~/.decodifier`.
To override:
```shell
export DECODIFIER_DATA_DIR=/your/path
```

- Deterministic method/class retrieval
- Bounded context planning and materialization
- List projects / select target
- Read files
- Save/patch files
- Create and scaffold new modules
- Build features end-to-end
- Diff-level patching (WIP)
```python
from decodifier.client import DeCodifierClient, handle_decodifier_tool_call
from decodifier.tool_registry import DECODIFIER_TOOLS

client = DeCodifierClient(base_url="http://127.0.0.1:8000")

result = handle_decodifier_tool_call(client, "decodifier_read_file", {
    "project_id": "core_backend",
    "path": "engine/app/main.py",
})
print(result)
```

Available tools are listed in `decodifier/tool_registry.py` and documented in `docs/tool_reference.md`.
```
LLM <-> DeCodifier tools <-> FastAPI backend <-> Project on disk
```
Local-only unless configured otherwise. No repo uploads. No vendor lock-in.
DeCodifier is not a production SaaS. It is ready for:
- Solo devs
- AI developers
- Early-stage builders
- Local R&D
- Notebook + VSCode workflows
- Agentic system research
Not yet ready for:
- Multi-tenant cloud deployments
- Enterprise access controls
- Repo-scale concurrency
- Untrusted user input
This is the alpha. Expect rough edges. Open issues, PRs, crashes, and questions welcome.