A fork of CheetahCode (Nano Claude Code) by BouzéLab, itself inspired by Claude Code.
This project is a PoC. What matters here are the ideas for shrinking the token cost of a code agent at equal model capability — not the implementation quality, which is honestly rough, because we didn't really have the time to polish it. If an idea speaks to you, feel free to re-implement it cleanly in your own stack.
The angle is strict: reduce the number of tokens consumed without degrading the reasoning capability of the underlying model. We therefore deliberately avoid downgrades to smaller models, and only work on the shape of the context sent to Opus 4.6.
▶ View the interactive presentation — Token savings explained
A visual walkthrough of how Bouzecode achieves ~10× token reduction in agentic coding workflows.
- Python ≥ 3.11 (3.13 recommended, matches what we test against).
- uv — used for venv + dependency resolution. Install via
curl -LsSf https://astral.sh/uv/install.sh | sh(macOS/Linux) orpowershell -c "irm https://astral.sh/uv/install.ps1 | iex"(Windows). - ripgrep (
rg) — highly recommended; bouzecode'sGreptool falls back to a slow Python implementation without it. Install viabrew install ripgrep,apt install ripgrep, orwinget install BurntSushi.ripgrep.MSVC. - An Anthropic API key in
ANTHROPIC_API_KEY(other providers — OpenAI, Gemini, DeepSeek, etc. — are wired up inproviders/registry.pywith their own env vars).
git clone https://github.com/Simon-Free/bouzecode.git
cd bouzecode
$env:ANTHROPIC_API_KEY = "sk-ant-..." # or put it in a .env file
.\bouzecode.ps1 # launches the REPL
.\bouzegui.ps1 # launches BouzéGUI on http://127.0.0.1:5055bouzecode.ps1 and bouzegui.ps1 handle venv creation, dependency sync, ripgrep install, and .env loading automatically.
git clone https://github.com/Simon-Free/bouzecode.git
cd bouzecode
uv venv --python 3.13
uv pip install -e .
export ANTHROPIC_API_KEY=sk-ant-...
.venv/bin/bouzecode # REPL
.venv/bin/bouzegui # BouzéGUI web UIA .env file at the repo root is picked up automatically on Windows; on Linux/macOS either source it or export the variables yourself.
From a fresh clone, uv pip install -e . should pull in anthropic / openai / flask / rich / markdown / psutil, and install two entry-point scripts (bouzecode and bouzegui) into .venv/Scripts/ (Windows) or .venv/bin/ (Unix). That is the full install — there is no compiled step, no external service to wire up beyond the API key.
src/bouzecode/
├── backend/ # Engine core: agent loop, DAG executor, providers, tools, sessions
│ ├── agent/ # Agent loop, methodology, enforcement, context GC
│ ├── providers/ # LLM provider registry (Anthropic, OpenAI, Gemini, DeepSeek, Bedrock)
│ ├── tools/ # Tool runner, schemas, plan mode, diagnostics
│ └── sessions/ # Session persistence, checkpoint, resume
├── ui/ # Terminal UI (CLI display, paste input)
├── web/ # Internal web IPC layer
└── web_v2/ # Web V2 server (FastAPI/Starlette, WebSocket, streaming)
xml_tool_protocol/ # XML tool-call protocol (parser, emitter, docs generation)
agent/ # High-level agent orchestration, DAG scheduling
tools/ # Tool implementations (Read, Write, Edit, Bash, Grep, Glob, etc.)
folder_desc/ # GetFolderDescription tool (pre-indexed exploration)
skill/ # Skill system (reusable prompt templates)
task/ # Task management (multi-step plans)
memory/ # Persistent memory (notes across sessions)
plugin/ # Plugin loading system
mcp/ # Model Context Protocol (stdio transport, JSON-RPC)
voice/ # /voice command (TTS/STT sentinel flow)
video/ # Video generation pipeline (mocked shim)
multi_agent/ # Multi-agent orchestration
html_renderer/ # Session export to HTML
web/ # BouzéGUI (Flask sidecar for debugging)
tests/ # ~1434 tests (pytest)
- User prompt → XML tool-call protocol → LLM generates
<tool_use>blocks - DAG executor (
agent/dag.py) builds dependency graph fromdepends_ondeclarations - Tools execute level-by-level (parallel within each level)
- Results injected back into context → next LLM turn
The system prompt steers the model toward a minimum-turn workflow:
- Phase 1 — READ:
Read,Glob,Grep,GetFolderDescription,Skill(...)in a single parallel batch - Phase 2 — PLAN: one
WritePlancall freezing the complete plan - Phase 3 — EXECUTE: edits + tests + diagnostics in one block, ordered via
depends_on
| Category | Feature | Description |
|---|---|---|
| Engine | DAG executor | Parallel tool execution with depends_on scheduling |
| Engine | Context GC | Model-driven context pruning (trash, snippets, notes) |
| Engine | GetFolderDescription | Pre-indexed directory exploration, no serial sub-agent |
| Engine | Methodology & Enforcement | Persistent working memory, mandatory tool-call discipline |
| Engine | Multi-provider | Anthropic, OpenAI, Gemini, DeepSeek, AWS Bedrock |
| Tools | XML tool protocol | Streaming parser, tool docs generation, typed schemas |
| Tools | Read/Write/Edit/Bash/Grep/Glob | Standard filesystem + shell tools |
| Tools | WritePlan | Human-gated planning with UI tab |
| Tools | GetDiagnostics | LSP-style diagnostics post-edit |
| Subsystems | Skills | Reusable prompt templates with triggers |
| Subsystems | Tasks | Multi-step plan management |
| Subsystems | Memory | Persistent notes across sessions |
| Subsystems | Plugins | Dynamic tool/command loading |
| Subsystems | MCP | Model Context Protocol server management (stdio) |
| Integrations | Voice | /voice command with TTS/STT sentinel |
| Integrations | Video | Video generation pipeline (shim) |
| Integrations | Telegram | /telegram command + proactive sentinel |
| Integrations | Multi-agent | Orchestrated sub-agents |
| UI | Terminal REPL | Rich CLI with streaming display |
| UI | BouzéGUI | Flask web UI (agent browser, plan/diff viewer, skill editor) |
| UI | Web V2 | FastAPI/Starlette server with WebSocket streaming |
| Export | HTML renderer | Session export to standalone HTML |
A coding agent that wraps Opus 4.6 only stays affordable if you look squarely at where the money actually goes.
Output tokens are essentially free compared to input tokens. The model emits a few hundred to a few thousand output tokens per turn. The input, though, is the cumulative arithmetic sum of the growing prefix re-sent on every round-trip: system prompt + tool schemas + full conversation so far + last tool result.
A back-of-the-envelope: 10k-token system prompt, 2k new input tokens per turn. Three turns already bills 10k + 12k + 14k = 36k input tokens. A typical coding task does 50–100 tool calls, each with a full LLM round-trip, and the provider happily charges you for every prefix re-transmission.
Money is only half of it. Each round-trip is also a queue wait. Time-to-first-token of 30–40 s under load is routine — 40 round-trips at that rate burn real wall-clock time regardless of your budget.
Saving money = cutting round-trips. Everything else in this README follows from that.
Evaluated on a handful of internal tasks (n = 5, so think orders of magnitude, not absolute numbers): usual bug fixes and feature additions on a ~100k-LoC codebase, running both systems on the same prompts.
- ~10× reduction in tokens consumed by the top-tier model, and total elimination of small-model consumption (typical: 1.5 M Opus + 3 M Haiku → 200k Opus, 0 Haiku).
- Roughly the same ~10× reduction in price.
- ~3–5× reduction in wall-clock time to reach a fix, thanks to fewer round-trips and fewer queue waits.
- Gains widen further on long sessions, thanks to context pruning.
These are ballpark figures on a small n; treat them as signals, not benchmarks.
Pack into one turn everything that could have fit in one turn. The 3-Phase Workflow (READ → PLAN → EXECUTE) targets 3 LLM turns as the floor for a complete task. The DAG executor (agent/dag.py) builds a dependency graph from depends_on declarations and executes tools level-by-level in parallel within a single turn.
Replaces the serial "Explore sub-agent" pattern (which typically burns tens of thousands of tokens) with a native tool that returns an annotated file tree in a few hundred tokens. Descriptions are auto-generated and auto-maintained via write hooks.
The model marks stale tool results for removal (trash), keeps only relevant slices (keep_snippets), and maintains a named scratchpad (notes). The context stays flat instead of inflating turn after turn.
Observe the mistakes the model makes systematically and prevent them via prompt or tool defaults. Examples: switching Grep default to content mode, steering away from python -c on Windows, forbidding pointless re-reads.
The test suite uses mocked LLM responses — no real API calls are needed to run tests.
# Full suite (~1434 tests, ~2 min)
python -m pytest tests/ -q
# Quick subset (skip slow subprocess tests)
python -m pytest tests/ -q -x -k "not subprocess"
# Web V2 only
python -m pytest tests/web_v2/ -qCurrent status: 1434 tests passing (0 failures, 44 skipped, 2 xfailed).
Test categories: agent_loop, enforcement, cache, checkpoint, DAG, methodology, thinking, tools (registry, runner, search, symbols), providers, sessions, commands, compaction, profiles, prompts, plan_mode, regression, UI, web_v2, e2e (voice, video, mcp, telegram, plugin, skill, task, memory, html_renderer, folder_desc).
web/ contains BouzéGUI, a small Flask app that launches via bouzegui (or bouzegui.ps1 on Windows) and opens on http://127.0.0.1:5055. It lets you:
- browse current / finished agents and their live stdout,
- inspect the plan and the edited-files diff for each run,
- view / edit skills on disk,
- list the registered tools.
Treat BouzéGUI as a single-user, localhost-only developer tool. It has the same risk profile as running a Jupyter notebook with no token:
- No authentication. No CSRF protection.
- Remote-code-execution by design (
POST /agents/newspawns an agent). - Do not bind to
0.0.0.0, do not expose through a reverse proxy.
Untested ideas left open:
- Pre-reads by a smaller model. Have scoping
Reads executed by Sonnet/Haiku, which returns only relevant extracts to Opus. - System prompt caching. Provider-dependent; estimated ~30% additional saving with shared prompts.
- Broader recurring error catalogue. Continuous observation work that depends on stack-specific LLM mistakes.
Base: CheetahCode / Nano Claude Code (SafeRL-Lab). Original project license preserved (LICENSE).
See LICENSE.