A minimal, educational AI agent harness — built to learn the plumbing behind tools like Claude Code: dynamic tool calling, progressive disclosure of skills, deferred tool loading, subagents, permission gating, and context compaction.
It is driven by a deterministic mock LLM that speaks the real Anthropic wire
format (text / tool_use / tool_result content blocks). The "brain" is faked
so runs are free and reproducible — but every other line is authentic harness code.
Swapping in a real model would be a single new file implementing the LLM interface.
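Below is a minimal sketch of that wire format and a single swap point. The block shapes (`text`, `tool_use`, `tool_result`) follow the Anthropic Messages API. The exact `complete()` signature and the `ScriptedLLM` class are assumptions made for illustration. They are not the contents of `src/llm/types.ts` or `src/llm/mock.ts`.

```typescript
// Content blocks shaped like the Anthropic Messages API.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type ContentBlock = TextBlock | ToolUseBlock | ToolResultBlock;

type Message = { role: "user" | "assistant"; content: ContentBlock[] };
type ToolSchema = { name: string; description: string; input_schema: Record<string, unknown> };

// Assumed swap point: a real adapter would make the network call here.
interface LLM {
  complete(messages: Message[], tools: ToolSchema[]): Promise<Message>;
}

// Hypothetical deterministic stand-in: replays pre-written assistant turns in order
// and ignores its inputs, so every run produces the same transcript.
class ScriptedLLM implements LLM {
  private turn = 0;
  private readonly script: ContentBlock[][];

  constructor(script: ContentBlock[][]) {
    this.script = script;
  }

  async complete(_messages: Message[], _tools: ToolSchema[]): Promise<Message> {
    const content = this.script[this.turn];
    if (content === undefined) throw new Error(`script exhausted at turn ${this.turn}`);
    this.turn += 1;
    return { role: "assistant", content };
  }
}
```

A real adapter would implement the same `complete()` by calling the API. Code written against `LLM` would not need to change.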
The goal is to see the machinery. Every demo prints a step-by-step trace of the loop: what was sent, what the model asked for, what ran, and what was fed back.
```bash
npm install
npm run demo:01     # run one demo
npm run demo:all    # run them all in order
npm run typecheck   # tsc --noEmit
```

No build step: `tsx` runs the TypeScript directly. Zero runtime dependencies (the skill frontmatter is hand-parsed, so there's no magic).
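Hand-parsing stays small because skill headers only need a few `key: value` pairs. This minimal sketch assumes each skill file opens with a `---`-delimited block that holds `name` and `description`. The names `parseFrontmatter` and `skillIndex` are hypothetical and do not come from the repo's `loader.ts`. The second function shows the progressive-disclosure side: only the metadata is listed up front.

```typescript
type SkillMeta = { name: string; description: string };

// Hypothetical hand-rolled parser: splits a "---"-delimited header of `key: value`
// lines from the Markdown body. No YAML lists, quoting, or multi-line values.
function parseFrontmatter(source: string): { meta: Record<string, string>; body: string } {
  const lines = source.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return { meta: {}, body: source };
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) throw new Error("unterminated frontmatter");
  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // ignore anything that is not `key: value`
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join("\n") };
}

// Progressive disclosure: only name + description go into the system prompt;
// the body stays unread until the model asks for that skill.
function skillIndex(skills: SkillMeta[]): string {
  return skills.map((s) => `- ${s.name}: ${s.description}`).join("\n");
}
```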
A harness is a small agentic loop (`src/core/loop.ts`, ~60 lines):

```
user → [ model → tool_use → execute → tool_result ] → … → model → text
```
The model never runs anything; it only asks. The harness runs the tools and feeds
results back as a user message (that's where tool_result blocks live). Every
"feature" below is just a tool or a thin wrapper around this loop — the loop itself
never grows.
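The loop can be sketched in a few dozen lines. This is a minimal illustration and not `src/core/loop.ts`. The `Complete` and `RunTool` function types, the `maxTurns` cap, and sequential tool execution are assumptions made to keep the sketch self-contained.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type ContentBlock = TextBlock | ToolUseBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: ContentBlock[] };

// Hypothetical seams: the model only *asks*; `runTool` is where the harness executes.
type Complete = (messages: Message[]) => Promise<Message>;
type RunTool = (name: string, input: Record<string, unknown>) => Promise<string>;

async function runLoop(userText: string, complete: Complete, runTool: RunTool, maxTurns = 10): Promise<string> {
  const messages: Message[] = [{ role: "user", content: [{ type: "text", text: userText }] }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await complete(messages);
    messages.push(reply);
    const calls = reply.content.filter((b): b is ToolUseBlock => b.type === "tool_use");
    if (calls.length === 0) {
      // No tool requests: the assistant's text is the final answer.
      return reply.content
        .filter((b): b is TextBlock => b.type === "text")
        .map((b) => b.text)
        .join("");
    }
    // Run every requested tool, then feed all results back as ONE user message.
    const results: ToolResultBlock[] = [];
    for (const call of calls) {
      results.push({ type: "tool_result", tool_use_id: call.id, content: await runTool(call.name, call.input) });
    }
    messages.push({ role: "user", content: results });
  }
  throw new Error(`no final answer after ${maxTurns} turns`);
}
```

In this sketch, the only termination rule is a reply with no `tool_use` block. A real harness could also inspect the API's `stop_reason`.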
| Demo | Concept | What to watch for in the trace |
|---|---|---|
| `demo:01` | Core loop + dynamic tool calling | `tool_use` → `✓ tool` → model answers using the result |
| `demo:02` | Permission gating | `gate [ALLOW]` runs the tool; `gate [DENY]` blocks it before execution |
| `demo:03` | Parallel tool calls | 3 tools requested in one turn; wall-clock ≈ slowest tool, not the sum |
| `demo:04` | Progressive disclosure of skills | system prompt lists skills by description only; `Skill` tool loads the body on demand |
| `demo:05` | Tool search / deferred loading | "tools available" count grows 1→2 after `ToolSearch` registers a match |
| `demo:06` | Subagents / sub-loops | indented nested transcript; parent context stays tiny, child does the work |
| `demo:07` | Context compaction | message count climbs, then a `compaction:` line drops it back under budget |
| `demo:08` | Skill script execution | a skill loads its instructions, then runs its bundled async script |
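Wall-clock time in `demo:03` tracks the slowest tool because every requested tool starts before any of them is awaited. This minimal sketch assumes the handlers are independent and keyed by tool name. `executeParallel` and the error-string convention for unknown tools are hypothetical.

```typescript
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };
type Handler = (input: Record<string, unknown>) => Promise<string>;

// Start every requested tool at once, then await them together.
// Promise.all preserves input order, so each result lines up with its tool_use id.
async function executeParallel(calls: ToolCall[], handlers: Record<string, Handler>): Promise<ToolResult[]> {
  return Promise.all(
    calls.map(async (call): Promise<ToolResult> => {
      const handler = handlers[call.name];
      // Unknown tools become an error string for the model instead of crashing the loop.
      const content = handler ? await handler(call.input) : `error: unknown tool ${call.name}`;
      return { type: "tool_result", tool_use_id: call.id, content };
    }),
  );
}
```

Take two tools that sleep 200 ms and 150 ms. Together they finish in roughly 200 ms, not 350 ms.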
```
src/
  core/
    types.ts            Anthropic-shaped Message / ContentBlock / Tool contract (the keystone)
    loop.ts             runLoop(): the agentic loop
    registry.ts         ToolRegistry: register / expose schemas / execute (can grow at runtime)
    trace.ts            step-by-step console observability
  llm/
    types.ts            LLM interface: the single swap point for a real model
    mock.ts             MockLLM: replays a scripted Scenario (deterministic)
  permissions/
    gate.ts             Allow / Deny policy gate (+ an async approval gate)
  skills/
    loader.ts           scan dir, parse frontmatter, lazy-read bodies & scripts
    skillTool.ts        built-in Skill tool (discloses instructions)
    skillScriptTool.ts  built-in run_skill_script tool (runs bundled scripts)
    skills/             the skill files: *.md (+ wordcount.mjs bundled script)
  deferred/
    catalog.ts          dormant tool pool + keyword search
    toolSearchTool.ts   built-in ToolSearch tool (registers matches at runtime)
  subagents/
    spawn.ts            runSubLoop(): a nested loop with context isolation
    agentTool.ts        built-in Agent tool
  context/
    compaction.ts       token estimate + summarize-old-turns compactor
  scenarios/            one runnable demo per concept (the verification)
```
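The registry and the deferred-loading pieces fit together roughly as follows. This is a sketch only. The `Tool` shape, the `ToolRegistry` method names, and the substring match in `makeToolSearch` are assumptions standing in for `registry.ts`, `catalog.ts`, and `toolSearchTool.ts`. The key property: `ToolSearch` is an ordinary tool whose handler calls `register()`. If the loop re-reads `schemas()` before each model call, the tool list the model sees grows from one tool to two.

```typescript
type Tool = {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  handler: (input: Record<string, unknown>) => string | Promise<string>;
};

// Hypothetical registry: the model can see exactly what has been registered, nothing more.
class ToolRegistry {
  private readonly tools = new Map<string, Tool>();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // Schemas only: handlers never leave the harness.
  schemas(): Array<Omit<Tool, "handler">> {
    return [...this.tools.values()].map((t) => ({ name: t.name, description: t.description, input_schema: t.input_schema }));
  }

  async execute(name: string, input: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    if (!tool) return `error: unknown tool ${name}`;
    return await tool.handler(input); // sync and async handlers are both awaited
  }
}

// Hypothetical ToolSearch: moves keyword matches from a dormant catalog into the live registry.
function makeToolSearch(registry: ToolRegistry, catalog: Tool[]): Tool {
  return {
    name: "ToolSearch",
    description: "Find and load additional tools by keyword.",
    input_schema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
    handler: (input) => {
      const query = String(input["query"] ?? "").toLowerCase();
      const matches = catalog.filter((t) => `${t.name} ${t.description}`.toLowerCase().includes(query));
      matches.forEach((t) => registry.register(t));
      return matches.length > 0 ? `loaded: ${matches.map((t) => t.name).join(", ")}` : "no matching tools";
    },
  };
}
```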
Three of the built-in tools mirror real Claude Code mechanisms exactly: Skill
(progressive disclosure), ToolSearch (deferred tools), and Agent (subagents).
Each is "a normal tool whose handler does something interesting" — Skill reads a
file, ToolSearch mutates the registry, Agent recurses into the loop.
Anywhere the harness executes something, a real implementation does I/O — so these
seams are all async-capable (awaited by the loop), even though the mock
implementations are often synchronous:
- LLM completion: `LLM.complete()` (network call)
- Tool handlers: `Tool.handler` returns `string | Promise<string>` (exercised live in demos 03 & 06)
- Permission gate: `gate.check()` (a real "ask" mode awaits a human; see `asyncApprovalGate`)
- Compactor: `maybeCompact()` (real compaction awaits an LLM to write the summary)
- Skill script execution: `run_skill_script` dynamically imports and runs a bundled script (demo 08)
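Because the loop awaits `gate.check()`, a synchronous policy and an asynchronous "ask a human" gate can share one signature. This is a minimal sketch. The `Gate` shape, `policyGate`, `approvalGate`, and `gatedExecute` are hypothetical and only approximate what `gate.ts` and its `asyncApprovalGate` might do. A denied call never reaches `run`, which matches the "blocks it before execution" behavior in `demo:02`.

```typescript
type Decision = "allow" | "deny";
type ToolRequest = { name: string; input: Record<string, unknown> };

// Assumed gate shape: one signature covers sync policies and async approvals.
interface Gate {
  check(req: ToolRequest): Decision | Promise<Decision>;
}

// Static allow-list: answers immediately.
function policyGate(allowed: string[]): Gate {
  const allowSet = new Set(allowed);
  return { check: (req): Decision => (allowSet.has(req.name) ? "allow" : "deny") };
}

// Hypothetical "ask" mode: defers to an async approver (a terminal prompt in a real harness).
function approvalGate(ask: (req: ToolRequest) => Promise<boolean>): Gate {
  return { check: async (req): Promise<Decision> => ((await ask(req)) ? "allow" : "deny") };
}

// The gate runs BEFORE the tool, so a denied request never reaches `run`.
async function gatedExecute(
  gate: Gate,
  req: ToolRequest,
  run: (req: ToolRequest) => Promise<string>,
): Promise<string> {
  const decision = await gate.check(req); // `await` accepts both plain values and promises
  if (decision === "deny") return `denied: ${req.name} was blocked before execution`;
  return run(req);
}
```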
Designing these async from the start means a real model, real approval prompts, and real scripts all drop in without reworking the loop.
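The compactor shows why these seams are async: summarizing old turns is itself a model call. This minimal sketch assumes a crude estimate of four characters per token and a caller-supplied async `summarize` function. `estimateTokens` and `compactIfOverBudget` are hypothetical names. The policy of keeping the last two turns is an assumption, not necessarily what `compaction.ts` does.

```typescript
type Msg = { role: "user" | "assistant"; text: string };

// Rough heuristic (assumption): about 4 characters per token.
const estimateTokens = (messages: readonly Msg[]): number =>
  Math.ceil(messages.reduce((n, m) => n + m.text.length, 0) / 4);

// Hypothetical compactor: when over budget, replace all but the last `keep` messages
// with a single summary message. Under budget, the original array is returned untouched.
async function compactIfOverBudget(
  messages: Msg[],
  budget: number,
  summarize: (old: readonly Msg[]) => Promise<string>,
  keep = 2,
): Promise<Msg[]> {
  if (estimateTokens(messages) <= budget || messages.length <= keep) return messages;
  const old = messages.slice(0, messages.length - keep);
  const recent = messages.slice(messages.length - keep);
  const summary = await summarize(old); // a real harness would ask the model to write this
  return [{ role: "user", text: `Summary of earlier conversation: ${summary}` }, ...recent];
}
```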
Out of scope: a real Anthropic adapter (the LLM interface leaves room for it), streaming, a REPL/TUI, and persistence. All are excluded to keep the focus on the plumbing.