CodeOfficer/ai-harness-from-scratch

harness-builder

A minimal, educational AI agent harness — built to learn the plumbing behind tools like Claude Code: dynamic tool calling, progressive disclosure of skills, deferred tool loading, subagents, permission gating, and context compaction.

It is driven by a deterministic mock LLM that speaks the real Anthropic wire format (text / tool_use / tool_result content blocks). The "brain" is faked so runs are free and reproducible — but every other line is authentic harness code. Swapping in a real model would be a single new file implementing the LLM interface.

The goal is to see the machinery. Every demo prints a step-by-step trace of the loop: what was sent, what the model asked for, what ran, and what was fed back.
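For orientation, the three content block kinds mentioned above can be modeled roughly as follows. The field names follow the public Anthropic Messages API; the type names echo the labels used for src/core/types.ts, but the repo's exact definitions are an assumption here, and `pendingToolUseIds` is a hypothetical helper added only to show how a tool_result is paired with its tool_use.

```typescript
// Sketch of Anthropic-shaped content blocks. Simplified: the real API also
// allows richer tool_result content; the repo's types may differ.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type ContentBlock = TextBlock | ToolUseBlock | ToolResultBlock;
type Message = { role: "user" | "assistant"; content: ContentBlock[] };

// One full round trip: question, tool request, tool result, final answer.
const exchange: Message[] = [
  { role: "user", content: [{ type: "text", text: "What is 2 + 3?" }] },
  { role: "assistant", content: [{ type: "tool_use", id: "toolu_01", name: "add", input: { a: 2, b: 3 } }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "toolu_01", content: "5" }] },
  { role: "assistant", content: [{ type: "text", text: "2 + 3 = 5" }] },
];

// Hypothetical helper: ids of tool_use blocks that have no matching tool_result yet.
function pendingToolUseIds(messages: Message[]): string[] {
  const answered = new Set<string>();
  for (const m of messages) {
    for (const b of m.content) if (b.type === "tool_result") answered.add(b.tool_use_id);
  }
  const pending: string[] = [];
  for (const m of messages) {
    for (const b of m.content) if (b.type === "tool_use" && !answered.has(b.id)) pending.push(b.id);
  }
  return pending;
}
```

Note that the tool_result travels in a user-role message and points back at the request through tool_use_id.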

Quick start

npm install
npm run demo:01      # run one demo
npm run demo:all     # run them all in order
npm run typecheck    # tsc --noEmit

No build step — tsx runs the TypeScript directly. Zero runtime dependencies (the skill frontmatter is hand-parsed so there's no magic).
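As an illustration of what hand-parsing frontmatter can look like, here is a minimal sketch. It assumes skill files open with a block of flat `key: value` lines fenced by `---`; that layout, and the name `parseFrontmatter`, are assumptions rather than the repo's actual loader.

```typescript
// Hypothetical helper: split a skill file into flat metadata and body.
// Handles only `key: value` lines between two `---` fences; not a YAML parser.
function parseFrontmatter(source: string): { meta: Record<string, string>; body: string } {
  const lines = source.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return { meta: {}, body: source };

  // Find the closing fence, skipping the opening one at index 0.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) return { meta: {}, body: source };

  const meta: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip blank or malformed lines
    // Split on the first colon only, so values may contain colons.
    meta[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { meta, body: lines.slice(end + 1).join("\n").trim() };
}
```

A loader built this way can read only the metadata up front and defer the body until a skill is actually requested.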

The one idea

A harness is a small agentic loop (src/core/loop.ts, ~60 lines):

user → [ model → tool_use → execute → tool_result ] → … → model → text

The model never runs anything; it only asks. The harness runs the tools and feeds results back as a user message (that's where tool_result blocks live). Every "feature" below is just a tool or a thin wrapper around this loop — the loop itself never grows.
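The loop described above can be sketched in a few dozen lines. This is a simplified illustration, not the contents of src/core/loop.ts: the signature of `runLoop`, the `maxTurns` cap, and the `scriptedLLM` stand-in are all assumptions made for the example.

```typescript
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
  | { type: "tool_result"; tool_use_id: string; content: string };
type Message = { role: "user" | "assistant"; content: ContentBlock[] };
type Tool = { name: string; handler: (input: Record<string, unknown>) => string | Promise<string> };
interface LLM {
  complete(messages: Message[]): Promise<ContentBlock[]>;
}

// Assumed signature: returns the model's final text once it stops asking for tools.
async function runLoop(llm: LLM, tools: Tool[], prompt: string, maxTurns = 10): Promise<string> {
  const messages: Message[] = [{ role: "user", content: [{ type: "text", text: prompt }] }];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = await llm.complete(messages);
    messages.push({ role: "assistant", content: reply });

    // The model only asks; the harness executes each requested tool.
    const results: ContentBlock[] = [];
    for (const block of reply) {
      if (block.type !== "tool_use") continue;
      const tool = tools.find((t) => t.name === block.name);
      const content = tool ? await tool.handler(block.input) : `unknown tool: ${block.name}`;
      results.push({ type: "tool_result", tool_use_id: block.id, content });
    }

    // No tool requests means the text blocks are the final answer.
    if (results.length === 0) {
      return reply.map((b) => (b.type === "text" ? b.text : "")).join("");
    }
    // Results go back to the model inside a user message.
    messages.push({ role: "user", content: results });
  }
  throw new Error("maxTurns exceeded");
}

// Hypothetical stand-in for the mock model: replays canned replies in order.
function scriptedLLM(replies: ContentBlock[][]): LLM {
  let next = 0;
  return {
    async complete() {
      const reply = replies[next++];
      if (!reply) throw new Error("script exhausted");
      return reply;
    },
  };
}
```

Scripting one tool_use reply followed by one text reply is enough to drive a full pass through the loop without a real model.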

The demos

| Demo | Concept | What to watch for in the trace |
|---|---|---|
| demo:01 | Core loop + dynamic tool calling | tool_use → ✓ tool → model answers using the result |
| demo:02 | Permission gating | gate [ALLOW] runs the tool; gate [DENY] blocks it before execution |
| demo:03 | Parallel tool calls | 3 tools requested in one turn; wall-clock ≈ slowest tool, not the sum |
| demo:04 | Progressive disclosure of skills | system prompt lists skills by description only; Skill tool loads the body on demand |
| demo:05 | Tool search / deferred loading | "tools available" count grows 1→2 after ToolSearch registers a match |
| demo:06 | Subagents / sub-loops | indented nested transcript; parent context stays tiny, child does the work |
| demo:07 | Context compaction | message count climbs, then a compaction: line drops it back under budget |
| demo:08 | Skill script execution | a skill loads its instructions, then runs its bundled async script |
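The parallel behavior in demo:03 comes down to starting every requested call before awaiting any of them. A minimal sketch, assuming `Promise.all` as the mechanism (the repo may do this differently); `executeAll`, `ToolCall`, and `Handler` are hypothetical names.

```typescript
type ToolCall = { id: string; name: string; input: Record<string, unknown> };
type Handler = (input: Record<string, unknown>) => string | Promise<string>;

// Small helper for simulating slow tools in examples.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Start all calls at once, then wait for all of them.
// Promise.all preserves input order, so results line up with the requests.
async function executeAll(calls: ToolCall[], handlers: Map<string, Handler>) {
  return Promise.all(
    calls.map(async (call) => {
      const handler = handlers.get(call.name);
      const content = handler ? await handler(call.input) : `unknown tool: ${call.name}`;
      return { type: "tool_result" as const, tool_use_id: call.id, content };
    }),
  );
}
```

With three handlers that sleep for 100, 200, and 300 ms, the total wait should land near 300 ms rather than 600 ms, because the sleeps overlap.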

Architecture

src/
  core/
    types.ts       Anthropic-shaped Message / ContentBlock / Tool contract (the keystone)
    loop.ts        runLoop(): the agentic loop
    registry.ts    ToolRegistry: register / expose schemas / execute (can grow at runtime)
    trace.ts       step-by-step console observability
  llm/
    types.ts       LLM interface — the single swap point for a real model
    mock.ts        MockLLM: replays a scripted Scenario (deterministic)
  permissions/
    gate.ts        Allow / Deny policy gate (+ an async approval gate)
  skills/
    loader.ts      scan dir, parse frontmatter, lazy-read bodies & scripts
    skillTool.ts   built-in `Skill` tool (discloses instructions)
    skillScriptTool.ts  built-in `run_skill_script` tool (runs bundled scripts)
    skills/        the skill files: *.md (+ wordcount.mjs bundled script)
  deferred/
    catalog.ts     dormant tool pool + keyword search
    toolSearchTool.ts  built-in `ToolSearch` tool (registers matches at runtime)
  subagents/
    spawn.ts       runSubLoop(): a nested loop with context isolation
    agentTool.ts   built-in `Agent` tool
  context/
    compaction.ts  token estimate + summarize-old-turns compactor
scenarios/         one runnable demo per concept (the verification)
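To make the compaction.ts entry concrete, here is a rough sketch of a token estimate plus a summarize-old-turns compactor. The name `maybeCompact` appears in this README, but the signature, the 4-characters-per-token heuristic, and the text-only message shape are assumptions. A real compactor would also need to avoid splitting a tool_use from its tool_result.

```typescript
// Simplified message shape for this sketch only (text instead of content blocks).
type Message = { role: "user" | "assistant"; text: string };
type Summarizer = (old: Message[]) => string | Promise<string>;

// Crude estimate: roughly 4 characters per token. A rule of thumb, not a tokenizer.
const estimateTokens = (messages: Message[]): number =>
  Math.ceil(messages.reduce((total, m) => total + m.text.length, 0) / 4);

// Assumed signature: if over budget, replace all but the last `keepRecent`
// messages with a single summary message; otherwise return the input unchanged.
async function maybeCompact(
  messages: Message[],
  budget: number,
  keepRecent: number,
  summarize: Summarizer,
): Promise<Message[]> {
  if (estimateTokens(messages) <= budget || messages.length <= keepRecent) return messages;
  const cut = messages.length - keepRecent;
  const old = messages.slice(0, cut);
  const recent = messages.slice(cut);
  const summary = await summarize(old); // awaited: a real summarizer calls a model
  return [{ role: "user", text: `[summary of ${old.length} earlier messages] ${summary}` }, ...recent];
}
```

Passing the summarizer in as a function keeps the compactor testable with a synchronous stub while leaving room for a model-backed one.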

Three of the built-in tools mirror real Claude Code mechanisms exactly: Skill (progressive disclosure), ToolSearch (deferred tools), and Agent (subagents). Each is "a normal tool whose handler does something interesting" — Skill reads a file, ToolSearch mutates the registry, Agent recurses into the loop.
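The "normal tool whose handler does something interesting" idea is easiest to see with ToolSearch. The sketch below is an assumption-laden miniature: this `ToolRegistry` has only three methods, `makeToolSearch` is a hypothetical factory, and the substring match stands in for whatever keyword search the catalog really uses.

```typescript
type Tool = {
  name: string;
  description: string;
  handler: (input: Record<string, unknown>) => string | Promise<string>;
};

// Hypothetical minimal registry; the repo's ToolRegistry likely has more surface.
class ToolRegistry {
  private tools = new Map<string, Tool>();
  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }
  names(): string[] {
    return Array.from(this.tools.keys());
  }
  async execute(name: string, input: Record<string, unknown>): Promise<string> {
    const tool = this.tools.get(name);
    return tool ? tool.handler(input) : `unknown tool: ${name}`;
  }
}

// An ordinary tool whose handler registers matching dormant tools as a side effect.
function makeToolSearch(registry: ToolRegistry, catalog: Tool[]): Tool {
  return {
    name: "ToolSearch",
    description: "Find and load dormant tools by keyword",
    handler: (input) => {
      const query = String(input.query ?? "").trim().toLowerCase();
      if (!query) return "empty query";
      const matches = catalog.filter((t) => `${t.name} ${t.description}`.toLowerCase().includes(query));
      for (const tool of matches) registry.register(tool); // the registry grows at runtime
      return matches.length ? `loaded: ${matches.map((t) => t.name).join(", ")}` : "no matches";
    },
  };
}
```

Because the loop re-reads the registry on each turn, a tool loaded this way becomes callable on the model's next request without any change to the loop.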

Sync vs. async: the five execution seams

Anywhere the harness executes something, a real implementation does I/O — so these seams are all async-capable (awaited by the loop), even though the mock implementations are often synchronous:

  1. LLM completion: LLM.complete() (network call)
  2. Tool handlers: Tool.handler returns string | Promise<string> (exercised live in demos 03 & 06)
  3. Permission gate: gate.check() (a real "ask" mode awaits a human; see asyncApprovalGate)
  4. Compactor: maybeCompact() (real compaction awaits an LLM to write the summary)
  5. Skill script execution: run_skill_script dynamically imports and runs a bundled script (demo 08)

Designing these async from the start means a real model, real approval prompts, and real scripts all drop in without reworking the loop.
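One way to picture an async-capable seam is a return type of `T | Promise<T>` that the caller always awaits. The sketch below applies that to the permission gate; `policyGate`, `approvalGate`, and `gatedExecute` are hypothetical names, and only `gate.check()` is taken from this README.

```typescript
type Decision = "allow" | "deny";
type Gate = { check(toolName: string): Decision | Promise<Decision> };

// Synchronous gate: a static deny list, decided immediately.
const policyGate = (denied: string[]): Gate => ({
  check: (toolName) => (denied.includes(toolName) ? "deny" : "allow"),
});

// Asynchronous gate: defers to an approver (a human prompt in a real harness).
const approvalGate = (ask: (toolName: string) => Promise<boolean>): Gate => ({
  check: async (toolName) => ((await ask(toolName)) ? "allow" : "deny"),
});

// The caller awaits unconditionally. Awaiting a plain value just yields it,
// so sync and async gates (and sync and async tools) share one code path.
async function gatedExecute(
  gate: Gate,
  toolName: string,
  run: () => string | Promise<string>,
): Promise<string> {
  const decision = await gate.check(toolName);
  if (decision === "deny") return `denied: ${toolName}`; // blocked before execution
  return run();
}
```

Swapping the static policy for an interactive prompt then changes only which gate is constructed, not the code that calls it.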

Deliberately left out

A real Anthropic adapter (the LLM interface leaves room for it), streaming, a REPL/TUI, and persistence are all excluded to keep the focus on the plumbing.
