cauchyturing/agent-harness-engineering


We reverse-engineered a 512K-line CC codebase. Here are the 91 patterns that actually matter.

tl;dr — I used 8 AI agents to systematically analyze a production agentic codebase (512K lines, ~1890 files, millions of sessions). I distilled everything into 91 battle-tested patterns across 9 domains, with the anti-patterns that actually caused production incidents. Every pattern includes implementation-ready pseudocode. This is the engineering playbook I wish existed when I started building agents.


Why most AI agents break in production

Everyone is building AI agents. Almost nobody is building them well.

The hard problems aren't the LLM calls — they're everything around them:

  • Your prompt cache busts silently and your API bill 12x's overnight
  • Your agent loop crashes mid-conversation and the user's 30-minute session is gone
  • Your MCP tools run on a different pipeline than built-in tools, so bugs only surface in plugins
  • Your permission model has TOCTOU races that a crafted hook can exploit mid-session
  • Your context window fills up and your "smart" compaction deletes the file state the model needs next
  • Your 292 concurrent agents OOM because nobody set a message queue cap

I've hit every one of these building Abel AI. This repo is the result of turning those lessons into reusable engineering patterns — not as theory, but as implementation-ready specifications with the exact anti-patterns to avoid.


What's inside

91 patterns. 9 domains. 58-item audit checklist. Zero hand-waving.

| Module | Patterns | You'll learn… |
| --- | --- | --- |
| Architecture | 7 | Why one binary with four execution modes beats four codebases. How a 34-line reactive store outperforms Redux. Why your bootstrap state module should import nothing. |
| Agentic Loop | 8 | The `while(true)` AsyncGenerator pattern that handles streaming, cancellation, and backpressure in one construct. Context management as an ordered pipeline. Autocompact with circuit breakers. |
| LLM Integration | 9 | Why you should disable SDK retry and build your own. The 6-layer system prompt pipeline. How beta header latching saves millions in cache costs. |
| Tool System | 10 | The three-tier tool interface where Tier 2 defaults fail-closed. The seven-step lifecycle that prevents permission bypass. Why tool order matters for your API bill. |
| Agent Orchestration | 9 | Spawning a sub-agent IS running another conversation — same `query()`, zero feature drift. How to share prompt cache across N forked agents. The 50-message cap that prevented OOM at 292 concurrent agents. |
| Permission & Safety | 11 | Six permission modes as strategy objects. The six-layer Bash permission cascade with tree-sitter AST analysis. Why stripping code interpreter rules in auto mode prevents the AI from approving arbitrary code. |
| Hooks & Extensibility | 12 | 26 lifecycle events with typed frozen payloads. Six hook types from shell scripts to LLM calls. Why your hook config must be snapshot-frozen at startup (TOCTOU injection vector). |
| UI & Infrastructure | 13 | Integer interning for 60fps terminal rendering. Hardware scroll via DECSTBM. Virtual scroll with quantized React commits. Git ref validation that blocks injection. |
| Philosophy | 12 | The 12 principles that generate correct patterns — so you can derive the right answer for situations these patterns don't cover. |
| Audit Checklist | 58 items | Grade any agentic tool across 8 categories. Minimum viable = 70%. Production-grade = 90%. |

How to design an agentic loop that doesn't break

The core of any AI agent is a loop: call the LLM, execute tools, repeat. The 7 principles below govern every design decision in that loop. They're extracted from the Philosophy module — the "why behind the why" that generates correct patterns for situations this repo doesn't explicitly cover.

1. AsyncGenerator as lingua franca
   → One composition primitive. Streaming, backpressure, cancellation, type safety.
     If you're using callbacks AND promises AND event emitters, you're paying
     complexity tax for zero compositional benefit.
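As a minimal sketch of what "one composition primitive" means in practice (all names here, `AgentEvent`, `callModel`, `runTool`, are illustrative, not from the analyzed codebase):

```typescript
// The while(true) AsyncGenerator loop: streaming (yield), backpressure
// (the consumer pulls), and cancellation (generator .return()) in one shape.
type AgentEvent =
  | { kind: "text"; delta: string }
  | { kind: "tool_use"; name: string; input: unknown };

async function* agentLoop(
  callModel: (history: string[]) => AsyncGenerator<AgentEvent>,
  runTool: (name: string, input: unknown) => Promise<string>,
  history: string[],
): AsyncGenerator<AgentEvent> {
  while (true) {
    let sawToolUse = false;
    for await (const ev of callModel(history)) {
      yield ev; // streaming falls out of the generator for free
      if (ev.kind === "tool_use") {
        sawToolUse = true;
        history.push(await runTool(ev.name, ev.input));
      }
    }
    if (!sawToolUse) return; // model finished without requesting tools
  }
}
```

Because the consumer drives the generator, a slow terminal renderer naturally throttles the loop, and calling `.return()` on it cancels everything downstream.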

2. Prompt cache is sacred
   → Cache bust = 12x cost at fleet scale. Sort tool pools deterministically.
     Latch beta headers. Hash content paths. Never put timestamps in your prefix.
     This is not an optimization — it's architecture.
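A sketch of what "deterministic prefix" means (the builder and its shape are illustrative assumptions, not the repo's actual code):

```typescript
// Deterministic cacheable prefix: sort the tool pool, exclude anything
// volatile. Two requests with the same tools in any order get the same key.
interface ToolDef {
  name: string;
  description: string;
}

function buildCacheablePrefix(system: string, tools: ToolDef[]): string {
  const sorted = [...tools].sort((a, b) => a.name.localeCompare(b.name));
  // No Date.now(), no request IDs: any volatile byte here busts the cache.
  return JSON.stringify({ system, tools: sorted });
}
```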

3. Fail-fast for safety, fail-open for UX
   → Permission denied? exit(1). MCP server unreachable? Show what you have.
     One strategy for both = guaranteed wrong for one.
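The two strategies, side by side (names are illustrative):

```typescript
// Fail-fast: a safety check that cannot pass must abort loudly.
class PermissionDenied extends Error {}

function checkPermission(allowed: boolean): void {
  if (!allowed) throw new PermissionDenied("permission denied");
}

// Fail-open: a UX feature that cannot load must degrade quietly.
function listTools(fetchMcpTools: () => string[]): string[] {
  const builtIn = ["read", "write"];
  try {
    return [...builtIn, ...fetchMcpTools()];
  } catch {
    return builtIn; // MCP server unreachable: show what you have
  }
}
```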

4. Just enough complexity
   → 34-line store > Redux (when you have 3 pieces of state).
     If you can't explain a component in one sentence, it's too complex.
     If you can't fill one sentence, it's too simple.
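In the spirit of the "34-line store" claim, here is a minimal subscribe/notify store; this is an illustrative sketch, not the repo's implementation:

```typescript
// One sentence: holds a value, lets you read it, set it, and be notified.
function createStore<T>(initial: T) {
  let state = initial;
  const listeners = new Set<(s: T) => void>();
  return {
    get: () => state,
    set(next: T) {
      state = next;
      listeners.forEach((l) => l(state));
    },
    subscribe(l: (s: T) => void) {
      listeners.add(l);
      return () => listeners.delete(l); // unsubscribe handle
    },
  };
}
```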

5. Tool interface IS the extension point
   → MCP tools = built-in tools. Same pipeline, same validation, same permissions.
     If external contributors learn a different abstraction than internal code uses,
     you've created an unnecessary seam.

6. Persist before the crash boundary
   → If the process can die at line X, state must be on disk before X.
     User messages saved before the API call. Transcript saved before compaction.
     "I lost my conversation" is an architecture bug, not bad luck.
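A sketch of the ordering (the `persist` callback stands in for an append-only transcript write, e.g. `appendFileSync`; all names are illustrative):

```typescript
// Persist-before-the-crash-boundary: the user message hits durable storage
// before the API call that might kill the process or be interrupted.
async function sendMessage(
  persist: (line: string) => void, // in production: append a line to disk
  message: string,
  callApi: (m: string) => Promise<string>,
): Promise<string> {
  persist(JSON.stringify({ role: "user", message })); // before the boundary
  const reply = await callApi(message); // the process may die here
  persist(JSON.stringify({ role: "assistant", message: reply }));
  return reply;
}
```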

7. Hide latency, don't reduce it
   → Start I/O before you need the result. Preconnect during setup.
     Read files A, B, C concurrently. If every I/O call blocks something,
     you're leaving latency on the table.
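Reading A, B, C concurrently is one `Promise.all` away (a sketch; `read` is any async source):

```typescript
// Start every read before awaiting any of them: total latency is the
// slowest read, not the sum of all reads.
async function readAll(
  read: (path: string) => Promise<string>,
  paths: string[],
): Promise<string[]> {
  return Promise.all(paths.map((p) => read(p)));
}
```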

How to build an MCP tool system with proper security

The Tool System module covers the 10 patterns that make MCP tools first-class citizens — same validation pipeline, same permissions, same lifecycle as built-in tools. Key insight: MCP tools default fail-closed (isConcurrencySafe: false, isReadOnly: false). Omitting a security field is safe, not dangerous.
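The fail-closed default can be sketched in a few lines (the field names mirror the ones above; the resolver itself is an illustrative assumption):

```typescript
// Fail-closed defaults: an omitted security field resolves to the safest
// value, so forgetting to declare something can't widen permissions.
interface ToolSecurity {
  isConcurrencySafe?: boolean;
  isReadOnly?: boolean;
}

function resolveSecurity(decl: ToolSecurity) {
  return {
    isConcurrencySafe: decl.isConcurrencySafe ?? false, // default: serialize
    isReadOnly: decl.isReadOnly ?? false, // default: assume it writes
  };
}
```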

The Permission & Safety module covers the 11 patterns for the full permission model — from six strategy-based modes to the six-layer Bash permission cascade that uses tree-sitter AST analysis (not regex) to catch rm -rf "$VAR".


How to manage context windows without losing state

The Agentic Loop module covers the 5-stage context management pipeline: tool result budgets → history snip → microcompact (per-tool-type retention thresholds) → context collapse → autocompact (forked summarization agent with circuit breaker). Each stage is a pure function. Cheap stages run first. Expensive summarization fires only when everything else fails.
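"Each stage is a pure function, cheap stages run first" can be sketched as a fold that stops as soon as the budget is met (stage names and the character-count budget are illustrative):

```typescript
// Staged context management: each stage is (messages) => messages; the
// next, more expensive stage only runs if the budget is still blown.
type Msg = { role: string; text: string };
type Stage = (msgs: Msg[]) => Msg[];

function runPipeline(stages: Stage[], msgs: Msg[], budget: number): Msg[] {
  for (const stage of stages) {
    const size = msgs.reduce((n, m) => n + m.text.length, 0);
    if (size <= budget) return msgs; // cheap stage already fixed it
    msgs = stage(msgs);
  }
  return msgs;
}

// Two illustrative stages, cheapest first.
const truncateToolResults: Stage = (msgs) =>
  msgs.map((m) => (m.role === "tool" ? { ...m, text: m.text.slice(0, 5) } : m));
const snipHistory: Stage = (msgs) => msgs.slice(-2);
```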


How to orchestrate multiple AI agents

The Agent Orchestration module covers 9 patterns for multi-agent systems. The key pattern: spawning a sub-agent IS running another conversation — the Agent tool calls the same query() function as the main loop. Zero feature drift between parent and child. Also covers: fork cache sharing (N children share one prompt-cache entry), the 50-message queue cap that prevented OOM at 292 concurrent agents, and mailbox-based permission synchronization for swarms.
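The "same `query()`" idea reduces to recursion through one entry point; this sketch (the `SPAWN:` convention, `makeQuery`, the depth cap) is invented for illustration:

```typescript
// Sub-agent spawning as recursion: the Agent tool calls the same query()
// the main loop uses, so parent and child can never drift apart.
type Query = (prompt: string, depth: number) => Promise<string>;

function makeQuery(callModel: (p: string) => Promise<string>): Query {
  const query: Query = async (prompt, depth) => {
    const reply = await callModel(prompt);
    // Model requested a sub-agent: recurse through the same entry point.
    if (reply.startsWith("SPAWN:") && depth < 3) {
      return query(reply.slice("SPAWN:".length), depth + 1);
    }
    return reply;
  };
  return query;
}
```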


How to build a hook and plugin system for AI tools

The Hooks & Extensibility module covers 12 patterns: 26 typed lifecycle events, 6 hook execution types (shell → LLM call → subagent), exit-code-as-contract, TOCTOU-safe snapshot isolation, frontmatter-driven skill configuration, conditional path-based activation, namespaced plugin architecture with impersonation protection, and self-authoring via /skillify.
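The snapshot-isolation idea is small enough to sketch (the `HookConfig` shape is illustrative):

```typescript
// TOCTOU-safe snapshot: deep-copy and freeze hook config at startup, so
// nothing edited or mutated mid-session can change which hooks run.
interface HookConfig {
  event: string;
  command: string;
}

function snapshotHooks(live: HookConfig[]): readonly Readonly<HookConfig>[] {
  return Object.freeze(live.map((h) => Object.freeze({ ...h })));
}
```

Note that `Object.freeze` is shallow, which is why each entry is copied and frozen individually before the array is frozen.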


Quick start

Use as a Claude Code / Codex skill

```sh
git clone https://github.com/cauchyturing/agent-harness-engineering.git
ln -s "$(pwd)/agent-harness-engineering" ~/.claude/skills/agent-harness-engineering
```

Then invoke with /agent-harness-engineering. The skill only loads the 1-2 modules relevant to your current task — context discipline is principle #1.

Works with Claude Code, Codex, and any AI coding assistant that supports markdown skills.

Use as a standalone reference

Each module is self-contained. Read what you need:

| I'm building… | Start here |
| --- | --- |
| An agentic loop | `01-agentic-loop.md` |
| A tool execution system | `03-tool-system.md` |
| A permission model | `05-permission-safety.md` |
| A hook/plugin system | `06-hooks-extensibility.md` |
| Nothing yet — just want to understand | `08-philosophy.md` |

Audit an existing agentic tool

Run the 58-item checklist against your harness:

  • 70% overall, no category < 50% → Minimum viable
  • 90% overall, no category < 75% → Production-grade

Pattern format

Every pattern follows a consistent structure:

```
### N. Pattern Name
Problem:        What engineering challenge does this solve?
Pattern:        The solution in 2-3 sentences.
Implementation: Concrete pseudocode (language-agnostic principles, TypeScript-flavored examples).
Why it works:   The engineering reasoning — not "because best practice", but the actual mechanism.
Anti-pattern:   What to avoid and why — often from real incidents.
See also:       Cross-references to related patterns in other modules.
```

How this was made

8 parallel deep-analysis agents, each specializing in one subsystem of a 512K-line production agentic codebase:

  1. Core Runtime & Bootstrap
  2. Tool System Architecture
  3. Hook & Permission Model
  4. Services & LLM Integration
  5. UI Components & Rendering
  6. Skill & Command System
  7. Bridge, Remote & Task System
  8. Utils & Infrastructure

Raw analysis → 3,100-line synthesis → modular distillation → this repo.

The agents found the patterns. I verified them against production incidents. If a pattern didn't have a real anti-pattern that actually went wrong, it didn't make the cut.


Repo structure

```
agent-harness-engineering/
├── SKILL.md                        # AI skill entry point (routing table)
├── references/
│   ├── 00-architecture.md          #  7 patterns — system design
│   ├── 01-agentic-loop.md          #  8 patterns — the core loop
│   ├── 02-llm-integration.md       #  9 patterns — LLM API layer
│   ├── 03-tool-system.md           # 10 patterns — tool execution
│   ├── 04-agent-orchestration.md   #  9 patterns — multi-agent
│   ├── 05-permission-safety.md     # 11 patterns — security model
│   ├── 06-hooks-extensibility.md   # 12 patterns — extension system
│   ├── 07-ui-infrastructure.md     # 13 patterns — terminal & infra
│   └── 08-philosophy.md            # 12 principles — generative wisdom
└── checklists/
    └── harness-audit.md            # 58-item evaluation checklist
```

Contributing

Patterns must be:

  • Proven — from production code, not whiteboards
  • Generalizable — language/framework agnostic where possible
  • Actionable — pseudocode or it didn't happen
  • Honest — every pattern needs an anti-pattern from a real failure

License

MIT


Built by Stephen — founder of Abel AI, the social-physical engine driven by causal AI.
