Boomboomdunce/harness-engineering

harness-engineering

harness-engineering is a portable Agent Skill that helps coding agents (Codex, Claude Code, and other compatible clients) follow best practices when building new products and modifying existing projects.

The skill brings together insights from OpenAI's harness engineering paradigm, Anthropic's multi-agent research, and the community's Ralph pattern into actionable playbooks, templates, and principles that agents use automatically.

What It Does

When triggered, this skill gives agents:

  • A startup audit — fast harness check when entering any new repository
  • Workflow routing — playbooks for common scenarios (new project, feature dev, long-running build, refactoring, bugfix)
  • Ready-to-use templates — for instruction files, handoff artifacts, sprint contracts, evaluator rubrics, and progress tracking
  • Core principles — repo as system of record, map not encyclopedia, separate planning/doing/judging, verify against reality, structured handoffs, incremental commits, entropy management
  • Context engineering — progressive disclosure, context resets vs compaction, fresh context reliability
  • Multi-agent patterns — when and how to use planner/generator/evaluator architecture

Repository Layout

harness-engineering/
├── README.md
├── .gitignore
└── harness-engineering/
    ├── SKILL.md                        # Core skill — principles, workflow router, guidance
    ├── playbooks/
    │   ├── new-project.md              # Greenfield project kickoff
    │   ├── feature-development.md      # Feature work in existing repo
    │   ├── long-running-build.md       # Multi-session autonomous builds
    │   ├── refactor-cleanup.md         # Refactoring and debt reduction
    │   └── bugfix-investigation.md     # Bug investigation workflow
    ├── templates/
    │   ├── AGENTS.md.template          # Template for project instruction files
    │   ├── handoff-artifact.md         # Template for session handoffs
    │   ├── sprint-contract.md          # Template for sprint contracts
    │   ├── evaluator-rubric.md         # Template for evaluator criteria
    │   └── progress-tracker.json       # Template for feature tracking (JSON)
    └── references/
        └── ecosystem.md               # Harness engineering ecosystem resources

Install

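Copy the inner harness-engineering/ skill directory (the one that contains SKILL.md) into the location your client scans for skills. Run the commands below from the root of the cloned repository so that harness-engineering refers to that inner directory.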

Codex

Personal install:

mkdir -p ~/.codex/skills
cp -R harness-engineering ~/.codex/skills/

Project install:

mkdir -p /path/to/repo/.agents/skills
cp -R harness-engineering /path/to/repo/.agents/skills/

Claude Code

Personal install:

mkdir -p ~/.claude/skills
cp -R harness-engineering ~/.claude/skills/

Project install:

mkdir -p /path/to/repo/.claude/skills
cp -R harness-engineering /path/to/repo/.claude/skills/

GitHub Copilot CLI

Personal install:

mkdir -p ~/.copilot/skills
cp -R harness-engineering ~/.copilot/skills/

Verify

After installing, ask the agent "what skills are available" or start a task that involves project setup, code review, or long-running development. The skill should trigger automatically.
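
Before asking the agent, you can confirm that the files landed where expected. The sketch below looks for SKILL.md in the install locations listed above. The helper name check_skill_install and its optional repository argument are illustrative assumptions and are not shipped with the skill; it only confirms the copy succeeded, not that the client has loaded the skill.

```shell
# Sketch: report where the harness-engineering skill is installed.
# check_skill_install is a hypothetical helper, not part of the skill.
# Usage: check_skill_install [path/to/repo]
check_skill_install() {
  repo="${1:-}"   # optional repository path for project installs
  rc=1            # return 0 only if at least one install is found

  # Personal install locations from the Install section.
  set -- "$HOME/.codex/skills" "$HOME/.claude/skills" "$HOME/.copilot/skills"

  # Project install locations, checked only when a repo path is given.
  if [ -n "$repo" ]; then
    set -- "$@" "$repo/.agents/skills" "$repo/.claude/skills"
  fi

  for dir in "$@"; do
    if [ -f "$dir/harness-engineering/SKILL.md" ]; then
      echo "found: $dir/harness-engineering"
      rc=0
    fi
  done
  return "$rc"
}
```

For example, `check_skill_install /path/to/repo || echo "skill not found"` prints one line per detected install, or the fallback message when none is present.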

Core Concepts

| Concept | Description |
| --- | --- |
| Repo as system of record | Everything the agent needs lives in the repo; Slack, tickets, and memory don't count |
| Map, not encyclopedia | Instruction files are ~100-line directories pointing to deeper docs |
| Separate planning, doing, judging | Don't let one agent spec, implement, and grade itself |
| Make quality gradable | Convert "make it better" into concrete, weighted criteria |
| Verify against reality | Test the running product, not just the code |
| Structured handoffs | Context reset + handoff artifact beats a bloated session |
| Work incrementally | One feature at a time, commit often, test each feature |
| Manage entropy | Agents replicate patterns, including bad ones. Encode good patterns as lint rules. |
| Complexity earns its keep | Every harness component is a claim the model can't do X. Stress-test those claims. |
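
The "map, not encyclopedia" guideline can be checked mechanically. The sketch below warns when an instruction file grows past a line budget. The helper name check_instruction_length is hypothetical, and treating the ~100-line guideline as a hard default threshold is an assumption made for illustration; adjust the limit to suit your project.

```shell
# Sketch: flag instruction files that exceed a line budget.
# check_instruction_length is a hypothetical helper; the default of 100
# mirrors the "~100-line" guideline and is an assumption, not a rule.
# Usage: check_instruction_length FILE [MAX_LINES]
check_instruction_length() {
  file="$1"
  max="${2:-100}"

  # Strip the padding that some wc implementations add.
  lines=$(wc -l < "$file" | tr -d ' ')

  if [ "$lines" -gt "$max" ]; then
    echo "$file: $lines lines (over $max); consider moving detail into linked docs"
    return 1
  fi
  echo "$file: $lines lines (ok)"
}
```

Running `check_instruction_length AGENTS.md` in a pre-commit hook or CI step is one way to turn the guideline into a lint rule, in the spirit of the "manage entropy" principle.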

Playbooks

| Playbook | When to Use |
| --- | --- |
| New Project | Starting from scratch: spec expansion, scaffold, incremental build |
| Feature Development | Adding features to an existing codebase |
| Long-Running Build | Multi-hour/multi-session autonomous development |
| Refactor & Cleanup | Tech debt, code cleanup, architectural improvement |
| Bug Investigation | Reproduce → diagnose → test → fix → prevent |
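
To locate a playbook by hand, the mapping from scenario to file can be sketched as a small lookup. In practice the agent does this routing itself after reading SKILL.md; the helper name playbook_for and the scenario keywords below are hypothetical, while the file paths come from the Repository Layout section.

```shell
# Sketch: map a scenario keyword to its playbook file.
# playbook_for and its keywords are illustrative assumptions; only the
# file paths are taken from the repository layout.
playbook_for() {
  base="harness-engineering/playbooks"
  case "$1" in
    new-project)  echo "$base/new-project.md" ;;
    feature)      echo "$base/feature-development.md" ;;
    long-running) echo "$base/long-running-build.md" ;;
    refactor)     echo "$base/refactor-cleanup.md" ;;
    bugfix)       echo "$base/bugfix-investigation.md" ;;
    *)
      echo "unknown scenario: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `cat "$(playbook_for bugfix)"` run from the repository root prints the bug investigation playbook.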

Sources

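This skill synthesizes OpenAI's harness engineering paradigm, Anthropic's multi-agent research, and the community's Ralph pattern.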

Compatibility

This repository follows the open Agent Skills format, so it should work with any client that loads skills in that format.

Contributing

Contributions welcome via Issues and PRs:

  • Improve or add playbooks
  • Enhance templates
  • Add ecosystem references
  • Share real-world experience reports

License

This repository is licensed under the MIT License.
