/uglybug

Adversarial bug-hunt skill for Claude Code. Six isolated agents find real bugs in your codebase through structured rebuttal, independent verification, and ruling.

How it works

flowchart LR
    U([/uglybug]) --> H["🏴 Hunter<br/>reports every bug"]
    H --> S["🛡️ Skeptic<br/>rebuts or concedes"]
    H --> R["🔍 Reproducer<br/>fact-checks the code"]
    S --> Ref["⚖️ Referee<br/>renders verdict"]
    R --> Ref
    Ref --> W{More waves?}
    W -->|Yes| H
    W -->|No| C["📊 Consolidator<br/>synthesizes patterns"]
    W -->|No| Hd["🔒 Hardener<br/>proposes prevention"]
    C --> O([Report])
    Hd --> O

The six agents

Each runs in a completely isolated context so no anchoring or sycophancy leaks between them:

In-game (runs every wave)

Hunter — scans the code and reports every plausible bug (biased to over-report)
Skeptic — tries to disprove each finding; pays 2x penalty for wrongly dismissing a real bug (asymmetric skin-in-the-game)
Reproducer — independently re-reads the cited code and flags misquotes (fact-check layer between Hunter and Skeptic)
Referee — renders the final verdict: CONFIRMED, DOWNGRADED, or REJECTED

Post-game (runs once at the end)

Consolidator — synthesizes all confirmed findings into systemic architectural patterns
Hardener — proposes concrete tests and guardrails to prevent each class of bug from recurring

Waves run until stop conditions are met (max waves reached, two consecutive clean waves, or the Hunter itself declares coverage exhausted). Findings, observations, and full per-wave agent outputs land in .uglybug/ for inspection and replay.

Credit & prior art

This skill is inspired by @systematicls's article "How To Be A World-Class Agentic Engineer," which describes a 3-agent adversarial bug-finding pattern — a Hunter that over-reports, an adversarial agent that tries to disprove each bug with an asymmetric scoring penalty, and a Referee that renders verdicts while being told it has the ground truth.

"I get a bug-finder agent to identify all the bugs ... I get an adversarial agent ... for every bug that the agent is able to disprove as a bug, it gets the score of that bug, but if it gets it wrong, it will get -2 × score of that bug ... Finally, I get a referee agent to take both their inputs and to score them. I lie and tell the referee agent that I have the actual correct ground truth." — @systematicls

A faithful 3-agent implementation of that pattern already exists at danpeg/bug-hunt — worth checking out if you want something closer to the original post.

What's different here

This skill extends that pattern in four ways:

Six agents instead of three. Adds an independent Reproducer that re-reads the cited code to catch misquotes before the Referee sees them, plus two post-game agents — Consolidator (pattern synthesis) and Hardener (concrete test + guardrail proposals) — so the session produces prevention artifacts, not just a list of bugs.
Multi-wave with dedup and auto-stop. Findings from prior waves are passed forward so later waves don't re-surface the same bug. The game auto-terminates when signal degrades (two consecutive clean waves) rather than running on a fixed budget.
Counter-sycophancy as a stated design principle. The 2x Skeptic penalty is grounded in published research — SycEval (arXiv 2502.08177) shows LLMs dismiss up to 88% of known vulnerabilities when code is framed as "bug-free." The scoring asymmetry is the structural counter-measure, not a detail.
Resumable + partially-invocable. State persists in .uglybug/, so you can --resume a long session, or re-run a single role (e.g. --role=hardener) against existing findings without re-hunting.

Install

git clone https://github.com/mathifonseca/claude-uglybug.git ~/.claude/skills/uglybug

Claude Code auto-discovers skills in ~/.claude/skills/.

Usage

/uglybug                               # hunt across the entire repo, up to 5 waves
/uglybug app/api                       # scope to one directory
/uglybug --waves=3 --theme=auth        # 3 waves, auth surface only
/uglybug --resume                      # continue from the last saved wave
/uglybug --role=hardener               # run Hardener against existing findings
/uglybug --no-consolidator             # skip post-game pattern synthesis

Outputs land in .uglybug/:

state.json — full game state, scoreboard, findings, verdicts
wave-<n>-{hunter,skeptic,reproducer,referee}.json — per-agent outputs
report.md — final consolidated report
prevention.json — Hardener's test + guardrail proposals

Update

cd ~/.claude/skills/uglybug && git pull

Uninstall

rm -rf ~/.claude/skills/uglybug

Contributing

Contributions welcome — bug reports, prompt refinements, new heuristics, additional post-game agents. See CONTRIBUTING.md.

License

MIT — see LICENSE.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
.github/workflows		.github/workflows
references		references
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

/uglybug

How it works

The six agents

Credit & prior art

What's different here

Install

Usage

Update

Uninstall

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

/uglybug

How it works

The six agents

Credit & prior art

What's different here

Install

Usage

Update

Uninstall

Contributing

License

About

Resources

License

Code of conduct

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages