Add adversarial-verification plugin#148
Conversation
|
|
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
Stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single- agent agreement bias by forcing maximal disagreement before commit. Two modes: - Decision mode: free-form arguments organized by evaluation dimensions for approach/design choices - Proof mode: N null hypotheses the skeptic proves and advocate refutes, for verifying bug findings and security claims Includes SKILL.md + 5 reference docs: - anti-patterns.md (10 common failure modes with diagnoses) - decision-mode.md (structure for approach selection) - proof-mode.md (N-null-hypothesis structure for finding verification) - prompt-templates.md (advocate/skeptic templates enforcing anti-balance) - synthesis.md (verdict table format and recommendation structure) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7f05877 to
833dbef
Compare
|
@claude review once |
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
Summary
Adds a new
adversarial-verificationplugin that stress-tests claims, designs, and bug findings by dispatching two isolated sub-agents (advocate + skeptic) and synthesizing their arguments into a structured verdict. Counters sycophancy and single-agent agreement bias by forcing maximal disagreement before the caller commits.What it does
Two modes:
Core principle: isolated sub-agent contexts are non-negotiable. An agent that sees the other side's arguments will soften to accommodate them. The adversarial value comes from each agent arguing without knowledge of the counter-argument, with synthesis happening separately.
Structure
SKILL.md(main entry point with decision tree for mode selection)references/decision-mode.md(structure for approach selection)references/proof-mode.md(N-null-hypothesis structure for finding verification)references/prompt-templates.md(advocate/skeptic templates enforcing anti-balance)references/synthesis.md(verdict table format + recommendation structure)references/anti-patterns.md(10 common failure modes with diagnoses)When to use
Test plan
python3 .github/scripts/validate_codex_skills.pypasses (verified locally).claude-plugin/marketplace.json.codex/skills/adversarial-verification🤖 Generated with Claude Code