A Claude Code skill that performs deep critical analysis of documents, claims, and threads through multi-agent adversarial debates. Instead of simple summarization, it pits an Advocate against a Skeptic in isolated contexts, scores each round with a Judge, then verifies results independently and synthesizes contradictions.
The key advantage over single-context analysis: Advocate and Skeptic run as separate Claude Code Agent subagents with isolated contexts. They never see each other's prompts. This guarantees genuine perspective separation, not simulated role-switching in one window.
Input → Segmentation → Steelmanning → Adversarial Debate → Chain of Verification → Hegelian Synthesis → Report
| Phase | What happens | Why it matters |
|---|---|---|
| 1. Input | Read files (PDF, markdown, URLs) or claims. Auto-detect language. | Handles any input format. Output language matches input. |
| 2. Segmentation | Extract 3-7 key claims/themes. User confirms before proceeding. | Focuses analysis on what matters. |
| 3. Steelmanning | Strengthen each claim to its best possible version. | Prevents attacking straw men. Debates test the strongest version. |
| 4. Adversarial Debate | Advocate and Skeptic argue in parallel (isolated agents), Judge scores each round. 2-7 rounds with dynamic stopping. | Genuine perspective separation. No hedging. |
| 5. Chain of Verification | Independent Verifier checks conclusions against sources — without seeing debate verdicts. | Factored verification breaks circular self-confirmation. |
| 6. Hegelian Synthesis | Thesis + Antithesis → Synthesis. Cross-segment patterns, contradictions, integrated understanding. | Finds what no single source shows alone. |
| 7. Output | Narrative report as analytical article (not bullet points). Saved as markdown. | Readable, actionable, with full debate transcripts. |
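Taken together, the phases compose into a straightforward pipeline. The sketch below is purely illustrative Python with stub functions — the actual skill is prompt-driven and contains no such code; every function name and return shape here is a hypothetical stand-in:

```python
# Illustrative end-to-end pipeline (Phases 1-7). Every function is a
# placeholder stub showing the shape of the data flow, not real logic.

def segment(sources):            # Phase 2: extract 3-7 key claims
    return [f"claim from {s}" for s in sources][:7]

def steelman(claim):             # Phase 3: strongest version of the claim
    return f"steelmanned: {claim}"

def run_debate(claim):           # Phase 4: Advocate vs Skeptic, Judge-scored
    return {"claim": claim, "verdict": "MODERATE", "rounds": 2}

def verify(claim):               # Phase 5: factored check against sources only
    return {"claim": claim, "status": "supported"}

def synthesize(results):         # Phases 6-7: Hegelian synthesis -> report
    return "\n".join(
        f"{r['debate']['verdict']}: {r['debate']['claim']}" for r in results
    )

def dialectical_analysis(sources):
    results = []
    for claim in segment(sources):
        steel = steelman(claim)
        results.append({"debate": run_debate(steel), "check": verify(steel)})
    return synthesize(results)
```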
```
Orchestrator (you) = Judge + Synthesizer
├── Agent: Steelman   (per segment, parallel)
├── Agent: Advocate ──╮
│                     ├── run in parallel, isolated contexts
├── Agent: Skeptic ──╯
└── Agent: Verifier   (factored — no access to verdicts)
```
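Conceptually, the parallel dispatch looks like the sketch below: Advocate and Skeptic receive disjoint prompts, so neither ever sees the other's role definition. `call_agent` is a hypothetical stand-in for spawning a Claude Code subagent, not a real API:

```python
# Conceptual sketch of Phase 4 dispatch. The two roles run concurrently
# with disjoint prompts — genuine isolation, not role-switching in one
# context. `call_agent` is a placeholder, NOT a real Claude Code API.
from concurrent.futures import ThreadPoolExecutor

ADVOCATE_PROMPT = "Advocate\nDefend the steelmanned claim with source evidence."
SKEPTIC_PROMPT = "Skeptic\nAttack the claim; hunt for the eight bias types."

def call_agent(role_prompt: str, claim: str) -> str:
    # Placeholder: the real skill spawns an isolated subagent here.
    role = role_prompt.splitlines()[0]
    return f"[{role}] argues about: {claim}"

def debate_round(claim: str) -> tuple[str, str]:
    with ThreadPoolExecutor(max_workers=2) as pool:
        adv = pool.submit(call_agent, ADVOCATE_PROMPT, claim)
        skp = pool.submit(call_agent, SKEPTIC_PROMPT, claim)
        return adv.result(), skp.result()
```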
Each round, the Judge scores both sides on three scales (1-10):
- Evidence Strength — backed by facts from sources?
- Logical Coherence — no contradictions or gaps?
- Practical Relevance — connected to real decisions?
Dynamic stopping: debates end when arguments become cyclic, one side dominates for 2 rounds (5+ point lead), scores converge (<2 point gap for 2 rounds), or round 7 is reached. Minimum 2 rounds always.
Verdicts: STRONG / MODERATE / WEAK / CONTESTED
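The stopping rules above can be sketched as a simple check over per-round score totals (the sum of the three 1-10 scales for each side). The function below mirrors the thresholds from the text and is illustrative only; cyclic-argument detection is omitted because it requires semantic comparison of the transcripts, not arithmetic:

```python
# Sketch of the Judge's dynamic-stopping rules, per the criteria above.
# `advocate` and `skeptic` are per-round score totals for each side.

def should_stop(advocate: list[int], skeptic: list[int],
                max_rounds: int = 7) -> bool:
    rounds = len(advocate)
    if rounds < 2:                 # minimum 2 rounds always
        return False
    if rounds >= max_rounds:       # hard cap at round 7
        return True
    gaps = [a - s for a, s in zip(advocate, skeptic)]
    last_two = gaps[-2:]
    # One side dominates for 2 rounds (5+ point lead)
    if all(g >= 5 for g in last_two) or all(g <= -5 for g in last_two):
        return True
    # Scores converge (<2 point gap for 2 rounds)
    if all(abs(g) < 2 for g in last_two):
        return True
    return False
```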
The Skeptic actively hunts for:
- Confirmation bias, Survivorship bias, Anchoring
- Authority bias, Availability heuristic
- Vendor bias, Cherry-picking
- Correlation ≠ Causation
```bash
# Create the skill directory
mkdir -p ~/.claude/skills/debate/references

# Copy files (from cloned repo)
cp skill/SKILL.md ~/.claude/skills/debate/SKILL.md
cp skill/references/*.md ~/.claude/skills/debate/references/
```

Add to your Claude Code settings (`~/.claude/settings.json`):
```json
{
  "skills": {
    "debate": {
      "path": "~/.claude/skills/debate"
    }
  }
}
```

Analyze one or more documents:

```
/debate path/to/report.pdf
/debate quarterly-report.pdf
/debate report1.md report2.md report3.pdf
/debate research-papers/ --focus "Is the methodology sound?"
```

Analyze a claim (uses web search to gather evidence, then runs the full pipeline):

```
/debate "Remote work permanently reduces productivity"
```

Analyze a directory (reads all files in the directory, extracts claims, analyzes):

```
/debate ./research/
```
See examples/AI_Trends_2026_debate_analysis.md for a full analysis of 7 AI trend reports (Stanford HAI, Microsoft, Google Cloud, MIT Sloan, DataArt, Statworx, Adobe).
A live HTML version of this analysis: katokuneva-tech.github.io/ai-trends-2026-debate
The report reads as a narrative analytical article, not a list of bullet points. Each segment tells the story of the debate:
> Practically every one of the seven sources names AI agents as the central theme of the year. Statworx, Google Cloud, and Microsoft describe a transition from simple chatbots to autonomous agent systems. Microsoft even introduces the term "AgentOps." At first glance, the data is convincing: 78% of organizations already use AI, inference costs dropped 280x...
>
> However, the Skeptic discovered a critical scale substitution during the debate. The "78% adoption" figure describes AI in general — including trivial ChatGPT usage — not agent systems. Statworx itself describes agents as "on probation." The Skeptic found a precise analogy: in 2017, "BlockchainOps" appeared the same way — infrastructure around a technology doesn't prove maturity...
Each segment includes:
- Narrative analysis (3-5 paragraphs)
- Score dynamics with per-round commentary
- Collapsible full debate transcripts
- Verification status
- Cross-segment synthesis (Hegelian)
- Actionable recommendations grouped by confidence level
```
~/.claude/skills/debate/
├── SKILL.md                  # Main orchestrator (7 phases)
└── references/
    ├── advocate-prompt.md    # Advocate role definition
    ├── skeptic-prompt.md     # Skeptic role + bias checklist
    ├── steelman-prompt.md    # Steelmanning instructions
    ├── judge-rubric.md       # Scoring rubric + stopping criteria
    ├── verifier-prompt.md    # Chain of Verification protocol
    ├── debate-protocol.md    # Debate rules and structure
    └── output-template.md    # Report template + narrative style guide
```
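For reference, a SKILL.md in Claude Code typically opens with YAML frontmatter that names and describes the skill so the orchestrator can discover it. The exact field values for this skill are an assumption; a minimal sketch looks like:

```markdown
---
name: debate
description: Deep critical analysis of documents and claims via multi-agent adversarial debate
---

Orchestrator instructions for the seven phases follow here...
```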
| Aspect | Summarization | Dialectical Analyzer |
|---|---|---|
| Perspective | Single voice | Advocate vs Skeptic (isolated) |
| Bias detection | None | 8 bias types actively hunted |
| Verification | Self-check | Factored (verifier doesn't see conclusions) |
| Contradictions | Glossed over | Explicitly surfaced and synthesized |
| Output | "Key takeaways" | Narrative with verdicts, evidence, and confidence levels |
| Vendor bias | Invisible | Flagged as systemic blind spot |
Why isolated agent contexts? In single-context systems (like Cursor IDE), the model sees all role prompts and unconsciously hedges. With Claude Code Agent subagents, Advocate and Skeptic run in completely separate contexts — genuine adversarial reasoning.
Why steelmanning before debate? Without it, the Skeptic attacks a straw man version of the argument. Steelmanning ensures debates test the best version of each claim.
Why factored verification? Standard self-check ("reread and correct") is ineffective — models confirm their own conclusions (circular reinforcement). The Verifier receives only sources and claims, not debate outcomes.
Why narrative output? Bullet-point reports lose nuance. The narrative format preserves the story of how conclusions were reached: which arguments survived, which fell apart, and why.
Why dynamic stopping? Fixed round counts waste resources on settled debates and cut short contested ones. Dynamic stopping (convergence, dominance, cycling) adapts to each segment.
- Source-dependent: quality of analysis depends on quality of input documents
- No external fact-checking: the Verifier checks against provided sources, not external databases
- Token-intensive: in the worst case, 5 segments × 7 rounds × 2 agents = 70 agent invocations per analysis, which adds up to significant token usage
- Language: all prompts are in Russian but the system auto-detects input language and outputs accordingly
This skill was originally built as a multi-agent debate system in Cursor IDE for analyzing 76 PDF reports on AI trends. It was then redesigned for Claude Code to take advantage of isolated Agent subagent contexts — turning simulated role-switching into genuine perspective separation.
MIT