Quorum

Orchestrate a swarm of AI experts on any question. From 3 agents to 1,000. One command, multiple minds, stress-tested answers.

Type one command. Quorum spins up a team of specialists — researchers, analysts, skeptics, domain experts — makes them work the problem from every angle, debate each other, validate their claims, and deliver a synthesized verdict you can actually act on.

Like having a room full of smart people argue about your question before anyone gives you the answer. Or, in swarm mode, an entire conference hall.

/quorum "your question here"

Built by qinnovate | Full docs

Why Quorum?

When you ask AI a question, you get one perspective. It sounds confident. It might be completely wrong. And you have no way to know.

Quorum gives you what a single AI agent never can: a second opinion, a third opinion, a devil's advocate, and a fact-checker — all at once.

Here's what happens when you run it:

A team assembles. Quorum picks the right experts for your question — a strategist, a skeptic, a researcher, a domain specialist, someone from a completely different field
They work independently. No groupthink. Each expert tackles the question from their unique angle without seeing anyone else's answer
They challenge each other. The skeptic pokes holes. The researcher checks sources. Debate pairs argue the real disagreements
A supervisor synthesizes. Not a vote count — an authored judgment that weighs reasoning quality, surfaces buried insights, and tells you what actually matters
You get the verdict. What survived scrutiny, what's still disputed, and what to do next

The result: answers that have been stress-tested by multiple minds before you see them.

Quick Start

# Install
claude install qinnovates/quorum

# Ask anything — 5 agents, auto-configured
/quorum "Should we use PostgreSQL or DynamoDB for our new service?"

# Stress-test a decision — full adversarial convergence
/quorum "Should we build or buy our auth system?" --max

# Build something — auto-generates battle-tested PRD + Ralph loop
/quorum "Build a REST API for user auth with JWT" --max

# Review a document
/quorum "Review this proposal for risks" --artifact proposal.md

# Massive scale — swarm auto-engages at 20+
/quorum "Impact of EU AI Act on BCI startups" --set 200

# See the plan before spending tokens
/quorum "your question" --max --dry-run

How It Works

Quorum runs a multi-phase pipeline. You don't need to understand this to use it — just type /quorum "question" and go. But if you want the details:

Standard mode (3-17 agents):

Setup — Supervisor analyzes your question, picks the right experts and assigns each a unique angle
Independent work — All agents work in parallel, no one sees anyone else's output
Triage — Supervisor reads all reports, drops low-value agents, identifies key disagreements
Cross-review — Selected agents debate each other directly. Devil's Advocate challenges the majority
Synthesis — Supervisor authors the final report with editorial judgment
Validation — Adversarial reviewer challenges the synthesis (web fact-check preferred; same-session agent review as fallback — see Limitations)
Final report — What survived, what's disputed, what to do next

Converse mode (--converse):

Propose — Proposer puts a solution on the table
Contradict — Realist and Breaker attack with specific failure modes and counter-proposals
Defend or abandon — Proposer revises; critics attack the fixes; Synthesizer reports what survived
Converge — Judge declares: CONVERGED (survived attack) / TENSION (irreducible tradeoff) / EXHAUSTED (diminishing returns)

Swarm mode (20-1000+ agents):

Taxonomy — Partition Engine decomposes the problem into non-overlapping territories
Spawn — One agent per territory, persona derived from its domain
Simulation — Agents POST findings, REACT to others, HANDOFF cross-territory discoveries, SHIFT positions across multiple rounds
Pattern extraction — Environment Server identifies opinion clusters, polarizations, cascades, coalitions
Synthesis — Supervisor reads patterns (not individual reports), interviews key agents
Structural challenge — Socrates questions clusters, Plato audits evidence
Validation — Adversarial reviewer challenges the synthesis
Final report — Emergent consensus, polarizations, sentiment trajectory, what to do next

Full architecture documentation →

Modes

Quorum scales in five tiers. Each tier is a fundamentally different kind of reasoning — not just more agents.

Mode	Agents	Structure	When to Use
Flat (`--lite`, default)	3-8	Single panel, everyone debates everyone	Focused questions in 1-2 domains
Converse (`--converse`)	5-7	Iterative adversarial convergence, multiple rounds	Battle-tested solutions, architecture decisions, build-vs-buy
Dialectic (`--rigor dialectic`)	2 + supervisor	Socratic dialogue, 3-5 rounds of deepening	Deep understanding, philosophical/strategic questions
Subteam/Org (`--teams`, `--org`)	9-22	Named teams + Socrates + Plato	Cross-domain complexity, 3+ domains with different incentives
Swarm (`--swarm`)	20-1000+	Taxonomy-partitioned, environment-based coordination	Predictions, landscape surveys, exhaustive red teaming

Flat (Default)

/quorum "Should we use Rust or Go for our CLI?"        # 5 agents, auto-configured
/quorum "Best Python web framework?" --lite             # 3 agents, fast
/quorum "Review this API design" --artifact spec.yaml   # Document review

A single panel of experts debates in parallel. The supervisor coordinates directly. Best for focused questions in 1-2 domains where 5-8 agents provide enough coverage.

Converse Mode (`--converse`)

/quorum "Best approach to real-time anomaly detection?" --converse
/quorum "Build or buy our auth system?" --converse --full
/quorum "Most robust BCI auth method?" --converse --mode research

The full panel stays in the room across multiple rounds. Solutions must survive sustained adversarial attack. Five core roles:

Role	What they do
Proposer	Goes first. Defends and adapts across rounds.
Realist	"This fails because X, and here's what survives X." Every criticism includes a survival path.
Breaker	Red-teams the proposal. Self-rates attacks. When they say "I can't break this" — strongest signal in the system.
Synthesizer	Reports what survived and what collapsed at checkpoints.
Judge	Neutral arbiter. Tracks convergence. Ends conversation when signal plateaus.

--full adds Historian (precedent) and Survivor (pessimistic but constructive).

The agent ratio is research-backed — 40% adversarial / 60% constructive, derived from convergent findings across 8 domains:

Nemeth (2001): Assigned devil's advocacy makes people MORE entrenched. Critics must hold authentic positions with counter-proposals.
Schweiger (1986): Counter-plans beat pure critique by 34% on decision quality.
Liang et al. (2023): AI debate performance DROPS at 4+ adversarial agents. Context overload.
Asch (1951): Single dissenter reduces conformity 32%→5%. Two establishes credible minority (Moscovici 1969).

Anti-duplication rule: No repetition across rounds. "This won't work" is not allowed — "This won't work because X, and here's what would survive X" is required. No free nihilism.

Three outcomes: CONVERGED (solution survived attack), TENSION (irreducible tradeoff — user decides), EXHAUSTED (no convergence, best surviving proposal reported).

Dialectic Mode (`--rigor dialectic`)

/quorum "Should we open-source our core product?" --rigor dialectic

Two agents enter a Socratic dialogue that spirals deeper through contradiction:

Thesis states a position with evidence
Antithesis finds the contradiction, the unstated assumption, the edge case where it breaks
Each round goes deeper, not wider — no new topics, no retreating to generalities
The supervisor calls it when: Synthesis (combined position), Bedrock (irreducible values disagreement), or Spark (a new question nobody started with)

Subteam/Org Mode (`--teams`, `--org`)

/quorum "Review our BCI security posture" --teams "engineering,legal,clinical"
/quorum "Should we ship this feature?" --org

When a question crosses 3+ domains with different incentives, flat swarms produce noise. Subteams solve this with local consensus before global debate. Teams deliberate internally, then team leads cross-challenge each other.

Two structural roles appear:

Socrates asks each team lead ONE question targeting their weakest point. Prevents echo chambers.
Plato audits every claim: SUPPORTED or UNSUPPORTED. Prevents hallucination.

Swarm Mode (`--swarm`)

/quorum "Impact of EU AI Act on BCI startups" --swarm
/quorum "Will neural data be biometric by 2028?" --swarm --predict --size 200
/quorum "Red team our auth" --swarm --branches "network,app,social,physical,supply-chain"

Scales from 20 to 1000+ agents using taxonomy-partitioned territories and environment-based coordination. The supervisor reads emerging patterns, not individual reports. (The environment server is currently prompt-orchestrated — true scaling depends on effective summarization.)

Prediction mode (--predict) adds probabilistic activation, sentiment trajectory tracking, and coalition detection for Delphi-method forecasting.

Full guide: when to use each mode →

Superpower Mode (Auto-Detected)

When you say "build", "implement", "create", "scaffold", "write a", "set up", or "add feature", the supervisor auto-triggers the superpower pipeline. No flag needed.

/quorum "Build a REST API for user auth with JWT" --max

What happens:

Supervisor detects implementation intent ("Build") → triggers superpower pipeline
Decomposition agent generates a PRD with TDD enforcement:
- Exact file paths for every file created or modified
- Bite-sized tasks (one action each, 2-5 minutes)
- Each task: write failing test → verify fail → implement → verify pass → commit
- Machine-verifiable acceptance criteria (not "works correctly" but "returns 200 with valid JWT containing user_id claim")
Adversarial convergence stress-tests the PRD (only with --max):
- Architect: "Are the boundaries right? Missing abstractions?"
- Breaker: "Which acceptance criteria are ambiguous? Edge cases?"
- TDD Enforcer: "Is every task actually testable? Assertions specific enough?"
- Pragmatist: "Is this over-engineered? Can tasks be eliminated?"
- Judge: Computes convergence score. Declares READY or sends back for revision.
Output: _swarm/prd-user-auth.md with Ralph loop command

# Run the PRD autonomously
./quorum/scripts/ralph.sh --prd _swarm/prd-user-auth.md

The Ralph loop executes each task with fresh context: reads PRD + progress.md → picks next task → TDD (test → fail → implement → pass → commit) → updates progress → every 3 tasks runs Quorum review to catch regressions → repeats until done.

With vs without --max:

	`/quorum "Build X"`	`/quorum "Build X" --max`
PRD generation	5 agents (lighter)	7-15 agents (full decomposition)
Stress-test	No adversarial review	Full convergence (5 personas, C ≥ 0.8)
Output quality	Good for small features	Battle-tested for production systems

No --superpower flag exists. Same capability, zero cognitive load.

Rules for Prompts That Don't Produce Garbage

Name the exact pipeline, not the app. "Fix the detection pipeline" not "fix Spot"
State the current broken state. "Currently nothing detects" not "make it work"
Attach the design doc. --artifact gives the AI the spec, not vibes
One pipeline per prompt. Don't combine LiDAR + detection + onboarding + App Store
Include a constraint verb. "Trace before touching code" forces investigation before editing

The single best improvement: always include --artifact pointing to your design doc. That's the difference between "build me a thing" and "build this specific thing to this spec."

# Bad: vibes
/quorum "Build the auth system" --max

# Good: spec-driven
/quorum "Build the auth system to this spec" --artifact AUTH-DESIGN.md --max

Subagent Validation (Fresh-Context Testing)

For validation and testing phases, Quorum can spawn Claude Code subagents — fresh instances with zero knowledge of the main session's conversation. The subagent does not know what the "right" answer is, so it tests honestly. This eliminates confirmation bias structurally, not just by prompt instruction.

# Main session designs test protocol with pass/fail criteria
# Subagent executes in fresh context — no anchoring, no expected outcomes
# Subagent reports raw PASS/FAIL with evidence
# Main session interprets results — discrepancies are the signal

This pattern proved its value in production: a validation subagent caught a real bug that the main session missed, because the main session was anchored on prior discussion. See Architecture: Subagent Execution Model for the full specification.

What Makes Quorum Different

Every multi-agent tool in the Claude Code ecosystem does the same thing: splits a task, hands each piece to an agent, collects answers, merges them. More hands, same brain. If all 8 agents hallucinate the same thing, you get a confident, well-formatted wrong answer.

Quorum is the only plugin that asks: "How do we know this answer is actually right?"

What Quorum adds that none of them have:

Agents assigned opposing positions, forced to defend them with evidence
A Devil's Advocate who argues against the majority — because the answer that survives pushback is the one worth trusting
Challenge agents get less context on purpose so they can't just agree with everyone
Research agents search different sources with different terms — not the same Google result five times
The supervisor judges reasoning quality, not vote counts — a well-argued minority beats a hand-waving majority
Converse mode — research-backed adversarial convergence where critics must counter-propose, not just attack
Dialectic mode — two agents drill through contradiction across multiple rounds until they hit bedrock
Swarm mode — 1000+ agents with taxonomy-partitioned territories and emergent pattern detection

Anti-Boxing

When you give an AI a project profile and a classification gate, it starts only pulling from familiar domains, only spawning agents it already knows. The profile IS the box. Every efficiency optimization that prunes "low-signal" agents is killing exactly the perspectives that would break the box.

Anti-boxing is Quorum's structural guarantee that the system keeps reaching outside its own comfort zone. It is not an established term in AI/ML literature — it draws from lateral thinking (de Bono), structured dissent (Janis), adversarial robustness, and cognitive diversity research. Anti-boxing as a named architectural pattern for multi-agent systems is original to Quorum.

The 6 anti-boxing rules:

Domain Outsider never from the profile's default domains. The outsider's value comes from NOT being in the profile.
Classification gate scores the question, not the project. A business question in a research repo gets business agents.
Condition-based outsider injection. High consensus with low challenge → inject a lateral thinker.
Exploratory queries invert the profile. "What am I missing?" spawns from domains the profile doesn't list.
Adversarial agents are immune to pruning. Devil's Advocate and Provocateur can never be killed by efficiency rules.
Inverted early termination. When everyone agrees, scrutiny goes UP. Unanimous consensus is the highest-risk scenario.

Validation & BS Detection (5 layers)

Layer	What It Does
Source Grading	Every finding rated STRONG / MODERATE / WEAK / UNVERIFIED
Contradiction Check	Catches when agents disagree AND when they agree without evidence
Hallucination Red Flags	Supervisor checklist for fabricated stats, fake citations, too-clean numbers
Adversarial Validation	Reviewer challenges the synthesis (web search preferred, subagent with fresh context, or same-session agent review as fallback)
Transparent Output	Report shows what's verified, what's unresolved, what couldn't be checked

The rule: If it can't be sourced, it gets flagged. If it can't be verified, it says so. If agents disagree, both sides are shown. You decide — not the AI.

Validation Verdicts (for review/audit workflows)

Verdict	Meaning
VALIDATED	Evidence found, no refutation
FLAGGED	Below threshold or panel disagreed — needs human review
BLOCKED	Consensus: unsupported, contradicted, or hallucinated

Examples

It's not a developer tool. It's a thinking tool. Any question where you'd want a smart friend to push back before you commit.

# Quick opinion — 5 agents, done in 2 minutes
/quorum "Should we use PostgreSQL or DynamoDB for our new service?"

# Stress-test a decision — full adversarial convergence
/quorum "Should we build or buy our auth system?" --max

# Build something — auto-detects superpower mode, generates battle-tested PRD
/quorum "Build a REST API for user auth with JWT" --max
# → Supervisor detects "Build" → generates PRD with TDD + acceptance criteria
# → Adversarial convergence stress-tests the PRD
# → Outputs: _swarm/prd-user-auth.md with Ralph loop command
# → Run: ./quorum/scripts/ralph.sh --prd _swarm/prd-user-auth.md

# Settling an argument — auto-routes to dialectic (binary question detected)
/quorum "Is a hot dog a sandwich?"

# Buying a house
/quorum "Found a house for $450K, 1960s build, no inspection. What should first-time buyers worry about?"

# Career crossroads — auto-routes to dialectic (two-option tension)
/quorum "Making $180K in fintech, offered $140K at a climate startup. Worth the pay cut?"

# Before a job interview
/quorum "Interviewing at Stripe for senior security. What will they ask I'm not preparing for?"

# Evaluating a business idea
/quorum "App that matches dog owners for group walks. Business or feature?" --max

# Document review
/quorum "Review this contract for risks I might miss" --artifact contract.pdf

# Research landscape — auto-routes to web research
/quorum "Complete landscape of EEG-based authentication methods"

# Validate prior research (two-stage)
/quorum "Fact-check for hallucinations" --artifact _swarm/eeg-auth.md --no-web

# Massive scale prediction — swarm auto-engages at 20+
/quorum "Will BCI startups consolidate or fragment by 2028?" --set 200

# Exhaustive red team at scale
/quorum "Red team our auth system — every attack vector" --set 100

# Private, no web searches
/quorum "Evaluate our internal security posture" --artifact audit.md --no-web

# See the plan before spending tokens
/quorum "Microservices or monolith?" --max --dry-run

Options

Three tiers:

Tier	Flag	Agents	What Happens
Default	(none)	5	SME panel debates, supervisor synthesizes
Max	`--max`	7-15	Full adversarial convergence, teams/dialectic/superpower auto-selected
Custom	`--set N`	N	Swarm auto-engages at 20+

Four optional flags:

Flag	Why It Can't Be Auto-Detected
`--artifact PATH`	Supervisor can't know which file you mean
`--no-web`	Privacy choice only the user can make
`--ponder`	User explicitly wants Q&A before the swarm runs
`--dry-run`	User wants to see the plan without spending tokens

Everything else is auto-detected. The supervisor picks mode, structure, rigor, research, teams, dialectic, superpower — based on what you asked and what it finds.

Output Format

Every standard report includes:

Executive Summary — 3-5 sentences, degree of consensus, key finding
Supervisor's Assessment — The quorum's own judgment (most valuable section)
Confidence & Verification — What's backed by evidence vs. supervisor judgment
Disagreement Register — Unresolved disputes with both positions preserved
Priority Actions — Ranked by impact, not by how many agents mentioned them
Blind Spots — What the team collectively could not evaluate

Converse mode adds:

Attack Resistance Map — Each surviving component and which attacks it withstood

Swarm reports add:

Emergent Consensus — Findings agents in unrelated territories reached independently (strongest signal)
Polarizations — Genuine disagreements with evidence on both sides
Cascades — Findings that changed the swarm's trajectory
Coalition Map — Which territory groups aligned on which recommendations
Sentiment Trajectory — How the swarm's collective position evolved across rounds

Honest Limitations

Quorum is a reasoning quality tool, not a magic truth machine. These are the known structural limitations:

The validation gate is not truly independent. Phase 5's agent review uses a separate agent in the same Claude session. This is prompt-level independence, not structural independence. Lorenz et al. (2011) showed even mild social influence narrows diversity. Web search fact-checking (Method 1) provides genuinely independent evidence; agent review (Method 2) provides useful-but-limited adversarial review. Neither is a substitute for human review.
The environment server is simulated. Swarm mode's Environment Server and Pattern Detection are prompt-orchestrated, not a persistent runtime service. The supervisor summarizes agent outputs — the quality of pattern detection depends on summarization quality, not a dedicated algorithm. True O(patterns) scaling is a design goal, not a current guarantee.
Agent count scaling is heuristic. The Task Classification Gate (score 0-12 → agent count 3-17) uses operational heuristics, not research-derived thresholds. The research supports specific tiers: 3 (minimal), 5-7 (synchronous optimum per Dunbar/Woolley), then jump to asynchronous/swarm. The intermediate counts (8, 10, 14) are interpolated.
Hallucination is structural. No amount of multi-agent debate eliminates hallucination. Quorum makes it visible and reduces it. It does not and cannot eliminate it. Every report is a starting point for human judgment.

Version History

v5.2.0 — Converse Mode (2026-03-22)

Research-backed iterative adversarial convergence. 5-7 agents, 40/60 adversarial ratio, 10 peer-reviewed citations. Adversarial minimum raised from 1→2 for all swarm sizes ≥5 (Moscovici 1969). Validation gate honesty disclosure. Swarm O(patterns) honesty disclosure.

v5.1.0 — Outcome Predictor (2026-03-22)

Outcome Ledger, Calibrate mode, Monitor mode, Structured Seed Data, Visualization Export, Temporal Simulation.

v5.0.0 — Swarm Mode (2026-03-22)

20-1000+ agents with MECE taxonomy, environment-based coordination, activation scheduling, prediction mode. Inspired by MiroFish/OASIS.

v4.1.0 — Divergence Engine (2026-03-17)

Provocateur archetype, EXPLORE mode, structural protections, security hardening.

v4.0.0 — Adaptive Intelligence (2026-03-17)

Project Profiles, Task Classification Gate, Config Transparency Block, Adaptive Output Templates.

v3.2.0 — Security Hardening (2026-03-17)

Removed shell access from agent manifest, injection defense, credential detection.

v3.1.0 — Epistemic Quality Gate (2026-03-17)

Two-stage research + validation workflow. VALIDATED / FLAGGED / BLOCKED verdicts.

v3.0.0 — Subteams & Dialectic (2026-03-14)

Subteam/Org modes, Socrates + Plato, Dialectic mode, 5-layer validation pipeline.

Full changelog →

On Hallucination

No LLM is hallucination-proof. Not GPT-4. Not Claude. Not any model running inside Quorum. Hallucination is not a bug — it is a structural property of how these systems work.

Every transformer output is a probability sample from a learned distribution, not a fact lookup (Vaswani et al. 2017). Deletang et al. (2024) showed that language modeling and lossless compression are mathematically equivalent — transformers are powerful general-purpose compressors. But a model with finite parameters cannot losslessly encode a training corpus with greater entropy (Shannon 1948). The weights are necessarily a lossy compression, and lossy decompression produces artifacts. In images, those artifacts are JPEG blocks. In language models, they are hallucinations (Chlon et al. 2025). The math does not permit zero error.

This is not unique to machines. Artificial neural networks were modeled after biological neurons (McCulloch & Pitts 1943). Biological brains also confabulate — reconstructing memories from statistical patterns rather than retrieving stored records (Bartlett 1932, Schacter 1999, Loftus & Palmer 1974). Both systems fill gaps with plausible guesses. The difference is that we built the LLM, so we can study the mechanism.

Quorum's 5-layer validation pipeline, adversarial agents, and evidence audits reduce hallucination. They make it visible. They do not eliminate it. Every Quorum report is a starting point for human judgment, not a replacement for it.

Models will get more accurate. The rates will shrink. They will not reach zero, because probability in an indeterministic world means errors are structural, not temporary. That is what keeps us learning.

Full scientific explanation with citations →

Documentation

Usage Guide — When to use flat vs converse vs dialectic vs subteams vs swarm. Decision matrix, cost guide, examples
Architecture — Full phase-by-phase technical specification
Prompt Templates — All agent templates with variable reference
Safety & Privacy — Guardrails, privacy disclosure, tool permissions
Privacy Policy — Full privacy policy for all qinnovate tools
Changelog — Version history and what changed
Releases — GitHub releases with download links

License

MIT

Author

Kevin Qi — qinnovate.com

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
docs		docs
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Folders and files

Latest commit

History

Repository files navigation

Quorum

Why Quorum?

Quick Start

How It Works

Modes

Flat (Default)

Converse Mode (--converse)

Dialectic Mode (--rigor dialectic)

Subteam/Org Mode (--teams, --org)

Swarm Mode (--swarm)

Superpower Mode (Auto-Detected)

Rules for Prompts That Don't Produce Garbage

Subagent Validation (Fresh-Context Testing)

What Makes Quorum Different

Anti-Boxing

Validation & BS Detection (5 layers)

Validation Verdicts (for review/audit workflows)

Examples

Options

Output Format

Honest Limitations

v5.2.0 — Converse Mode (2026-03-22)

v5.1.0 — Outcome Predictor (2026-03-22)

v5.0.0 — Swarm Mode (2026-03-22)

v4.1.0 — Divergence Engine (2026-03-17)

v4.0.0 — Adaptive Intelligence (2026-03-17)

v3.2.0 — Security Hardening (2026-03-17)

v3.1.0 — Epistemic Quality Gate (2026-03-17)

v3.0.0 — Subteams & Dialectic (2026-03-14)

On Hallucination

Documentation

License

Author

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Converse Mode (`--converse`)

Dialectic Mode (`--rigor dialectic`)

Subteam/Org Mode (`--teams`, `--org`)

Swarm Mode (`--swarm`)

Packages