One thousand agents. One hive mind.
You type one sentence. Behind the glass, a thousand workers fan out across 16 models, argue in cross-family review meshes, get audited by sealed judges they never knew existed, and converge back into a single answer, usually before your coffee gets cold.
It's quieter than you'd expect.
```
$ copilot
> hive1k h-500 "Document the auth system, add missing tests, and flag rollout risks"

[NEXUS] Booting H-500 swarm...
[NEXUS] Sealing acceptance criteria (8 checks)
[NEXUS] Deploying 2 Division Commanders...
[DIV-ALPHA] Commanding Architecture + Implementation divisions
[DIV-BETA] Commanding Testing + Documentation + Integration divisions
[DIV-ALPHA/CMD-ARCH] Mapping auth boundaries and module ownership
[DIV-ALPHA/CMD-IMPL] Tracing token issuance, refresh, and revocation flows
[DIV-BETA/CMD-TEST] Enumerating missing happy-path, edge-case, and failure-path tests
[DIV-BETA/CMD-DOCS] Drafting operator-facing docs and examples
[DIV-BETA/CMD-INTG] Checking rollout risks across API, web, DB, and monitoring
[REVIEW] Cross-family review mesh started
[SHADOW] 1 criterion failed on first pass → hardening cycle triggered
[SHADOW] Re-validated bundle: 0 critical failures remaining

✓ Final bundle ready in 47s

Top outputs:
1. Auth architecture brief with module boundaries
2. Ranked test-gap list with highest-risk paths first
3. Rollout checklist covering cookies, refresh tokens, and observability
4. Updated docs outline for onboarding + operations

Consensus: CONSENSUS on 3/4 major findings
Shadow Score: 12.5% → hardened and accepted
```
One prompt. 625 agents worked that problem. You got one synthesized answer.
Try it yourself:
hive1k "Map this repo, explain how the major systems fit together, and list the 5 highest-risk gaps"
One model gives you one perspective. For a small task, that's fine. For anything with real stakes (architecture that spans six services, a migration that touches every API surface, a security audit where missing one edge case matters), one perspective is a gamble.
The failure that started this project: three sealed judges scored a system design 44–46 out of 50. Shadow scoring (hidden criteria the judges never saw) caught critical arithmetic errors in the same output. Review alone is not validation. Confident and correct are not the same thing.
Hive1K exists because some tasks deserve more than one brain's best guess. It turns a single request into a structured process: decompose, parallelize across model families, cross-review, validate against sealed criteria, converge. The answer you get back isn't one model's opinion; it's the output that survived a gauntlet.
And it scales sub-linearly: five times more agents costs roughly 2.2× more wall-clock time. The architecture is parallelism-first, convergence-second.
Picture it as a living organization, not a diagram.
Nexus sits at the top: one orchestrator running on claude-opus-4.6 with a 128K context budget. It reads your mission, decides what divisions are needed, and seals the acceptance criteria in an envelope that no agent below will ever see.
Division Commanders (up to four, named DIV-ALPHA through DIV-DELTA) each own a slice of the mission. They're the recursive layer that Hive1K's predecessor (Swarm Command) didn't have. This is what lets the system scale from hundreds to over a thousand agents without losing coherence.
Commanders (20 total, 5 per division) are domain specialists: architecture, implementation, testing, documentation, integration. They break their domain into micro-tasks.
Squad Leads (200 total) decompose those micro-tasks further and run canary checks before committing workers.
Workers (1,000 total) do the actual atomic work. They're leaf nodes: they execute, they don't spawn. Each gets a 128-token micro-brief and returns a 256-token atom.
Reviewers (20 total) form a cross-family mesh. Every review pair is intentionally split across model families (Claude reviews GPT's work, GPT reviews Claude's), so agreement means more than self-consistency.
Then the sealed envelope opens. Shadow scoring validates everything against criteria the swarm never optimized for. If the score is too high, a hardening cycle fires. Only what survives gets synthesized into your final answer.
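The head count implied by those fan-outs is worth checking. A quick sketch in Python; the per-layer multipliers come from the role descriptions above (5 workers per squad lead is consistent with `max_workers_per_squad_lead: 5` in config.yml):

```python
# Fan-out at full H-1K scale, per the role counts described above.
nexus = 1
divisions = 4                    # DIV-ALPHA through DIV-DELTA
commanders = divisions * 5       # 5 per division   -> 20
squad_leads = commanders * 10    # 10 per commander -> 200
workers = squad_leads * 5        # 5 per squad lead -> 1,000
reviewers = 20                   # cross-family review mesh

total = nexus + divisions + commanders + squad_leads + workers + reviewers
print(total)  # 1245 -- matching the "~1,245 agents" of the largest tier
```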
```
Your question
                    │
                    ▼
       ┌─────────────────────────┐
       │        NEXUS (1)        │  Decomposes mission
       │  Seals hidden criteria  │  Synthesizes final answer
       └────────────┬────────────┘
                    │
  ┌──────────┬──────┴──────┬──────────┐
  ▼          ▼             ▼          ▼
DIV-ALPHA  DIV-BETA    DIV-GAMMA  DIV-DELTA     4 Division Commanders
  │          │             │          │         20 Commanders (5 per division)
 ···        ···           ···        ···        200 Squad Leads (10 per commander)
┌───────────────────────────────────────────┐
│     1,000 Workers execute in parallel     │
│  Atomic tasks · 8K context · Leaf nodes   │
└─────────────────────┬─────────────────────┘
                      │
                      ▼
┌───────────────────────────────────────────┐
│ 20 Reviewers · cross-family scoring mesh  │
│ Claude ↔ GPT pairs · 4-axis sealed scoring│
└─────────────────────┬─────────────────────┘
                      │
                      ▼
        ┌─────────────────────────┐
        │     Shadow Scoring      │  Sealed-envelope validation
        │     Hardening cycle     │  Spec L2 conformance
        └────────────┬────────────┘
                     │
                     ▼
                Your answer
```
Context compresses on the way down. Results compress on the way up. A 4K-token mission becomes 128-token micro-briefs at the leaves, and 256-token atoms bubble back up through merges until Nexus holds a 4K-token final report. Nothing explodes.
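The compression schedule can be written as a simple budget check. A sketch that reads the 4K/3K/2K/512/128 figures above as token counts; the dict keys are illustrative names, not Hive1K identifiers:

```python
# Context budget per layer on the way down, in tokens.
DOWNWARD_BUDGET = {
    "nexus_mission":      4096,  # the 4K-token mission
    "division_brief":     3072,
    "commander_brief":    2048,
    "squad_brief":         512,
    "worker_micro_brief":  128,  # what each leaf worker actually sees
}
UPWARD_ATOM = 256                # each worker returns a 256-token atom

# Sanity check: every layer hands its children a strictly smaller context,
# so total context in flight stays bounded even as the tree widens.
budgets = list(DOWNWARD_BUDGET.values())
assert all(child < parent for parent, child in zip(budgets, budgets[1:]))
```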
For the full visual deep dive: architecture diagrams · architecture overview
Hive1K isn't one size. It's three sizes, and you pick the one that fits.
- H-250 (~316 agents) · Fast reconnaissance. Bounded tasks where you want multi-model coverage without the full hierarchy. Good for: mapping a codebase, reviewing a design doc, triaging a bug backlog.
- H-500 (~625 agents) · The workhorse. Most real software tasks land here: document a system, write missing tests, audit for security gaps, plan a migration. Two Division Commanders coordinate the effort.
- H-1K (~1,245 agents) · Maximum coverage. Four Division Commanders, the complete hierarchy, every model family engaged. For repo-wide audits, high-stakes architecture decisions, or anything where missing a blind spot has real consequences.
```
hive1k h-250 "Triage the open bug backlog and rank by risk"
hive1k h-500 "Document the auth system and flag rollout risks"
hive1k h-1k  "Full architecture review: find every gap, test every assumption"
```
Cost scales sub-linearly (α ≈ 0.45). Going from H-250 to H-1K is roughly 2.2× the wall-clock time, not 4×. The architecture pays for parallelism, not for waiting.
Details and cost estimates: docs/scaling.md
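As a toy model of the sub-linear claim, assume wall-clock time grows as N^α with the α ≈ 0.45 quoted above. Note that a pure power law gives a multiplier somewhat under the quoted 2.2×, so treat both numbers as ballpark figures; this sketch is illustrative, not Hive1K's cost code:

```python
ALPHA = 0.45  # sub-linear scaling exponent quoted in this README

def relative_wall_clock(n_agents: int, baseline: int, alpha: float = ALPHA) -> float:
    """Wall-clock multiplier when scaling from `baseline` agents to `n_agents`,
    under a simple time ~ N**alpha model (toy model, not Hive1K internals)."""
    return (n_agents / baseline) ** alpha

# H-250 (~316 agents) up to H-1K (~1,245 agents): ~3.9x the agents...
print(round(relative_wall_clock(1245, 316), 2))  # ~1.85x the wall-clock time
```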
A thousand agents can produce a lot of confident nonsense if you let them. Hive1K is built around the premise that agreement is not accuracy: three agents saying the same wrong thing is worse than one agent saying it, because it feels more true.
A 4-stage pipeline decides what survives:
- Workers self-score: each atom ships with a confidence signal
- Squad Leads merge locally: classify results as CONSENSUS / MAJORITY / CONFLICT / UNIQUE
- Commanders merge across squads: trimmed mean over the weighted formula
  `0.40 × confidence + 0.30 × evidence + 0.15 × scope + 0.15 × coverage − conflict_penalty`
- Nexus arbitrates: median-of-3 judging on unresolved conflicts
Disagreement isn't suppressed. It's scored, preserved, and surfaced. When agents conflict, you see it.
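The commander-level merge is concrete enough to sketch. This is illustrative code, not Hive1K's implementation: the weights are the ones quoted above, the 0.70/0.50 cutoffs mirror `threshold_consensus` and `threshold_majority` in config.yml, and mapping the blended score onto the CONSENSUS/MAJORITY/CONFLICT labels is one plausible reading of the pipeline:

```python
from dataclasses import dataclass

@dataclass
class Atom:
    confidence: float  # worker self-score, 0..1
    evidence: float    # 0..1
    scope: float       # 0..1
    coverage: float    # 0..1

def commander_score(atom: Atom, conflict_penalty: float = 0.0) -> float:
    """Weighted merge formula from the consensus pipeline above."""
    return (0.40 * atom.confidence + 0.30 * atom.evidence
            + 0.15 * atom.scope + 0.15 * atom.coverage) - conflict_penalty

def classify(score: float) -> str:
    """Bucket a merged score using the config.yml thresholds."""
    if score >= 0.70:
        return "CONSENSUS"
    if score >= 0.50:
        return "MAJORITY"
    return "CONFLICT"

atom = Atom(confidence=0.9, evidence=0.8, scope=0.7, coverage=0.6)
print(classify(commander_score(atom)))  # CONSENSUS (score 0.795)
```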
The sealed-envelope protocol. Before any commander executes, Nexus generates acceptance criteria and locks them away. The swarm never sees them. After execution, the criteria unseal and validate the output.
| Shadow Score | What it means | What happens |
|---|---|---|
| 0% | Every criterion passed | Ship it |
| 1–15% | Minor gaps | Proceed with notes |
| 16–30% | Moderate gaps | Gap report attached, warning raised |
| 31–50% | Significant gaps | Bundle quarantined, hardening cycle |
| > 50% | Critical failure | Bundle rejected entirely |
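The bands map mechanically onto actions. A minimal sketch of the dispatch (the function name and return strings are hypothetical; the cutoffs are the table's):

```python
def shadow_action(score_pct: float) -> str:
    """Map a shadow score (the share of sealed criteria that failed)
    to an action, per the band table above."""
    if score_pct == 0:
        return "ship"
    if score_pct <= 15:
        return "proceed-with-notes"
    if score_pct <= 30:
        return "gap-report"
    if score_pct <= 50:
        return "quarantine-and-harden"
    return "reject"

print(shadow_action(12.5))  # proceed-with-notes
```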
This is why Hive1K exists as a distinct project. Its predecessor didn't have this. The moment we saw judges rate flawed output 44/50 while hidden criteria caught the errors, shadow scoring became non-negotiable.
Full protocol: docs/shadow-scoring.md · docs/consensus.md
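One standard way to make "sealed" auditable is a hash commitment: publish a digest of the criteria before execution, reveal the plaintext after, and let anyone re-hash to confirm nothing changed. This sketch shows the general idea; it is not a claim about Hive1K's internal mechanism:

```python
import hashlib
import json

def seal(criteria: list[str], nonce: str) -> str:
    """Commit to acceptance criteria without revealing them."""
    blob = json.dumps({"nonce": nonce, "criteria": criteria}, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

def verify_seal(criteria: list[str], nonce: str, commitment: str) -> bool:
    """After execution, check that the revealed criteria match the seal."""
    return seal(criteria, nonce) == commitment

criteria = ["refresh tokens rotate on use", "revocation propagates everywhere"]
commitment = seal(criteria, nonce="run-42")
# ... the swarm executes without ever seeing `criteria` ...
assert verify_seal(criteria, "run-42", commitment)
assert not verify_seal(["weakened criterion"], "run-42", commitment)
```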
```shell
curl -fsSL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/quickstart.sh | bash
```

Then open Copilot and type: `hive1k`
Prefer to inspect first?
```shell
curl -fsSL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/quickstart.sh -o quickstart.sh
less quickstart.sh
bash quickstart.sh
```

Or install the files directly:

```shell
mkdir -p ~/.copilot/skills/hive1k ~/.copilot/agents && \
curl -sL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/skills/hive1k/SKILL.md \
  -o ~/.copilot/skills/hive1k/SKILL.md && \
curl -sL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/agents/hive1k.agent.md \
  -o ~/.copilot/agents/hive1k.agent.md
```

Verify the downloads:

```shell
# macOS
shasum -a 256 ~/.copilot/skills/hive1k/SKILL.md
shasum -a 256 ~/.copilot/agents/hive1k.agent.md

# Linux
sha256sum ~/.copilot/skills/hive1k/SKILL.md
sha256sum ~/.copilot/agents/hive1k.agent.md
```

Note: SHA hashes are published in the latest release. Compare your download hashes against the release notes before use.
```shell
git clone https://github.com/DUBSOpenHub/hive1k.git
cd hive1k
chmod +x quickstart.sh && ./quickstart.sh
```

Requires an active Copilot subscription.
Hive1K is one tool in a family. They solve different problems.
| You want to... | Use | Why |
|---|---|---|
| Get one consensus answer from a recursive agent hierarchy (250–1,245 agents) | Hive1K | Recursive decomposition, cross-model review, shadow validation, one synthesized output |
| Run parallel coding workstreams across terminals | Stampede | Independent task lanes, execution throughput, branch-per-task |
| Tournament-test ideas across many models | Havoc Hackathon | Competitive elimination rounds, sealed judge panels, ranked synthesis |
| Orchestrate ~250 agents without the recursive layer | Swarm Command | Hive1K's predecessor: same core, no Division Commanders, depth 3 instead of 4 |
Short version: Hive1K for consensus at scale. Stampede for parallel execution. Havoc for idea tournaments. Swarm Command if you want the simpler original.
The sections below are for people who want to know how the gears turn. If you just want to use Hive1K, everything above is enough.
| Role | Models |
|---|---|
| Nexus | claude-opus-4.6 |
| Commanders (pool of 10) | claude-opus-4.6, claude-opus-4.5, claude-opus-4.6-1m, claude-sonnet-4.6, claude-sonnet-4.5, claude-sonnet-4, gpt-5.4, gpt-5.2, gpt-5.1, goldeneye |
| Squad Leads | claude-haiku-4.5, gpt-5.4-mini |
| Workers (pool of 6) | claude-haiku-4.5, gpt-5.4-mini, gpt-5-mini, gpt-4.1, gpt-5.3-codex, gpt-5.2-codex |
| Reviewers (8 cross-family pairs) | claude-opus-4.6 ↔ gpt-5.4, claude-opus-4.5 ↔ gpt-5.2, claude-opus-4.6-1m ↔ gpt-5.1, claude-sonnet-4.6 ↔ gpt-5.3-codex, claude-sonnet-4.5 ↔ gpt-5.2-codex, claude-sonnet-4 ↔ gpt-5.4-mini, claude-haiku-4.5 ↔ gpt-5-mini, goldeneye ↔ gpt-4.1 |
Every reviewer pair intentionally crosses model families. When Claude and GPT agree, that signal is worth more than either alone.
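That constraint is cheap to enforce in code. A sketch using a few pairs from the table; the `family` helper is illustrative (it buckets by name prefix and treats goldeneye as its own family):

```python
REVIEW_PAIRS = [
    ("claude-opus-4.6", "gpt-5.4"),
    ("claude-sonnet-4.6", "gpt-5.3-codex"),
    ("claude-haiku-4.5", "gpt-5-mini"),
    ("goldeneye", "gpt-4.1"),
]

def family(model: str) -> str:
    """Illustrative family bucketing by model-name prefix."""
    if model.startswith("claude"):
        return "claude"
    if model.startswith("gpt"):
        return "gpt"
    return model  # e.g. goldeneye stands alone

# Every review pair must span two different model families.
assert all(family(a) != family(b) for a, b in REVIEW_PAIRS)
```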
All tunables live in config.yml:
```yaml
consensus:
  threshold_consensus: 0.70
  threshold_majority: 0.50

depth_guard:
  max_spawn_depth: 4
  max_workers_per_squad_lead: 5

circuit_breaker:
  timeout_cascade: [240, 150, 90, 50, 30]

shadow_scoring:
  enabled: true
  spec_version: "1.0.0"
  conformance_level: "L2"
  sealed_criteria_count: 10

hardening:
  enabled: true
  threshold: 15

cost_ceiling:
  enabled: true
  mode: user-configurable
```

Depth Guard enforces 5 laws and 3 layers of protection against runaway recursion. The circuit breaker implements a 3-state FSM with 5-level recovery escalation. Neither is optional: at this scale, guardrails are structural.
- Depth Guard: hard limit on recursion depth (max 4), spawn budgets per layer, enforcement at every level
- Circuit breaker: CLOSED → OPEN → HALF-OPEN FSM with cascading timeouts
- Token compression: context shrinks at each layer (4K → 3K → 2K → 512 → 128 tokens), results compress on the way back up
- Cost ceiling: user-configurable budget cap; the swarm stops before it overspends
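A minimal version of the breaker's CLOSED → OPEN → HALF-OPEN machine, using the 30-second tail of config.yml's timeout cascade as the cooldown. Class and method names are hypothetical, not Hive1K's API:

```python
import time

class CircuitBreaker:
    """CLOSED -> OPEN after repeated failures; OPEN -> HALF-OPEN once a
    cooldown elapses; HALF-OPEN -> CLOSED on success, back to OPEN on failure."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0):
        self.state = "CLOSED"
        self.failures = 0
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s   # e.g. the 30s tail of the cascade
        self.opened_at = 0.0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", time.monotonic()

    def record_success(self) -> None:
        self.state, self.failures = "CLOSED", 0

    def allow_request(self) -> bool:
        if self.state == "OPEN" and time.monotonic() - self.opened_at >= self.cooldown_s:
            self.state = "HALF-OPEN"   # let one probe request through
        return self.state in ("CLOSED", "HALF-OPEN")

cb = CircuitBreaker()
for _ in range(3):
    cb.record_failure()
print(cb.state)  # OPEN
```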
```
hive1k/
├── README.md                       # You are here
├── AGENTS.md                       # Agent/skill descriptions
├── CONTRIBUTING.md                 # Contribution guidelines
├── catalog.yml                     # Skill metadata
├── config.yml                      # All tunables
├── LICENSE                         # MIT
├── SECURITY.md                     # Security policy
├── quickstart.sh                   # One-line installer
├── .github/
│   ├── copilot-instructions.md     # AI agent instructions for this repo
│   ├── workflows/ci.yml            # CI validation
│   └── skills/hive1k/SKILL.md      # Skill discovery path
├── agents/
│   └── hive1k.agent.md             # Standalone agent version
├── skills/hive1k/
│   └── SKILL.md                    # Core skill definition
├── templates/
│   ├── commander.md                # Commander prompt template
│   ├── division-commander.md       # Division Commander prompt template
│   ├── worker.md                   # Worker prompt template
│   ├── reviewer.md                 # Cross-reviewer prompt template
│   └── squad-lead.md               # Squad lead prompt template
├── protocols/
│   ├── depth-guard.md              # 5 Laws + 3-layer enforcement
│   ├── circuit-breaker.md          # 3-state FSM + 5-level recovery
│   ├── context-capsule.md          # JSON schemas for data structures
│   └── meta-reviewer.md            # Reviewer quality gate protocol
└── docs/
    ├── architecture.md             # Architecture overview
    ├── architecture-diagrams.md    # Mermaid diagrams
    ├── consensus.md                # Consensus algorithm deep dive
    ├── example-output.md           # Sample completed run output
    ├── learning-path.md            # Recommended reading order
    ├── scaling.md                  # Scale chooser + cost estimates
    ├── shadow-scoring.md           # Shadow scoring protocol
    └── use-cases.md                # Expanded prompt gallery
```
| Doc | What's in it |
|---|---|
| learning-path.md | Beginner, operator, and architect reading tracks |
| architecture.md | The full system model |
| architecture-diagrams.md | Mermaid diagrams for every layer |
| scaling.md | Scale chooser, cost estimates, tuning guide |
| use-cases.md | Prompt gallery with expected outcomes |
| consensus.md | The 4-stage consensus algorithm in detail |
| shadow-scoring.md | The sealed-envelope protocol, hardening cycle |
| example-output.md | Full transcript of a completed swarm run |
Hive1K implements Shadow Score Spec L2: sealed acceptance criteria generated before execution, validated after, hardened on failure.
MIT: use it, fork it, build on it.
Built by @DUBSOpenHub with the GitHub Copilot CLI