
🐝 Hive1K

One thousand agents. One hive mind.

You type one sentence. Behind the glass, a thousand workers fan out across 16 models, argue in cross-family review meshes, get audited by sealed judges they never knew existed, and converge back into a single answer — usually before your coffee gets cold.

It's quieter than you'd expect.

License: MIT · Security Policy


What happens when you type hive1k

$ copilot

> hive1k h-500 "Document the auth system, add missing tests, and flag rollout risks"

[NEXUS] Booting H-500 swarm...
[NEXUS] Sealing acceptance criteria (8 checks)
[NEXUS] Deploying 2 Division Commanders...
[DIV-ALPHA] Commanding Architecture + Implementation divisions
[DIV-BETA] Commanding Testing + Documentation + Integration divisions
  [DIV-ALPHA/CMD-ARCH] Mapping auth boundaries and module ownership
  [DIV-ALPHA/CMD-IMPL] Tracing token issuance, refresh, and revocation flows
  [DIV-BETA/CMD-TEST] Enumerating missing happy-path, edge-case, and failure-path tests
  [DIV-BETA/CMD-DOCS] Drafting operator-facing docs and examples
  [DIV-BETA/CMD-INTG] Checking rollout risks across API, web, DB, and monitoring
[REVIEW] Cross-family review mesh started
[SHADOW] 1 criterion failed on first pass → hardening cycle triggered
[SHADOW] Re-validated bundle: 0 critical failures remaining

✅ Final bundle ready in 47s

Top outputs:
  1. Auth architecture brief with module boundaries
  2. Ranked test-gap list with highest-risk paths first
  3. Rollout checklist covering cookies, refresh tokens, and observability
  4. Updated docs outline for onboarding + operations

Consensus: CONSENSUS on 3/4 major findings
Shadow Score: 12.5% → hardened and accepted

One prompt. 625 agents worked that problem. You got one synthesized answer.

Try it yourself:

hive1k "Map this repo, explain how the major systems fit together, and list the 5 highest-risk gaps"

Why a thousand agents?

One model gives you one perspective. For a small task, that's fine. For anything with real stakes — architecture that spans six services, a migration that touches every API surface, a security audit where missing one edge case matters — one perspective is a gamble.

The failure that started this project: three sealed judges scored a system design 44–46 out of 50. Shadow scoring — hidden criteria the judges never saw — caught critical arithmetic errors in the same output. Review alone is not validation. Confident and correct are not the same thing.

Hive1K exists because some tasks deserve more than one brain's best guess. It turns a single request into a structured process: decompose, parallelize across model families, cross-review, validate against sealed criteria, converge. The answer you get back isn't one model's opinion — it's the output that survived a gauntlet.

And it scales sub-linearly. Five times more agents costs roughly 2.2× the wall-clock time. The architecture is parallelism-first, convergence-second.


The Hive

Picture it as a living organization, not a diagram.

Nexus sits at the top — one orchestrator running on claude-opus-4.6 with a 128K context budget. It reads your mission, decides what divisions are needed, and seals the acceptance criteria in an envelope that no agent below will ever see.

Division Commanders — up to four, named DIV-ALPHA through DIV-DELTA — each own a slice of the mission. They're the recursive layer that Hive1K's predecessor (Swarm Command) didn't have. This is what lets the system scale from hundreds to over a thousand agents without losing coherence.

Commanders (20 total, 5 per division) are domain specialists — architecture, implementation, testing, documentation, integration. They break their domain into micro-tasks.

Squad Leads (200 total) decompose those micro-tasks further and run canary checks before committing workers.

Workers (1,000 total) do the actual atomic work. They're leaf nodes — they execute, they don't spawn. Each gets a 128-token micro-brief and returns a 256-token atom.

Reviewers (20 total) form a cross-family mesh. Every review pair is intentionally split across model families — Claude reviews GPT's work, GPT reviews Claude's — so agreement means more than self-consistency.

Then the sealed envelope opens. Shadow scoring validates everything against criteria the swarm never optimized for. If the score is too high, a hardening cycle fires. Only what survives gets synthesized into your final answer.

                            Your question
                                 │
                                 ▼
                    ┌────────────────────────┐
                    │       NEXUS (1)        │  Decomposes mission
                    │  Seals hidden criteria │  Synthesizes final answer
                    └───────────┬────────────┘
                                │
            ┌───────────┬───────┴───────┬───────────┐
            ▼           ▼               ▼           ▼
        DIV-ALPHA   DIV-BETA      DIV-GAMMA   DIV-DELTA    4 Division
        ┌─┴─┐       ┌─┴─┐         ┌─┴─┐       ┌─┴─┐        Commanders
        │   │       │   │         │   │       │   │
       ─┴─ ─┴─    ─┴─ ─┴─      ─┴─ ─┴─    ─┴─ ─┴─         20 Commanders
       │││ │││    │││ │││      │││ │││    │││ │││          (5 per division)
       ··· ···    ··· ···      ··· ···    ··· ···
       │││ │││    │││ │││      │││ │││    │││ │││          200 Squad Leads
       ▼▼▼ ▼▼▼    ▼▼▼ ▼▼▼      ▼▼▼ ▼▼▼    ▼▼▼ ▼▼▼          (10 per commander)
      ┌──────────────────────────────────────────────────┐
      │        1,000 Workers execute in parallel         │
      │      Atomic tasks · 8K context · Leaf nodes      │
      └─────────────────────────┬────────────────────────┘
                                │
                                ▼
      ┌──────────────────────────────────────────────────┐
      │     20 Reviewers — cross-family scoring mesh     │
      │    Claude ↔ GPT pairs · 4-axis sealed scoring    │
      └─────────────────────────┬────────────────────────┘
                                │
                                ▼
                   ┌──────────────────────┐
                   │   Shadow Scoring     │  Sealed-envelope validation
                   │   Hardening cycle    │  Spec L2 conformance
                   └──────────┬───────────┘
                              │
                              ▼
                         Your answer

Context compresses on the way down. Results compress on the way up. A 4K-token mission becomes 128-token micro-briefs at the leaves, and 256-token atoms bubble back up through merges until Nexus holds a 4K-token final report. Nothing explodes.
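
As a sanity check, the layer headcounts above can be tallied directly. This tiny sketch (sizes taken from the diagram, with 5 workers per squad lead matching the config's spawn budget) reproduces the ~1,245-agent figure quoted later for the full hive:

```python
# Layer sizes for the full H-1K hive, as listed in the diagram above.
nexus = 1
division_commanders = 4
commanders = division_commanders * 5   # 5 commanders per division  -> 20
squad_leads = commanders * 10          # 10 squad leads per commander -> 200
workers = squad_leads * 5              # 5 workers per squad lead   -> 1,000
reviewers = 20                         # cross-family review mesh

total = nexus + division_commanders + commanders + squad_leads + workers + reviewers
print(total)  # -> 1245
```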

For the full visual deep dive: architecture diagrams · architecture overview


Scale

Hive1K isn't one size. It's three sizes, and you pick the one that fits.

H-250 · Scout Swarm

~316 agents · Fast reconnaissance. Bounded tasks where you want multi-model coverage without the full hierarchy. Good for: mapping a codebase, reviewing a design doc, triaging a bug backlog.

H-500 · Worker Swarm

~625 agents · The workhorse. Most real software tasks land here — document a system, write missing tests, audit for security gaps, plan a migration. Two Division Commanders coordinate the effort.

H-1K · Full Hive

~1,245 agents · Maximum coverage. Four Division Commanders, the complete hierarchy, every model family engaged. For repo-wide audits, high-stakes architecture decisions, or anything where missing a blind spot has real consequences.

hive1k h-250 "Triage the open bug backlog and rank by risk"
hive1k h-500 "Document the auth system and flag rollout risks"
hive1k h-1k  "Full architecture review — find every gap, test every assumption"

Cost scales sub-linearly (α ≈ 0.45). Going from H-250 to H-1K is roughly 2.2× the wall-clock time, not 4×. The architecture pays for parallelism, not for waiting.
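
A hedged sketch of what sub-linear scaling means in practice, assuming cost grows roughly as N^α (the `relative_cost` helper and tier sizes are illustrative; the actual cost model lives in docs/scaling.md):

```python
# Illustrative only: cost ~ N ** alpha, with alpha ~ 0.45 as quoted above.
ALPHA = 0.45

def relative_cost(agents: int, baseline: int = 316, alpha: float = ALPHA) -> float:
    """Wall-clock cost of a tier relative to the H-250 (~316 agent) baseline."""
    return (agents / baseline) ** alpha

for tier, n in [("H-250", 316), ("H-500", 625), ("H-1K", 1245)]:
    print(f"{tier}: ~{relative_cost(n):.1f}x the H-250 cost")
```

Note that with α pinned at exactly 0.45 this formula gives a bit under 2× for H-250 → H-1K; the quoted ~2.2× reflects the fuller cost model in docs/scaling.md, so treat α here as approximate.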

Details and cost estimates: docs/scaling.md


Trust

A thousand agents can produce a lot of confident nonsense if you let them. Hive1K is built around the premise that agreement is not accuracy — three agents saying the same wrong thing is worse than one agent saying it, because it feels more true.

Consensus scoring

A 4-stage pipeline decides what survives:

  1. Workers self-score — each atom ships with a confidence signal
  2. Squad Leads merge locally — classify results as CONSENSUS / MAJORITY / CONFLICT / UNIQUE
  3. Commanders merge across squads — trimmed mean, weighted formula: 0.40 × confidence + 0.30 × evidence + 0.15 × scope + 0.15 × coverage − conflict_penalty
  4. Nexus arbitrates — median-of-3 judging on unresolved conflicts
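
The stage-3 weighted formula can be written out directly. A minimal sketch, assuming all inputs are normalized to [0, 1]; the function and field names here are illustrative, not the repo's actual schema:

```python
def commander_merge_score(confidence: float, evidence: float, scope: float,
                          coverage: float, conflict_penalty: float = 0.0) -> float:
    """Stage-3 merge score using the weights quoted above."""
    return (0.40 * confidence + 0.30 * evidence
            + 0.15 * scope + 0.15 * coverage - conflict_penalty)

# A well-evidenced consensus atom outranks a confident but thinly-evidenced one:
strong = commander_merge_score(0.9, 0.8, 0.7, 0.7)                          # ~0.81
shaky = commander_merge_score(0.95, 0.3, 0.5, 0.4, conflict_penalty=0.2)    # ~0.41
```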

Disagreement isn't suppressed. It's scored, preserved, and surfaced. When agents conflict, you see it.

Shadow scoring

The sealed-envelope protocol. Before any commander executes, Nexus generates acceptance criteria and locks them away. The swarm never sees them. After execution, the criteria unseal and validate the output.

| Shadow Score | What it means | What happens |
| --- | --- | --- |
| 0% | Every criterion passed | Ship it |
| 1–15% | Minor gaps | Proceed with notes |
| 16–30% | Moderate gaps | Gap report attached, warning raised |
| 31–50% | Significant gaps | Bundle quarantined, hardening cycle |
| > 50% | Critical failure | Bundle rejected entirely |
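
The banding maps mechanically to an action. A sketch, assuming the thresholds exactly as tabled above (the canonical protocol is docs/shadow-scoring.md; the function name is illustrative):

```python
def shadow_action(score_pct: float) -> str:
    """Map a shadow score (percent of sealed criteria failed) to an action."""
    if score_pct == 0:
        return "ship"
    if score_pct <= 15:
        return "proceed-with-notes"
    if score_pct <= 30:
        return "gap-report"
    if score_pct <= 50:
        return "quarantine-and-harden"
    return "reject"

print(shadow_action(12.5))  # -> proceed-with-notes
```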

This is why Hive1K exists as a distinct project. Its predecessor didn't have this. The moment we saw judges rate flawed output 44/50 while hidden criteria caught the errors, shadow scoring became non-negotiable.

Full protocol: docs/shadow-scoring.md · docs/consensus.md


Get started

Quickstart (one command)

curl -fsSL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/quickstart.sh | bash

Then open Copilot and type: hive1k

Prefer to inspect first?

curl -fsSL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/quickstart.sh -o quickstart.sh
less quickstart.sh
bash quickstart.sh

Manual install

mkdir -p ~/.copilot/skills/hive1k ~/.copilot/agents && \
  curl -sL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/skills/hive1k/SKILL.md \
    -o ~/.copilot/skills/hive1k/SKILL.md && \
  curl -sL https://raw.githubusercontent.com/DUBSOpenHub/hive1k/main/agents/hive1k.agent.md \
    -o ~/.copilot/agents/hive1k.agent.md

Verify your downloads

# macOS
shasum -a 256 ~/.copilot/skills/hive1k/SKILL.md
shasum -a 256 ~/.copilot/agents/hive1k.agent.md

# Linux
sha256sum ~/.copilot/skills/hive1k/SKILL.md
sha256sum ~/.copilot/agents/hive1k.agent.md

Note: SHA hashes are published in the latest release. Compare your download hashes against the release notes before use.

Clone the repo

git clone https://github.com/DUBSOpenHub/hive1k.git
cd hive1k
chmod +x quickstart.sh && ./quickstart.sh

Requires an active Copilot subscription.


Neighbors, not competitors

Hive1K is one tool in a family. They solve different problems.

| You want to... | Use | Why |
| --- | --- | --- |
| Get one consensus answer from a recursive agent hierarchy (250–1,245 agents) | Hive1K | Recursive decomposition, cross-model review, shadow validation, one synthesized output |
| Run parallel coding workstreams across terminals | Stampede | Independent task lanes, execution throughput, branch-per-task |
| Tournament-test ideas across many models | Havoc Hackathon | Competitive elimination rounds, sealed judge panels, ranked synthesis |
| Orchestrate ~250 agents without the recursive layer | Swarm Command | Hive1K's predecessor — same core, no Division Commanders, depth 3 instead of 4 |

Short version: Hive1K for consensus at scale. Stampede for parallel execution. Havoc for idea tournaments. Swarm Command if you want the simpler original.


Under the hood

The sections below are for people who want to know how the gears turn. If you just want to use Hive1K, everything above is enough.

The 16 models

| Role | Models |
| --- | --- |
| Nexus | claude-opus-4.6 |
| Commanders (pool of 10) | claude-opus-4.6, claude-opus-4.5, claude-opus-4.6-1m, claude-sonnet-4.6, claude-sonnet-4.5, claude-sonnet-4, gpt-5.4, gpt-5.2, gpt-5.1, goldeneye |
| Squad Leads | claude-haiku-4.5, gpt-5.4-mini |
| Workers (pool of 6) | claude-haiku-4.5, gpt-5.4-mini, gpt-5-mini, gpt-4.1, gpt-5.3-codex, gpt-5.2-codex |
| Reviewers (8 cross-family pairs) | claude-opus-4.6 ↔ gpt-5.4, claude-opus-4.5 ↔ gpt-5.2, claude-opus-4.6-1m ↔ gpt-5.1, claude-sonnet-4.6 ↔ gpt-5.3-codex, claude-sonnet-4.5 ↔ gpt-5.2-codex, claude-sonnet-4 ↔ gpt-5.4-mini, claude-haiku-4.5 ↔ gpt-5-mini, goldeneye ↔ gpt-4.1 |

Every reviewer pair intentionally crosses model families. When Claude and GPT agree, that signal is worth more than either alone.
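The cross-family constraint is easy to state as an invariant. A hypothetical sketch, inferring the family from the model-name prefix (an assumption about naming conventions, not a repo API; `REVIEW_PAIRS` below is a subset of the table for illustration):

```python
# Subset of the reviewer pairs from the table above.
REVIEW_PAIRS = [
    ("claude-opus-4.6", "gpt-5.4"),
    ("claude-sonnet-4.6", "gpt-5.3-codex"),
    ("claude-haiku-4.5", "gpt-5-mini"),
]

def family(model: str) -> str:
    # Crude heuristic: the vendor prefix of the model name.
    return model.split("-")[0]

# Every pair must span two families, so agreement is never
# one model family agreeing with itself.
assert all(family(a) != family(b) for a, b in REVIEW_PAIRS)
```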

Configuration

All tunables live in config.yml:

consensus:
  threshold_consensus: 0.70
  threshold_majority: 0.50

depth_guard:
  max_spawn_depth: 4
  max_workers_per_squad_lead: 5

circuit_breaker:
  timeout_cascade: [240, 150, 90, 50, 30]

shadow_scoring:
  enabled: true
  spec_version: "1.0.0"
  conformance_level: "L2"
  sealed_criteria_count: 10
  hardening:
    enabled: true
    threshold: 15

cost_ceiling:
  enabled: true
  mode: user-configurable

Depth Guard enforces 5 laws and 3 layers of protection against runaway recursion. The circuit breaker implements a 3-state FSM with 5-level recovery escalation. Neither is optional — at this scale, guardrails are structural.

Safety mechanisms

  • Depth Guard — hard limit on recursion depth (max 4), spawn budgets per layer, enforcement at every level
  • Circuit breaker — CLOSED → OPEN → HALF-OPEN FSM with cascading timeouts
  • Token compression — context shrinks at each layer (4K → 3K → 2K → 512 → 128 tokens), results compress on the way back up
  • Cost ceiling — user-configurable budget cap; the swarm stops before it overspends
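
A minimal sketch of the breaker's three-state machine, with an illustrative failure threshold (the real protocol, including the 5-level recovery escalation and cascading timeouts, lives in protocols/circuit-breaker.md):

```python
class CircuitBreaker:
    """CLOSED -> OPEN -> HALF-OPEN breaker, as described above (sketch only)."""

    def __init__(self, failure_threshold: int = 3):
        self.state = "CLOSED"
        self.failures = 0
        self.failure_threshold = failure_threshold

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "OPEN"  # stop dispatching work to this branch

    def try_probe(self) -> bool:
        """After a cooldown, allow a single probe task through."""
        if self.state == "OPEN":
            self.state = "HALF-OPEN"
        return self.state == "HALF-OPEN"

    def record_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"  # probe succeeded; resume normal dispatch
```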

Repo structure

hive1k/
├── README.md                           # You are here
├── AGENTS.md                           # Agent/skill descriptions
├── CONTRIBUTING.md                     # Contribution guidelines
├── catalog.yml                         # Skill metadata
├── config.yml                          # All tunables
├── LICENSE                             # MIT
├── SECURITY.md                         # Security policy
├── quickstart.sh                       # One-line installer
├── .github/
│   ├── copilot-instructions.md         # AI agent instructions for this repo
│   ├── workflows/ci.yml                # CI validation
│   └── skills/hive1k/SKILL.md          # Skill discovery path
├── agents/
│   └── hive1k.agent.md                 # Standalone agent version
├── skills/hive1k/
│   └── SKILL.md                        # Core skill definition
├── templates/
│   ├── commander.md                    # Commander prompt template
│   ├── division-commander.md           # Division Commander prompt template
│   ├── worker.md                       # Worker prompt template
│   ├── reviewer.md                     # Cross-reviewer prompt template
│   └── squad-lead.md                   # Squad lead prompt template
├── protocols/
│   ├── depth-guard.md                  # 5 Laws + 3-layer enforcement
│   ├── circuit-breaker.md              # 3-state FSM + 5-level recovery
│   ├── context-capsule.md              # JSON schemas for data structures
│   └── meta-reviewer.md                # Reviewer quality gate protocol
└── docs/
    ├── architecture.md                 # Architecture overview
    ├── architecture-diagrams.md        # Mermaid diagrams
    ├── consensus.md                    # Consensus algorithm deep dive
    ├── example-output.md               # Sample completed run output
    ├── learning-path.md                # Recommended reading order
    ├── scaling.md                      # Scale chooser + cost estimates
    ├── shadow-scoring.md               # Shadow scoring protocol
    └── use-cases.md                    # Expanded prompt gallery

Go deeper

| Doc | What's in it |
| --- | --- |
| learning-path.md | Beginner, operator, and architect reading tracks |
| architecture.md | The full system model |
| architecture-diagrams.md | Mermaid diagrams for every layer |
| scaling.md | Scale chooser, cost estimates, tuning guide |
| use-cases.md | Prompt gallery with expected outcomes |
| consensus.md | The 4-stage consensus algorithm in detail |
| shadow-scoring.md | The sealed-envelope protocol, hardening cycle |
| example-output.md | Full transcript of a completed swarm run |

Spec conformance

Hive1K implements Shadow Score Spec L2 — sealed acceptance criteria generated before execution, validated after, hardened on failure.


License

MIT — use it, fork it, build on it.


🐝 Built by @DUBSOpenHub with the GitHub Copilot CLI
