Multi-agent framework for Claude Code: 60 specialized agents, 45 methodology skills, 21 Python orchestration scripts, tier-aware acceptance (S/M/L), filesystem-isolated adversary review, cross-family second opinion via Codex MCP, human as supreme judge at critical transitions.
Multi-agent pipelines on a single model family suffer from three systemic failure modes:
| Problem | What goes wrong | How the system handles it |
|---|---|---|
| Framing contamination | The same Claude across multiple roles shares the same blind spots | Adversary runs in a fresh subprocess with a filesystem-curated view — sees only what an external process places there |
| Goodhart on validators | Validators degenerate into format-gates, checking fields instead of thinking | Tier-aware dispatch + cross-family second opinion via Codex (different model lineage = different blind spots) |
| Undifferentiated rigour | A button tweak and a landing redesign go through the same pipeline | S — light human-glance; M — adversary + judge; L — consilium of 5 reviewers + cross-family adjudication |
```mermaid
flowchart TB
    H["Human layer<br/>Trigger phrase + supreme judge on M/L acceptance"]
    A["Agents layer · 60 agents<br/>directors / leads / specialists / validators"]
    S["Skills layer · 45 skills<br/>methodologies, protocols, tool guides"]
    O["Orchestration layer · 21 Python scripts<br/>mechanical gates, adversary, consilium, archival"]
    St["State layer<br/>engagement/ directory · whitelist · append-only logs"]
    H <--> A
    A <--> S
    A <--> O
    O <--> St
    A <--> St
    classDef human fill:#fef3c7,stroke:#d97706,color:#000
    classDef agents fill:#dbeafe,stroke:#2563eb,color:#000
    classDef skills fill:#dcfce7,stroke:#16a34a,color:#000
    classDef orch fill:#fce7f3,stroke:#db2777,color:#000
    classDef state fill:#e9d5ff,stroke:#9333ea,color:#000
    class H human
    class A agents
    class S skills
    class O orch
    class St state
```
Each layer has a clear scope of responsibility. Layers don't substitute for each other: agents don't write scripts, scripts don't make judgments, humans don't do routine validation.
Detailed description of each layer and their interactions: [ARCHITECTURE.md](ARCHITECTURE.md).
Tier-aware acceptance. Each engagement is classified at intake into one of three tiers:
| Tier | Use case | Adversary | Director | Mechanical checks |
|---|---|---|---|---|
| S | Hotfix, button tweak, single deliverable | None — human glance | None | 6 |
| M | Feature, landing, dashboard, multi-specialist | 1× peer-opus | Judge mode | 13 |
| L | Rebrand, multi-wave, cross-domain | 5× consilium | Judge + adjudication | 21 |
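The tier table above could be driven by a small dispatch map. This is a hypothetical sketch of the idea, not the actual `handoff-precheck.py` implementation; all field names are assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierConfig:
    adversary_passes: int   # 0 = no adversary, human glance only
    consilium_size: int     # parallel reviewers (5 on L-tier)
    director_judge: bool    # director issues per-directive verdicts
    mechanical_checks: int  # hard-gate checks at handoff-precheck

# Hypothetical mapping mirroring the S/M/L table.
TIERS = {
    "S": TierConfig(adversary_passes=0, consilium_size=0,
                    director_judge=False, mechanical_checks=6),
    "M": TierConfig(adversary_passes=2, consilium_size=1,
                    director_judge=True, mechanical_checks=13),
    "L": TierConfig(adversary_passes=2, consilium_size=5,
                    director_judge=True, mechanical_checks=21),
}


def dispatch(tier: str) -> TierConfig:
    """Resolve a tier letter to its pipeline configuration."""
    try:
        return TIERS[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

The point of keeping this a frozen mapping rather than branching logic is that the rigour level is decided once at intake and never renegotiated mid-engagement.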
Adversary in filesystem-isolated subprocess. Two-pass design against framing contamination:
- Pass 1 (Blind). The adversary sees a curated copy of `engagement/` without `handoff.md`, without the acceptance log, without other reviewers' output. It forms preliminary findings free of contamination.
- Pass 2 (Informed). The adversary receives full state plus its own preliminary findings injected via prompt, then confirms, refines, or retracts them. The preliminary→final delta is a contamination signal.
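The blind pass's curated view amounts to a filtered copy of the engagement directory. A minimal sketch, assuming hypothetical exclusion names; the real curation logic in `adversary.py` may differ:

```python
import shutil
from pathlib import Path

# Files the blind pass must NOT see (hypothetical list).
BLIND_EXCLUDES = {"handoff.md", "acceptance-log.md", "consilium-summary.md"}


def build_blind_view(engagement: Path, dest: Path) -> list[str]:
    """Copy engagement/ into dest, dropping contamination-prone files.

    Returns the relative paths that made it into the curated view.
    """
    copied = []
    for src in engagement.rglob("*"):
        if src.is_dir() or src.name in BLIND_EXCLUDES:
            continue
        target = dest / src.relative_to(engagement)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)
        copied.append(str(src.relative_to(engagement)))
    return sorted(copied)
```

Because the adversary subprocess is pointed at `dest` rather than the live directory, it physically cannot read what was excluded; isolation is enforced by the filesystem, not by prompt instructions.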
L-tier consilium. 5 reviewers in parallel: Anthropic Opus + 2× OpenAI GPT-5 (Codex) + Anthropic Sonnet + Anthropic Haiku. Cross-family disagreements are detected automatically and flagged for manual review.
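Automatic cross-family disagreement detection can reduce to grouping findings by ID and checking whether reviewers from different model lineages diverge. A hedged sketch with an assumed finding structure (`id`, `family`, `verdict` keys are hypothetical):

```python
from collections import defaultdict


def cross_family_disagreements(findings: list[dict]) -> list[str]:
    """Return finding IDs where reviewers from different model
    families reached different verdicts."""
    by_id = defaultdict(list)
    for f in findings:
        by_id[f["id"]].append((f["family"], f["verdict"]))
    flagged = []
    for fid, votes in by_id.items():
        families = {fam for fam, _ in votes}
        verdicts = {v for _, v in votes}
        # Only flag when the split crosses a family boundary.
        if len(families) > 1 and len(verdicts) > 1:
            flagged.append(fid)
    return sorted(flagged)
```

Same-family disagreement is left alone deliberately: it is ordinary reviewer variance, while cross-family splits are the signal that one lineage's blind spot may be in play.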
Director as judge, not sweep-runner. On M/L the director issues a verdict per directive with explicit adjudication on every disagreement between adversary and author. Doesn't dispatch, doesn't edit content, doesn't re-run validators. Adjudication completeness is enforced mechanically — every finding must have a decision marker.
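Mechanical enforcement of adjudication completeness can be as simple as diffing finding IDs against decision markers in the acceptance log. A sketch under assumed conventions; the `[ACCEPT] F-1`-style marker format is hypothetical, not the actual `director-verdict-check.py` grammar:

```python
import re

# One decision marker per finding, e.g. "[ACCEPT] F-3 rationale...".
DECISION = re.compile(r"^\s*\[(ACCEPT|REJECT|DEFER)\]\s+(F-\d+)", re.M)


def unadjudicated(finding_ids: set[str], acceptance_log: str) -> set[str]:
    """Return findings with no decision marker in the log.

    A nonempty result means the director's verdict is incomplete
    and the gate should exit nonzero.
    """
    decided = {fid for _, fid in DECISION.findall(acceptance_log)}
    return finding_ids - decided
```
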
Human as supreme judge. Between consilium synthesis and director verdict the human gets a chat-ready summary (≤2 minutes to read) and responds in one of three forms: `PROCEED` / `REJECT: <reason>` / `DIRECTED: <what to change>`. No 200 lines of markdown: the system formats and expands it.
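Parsing the three-form reply is a small exercise. A sketch; the exact grammar accepted by `human-directive.py` may differ:

```python
import re

# PROCEED stands alone; REJECT and DIRECTED require a payload.
VERDICT = re.compile(r"^(PROCEED|REJECT|DIRECTED)(?::\s*(.+))?$", re.S)


def parse_verdict(reply: str) -> tuple[str, str]:
    """Parse the human reply into (decision, detail)."""
    m = VERDICT.match(reply.strip())
    if not m:
        raise ValueError(f"unrecognized verdict: {reply!r}")
    decision, detail = m.group(1), m.group(2) or ""
    if decision != "PROCEED" and not detail:
        raise ValueError(f"{decision} requires a reason or directive")
    return decision, detail.strip()
```
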
Mechanical safety baseline. Exit-code gates run at every transition:
- `danger-scan` — DROP / force-push / prod-deploy registry
- `handoff-precheck` — tier-aware structural verification
- `handoff-paths-check` — phantom path detection
- `director-verdict-check` — adjudication completeness
- `preflight` — tools availability
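Chaining exit-code gates might look like the following. The script names come from the list above, but the runner itself is a hypothetical sketch, not part of the repository:

```python
import subprocess
import sys

# Gate scripts in transition order (names from the baseline above).
GATES = [
    "danger-scan.py",
    "handoff-precheck.py",
    "handoff-paths-check.py",
    "director-verdict-check.py",
    "preflight.py",
]


def run_gates(scripts_dir: str, engagement: str) -> None:
    """Run each gate as a subprocess; any nonzero exit aborts the transition."""
    for gate in GATES:
        result = subprocess.run(
            [sys.executable, f"{scripts_dir}/{gate}", engagement],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            raise SystemExit(f"{gate} failed: {result.stdout or result.stderr}")
```

Exit codes, not LLM judgments, decide whether the transition happens: a gate that cannot be argued with is the backstop against a persuasive but wrong agent.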
Audit trail by FS state. Engagement = directory. State is read from files: `iteration`, `validation-log.md`, `validation-outputs/*.json`, `consilium-summary.md`, `human-directive.md`, `acceptance-log.md`. No databases, no external logs: `cat` reconstructs the picture completely.
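Reading state straight off the directory could look like this minimal sketch (the returned field names are assumptions, not a documented schema):

```python
import json
from pathlib import Path


def read_state(engagement: Path) -> dict:
    """Reconstruct engagement state purely from files, the same
    picture a human gets by cat-ing the directory."""
    iteration_file = engagement / "iteration"
    return {
        "iteration": iteration_file.read_text().strip()
        if iteration_file.exists() else None,
        "validations": [
            json.loads(p.read_text())
            for p in sorted(engagement.glob("validation-outputs/*.json"))
        ],
        "has_human_directive": (engagement / "human-directive.md").exists(),
        "accepted": (engagement / "acceptance-log.md").exists(),
    }
```
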
```mermaid
sequenceDiagram
    autonumber
    participant U as Human
    participant SK as agency-intake (skill)
    participant L as Domain Lead
    participant SP as Specialists
    participant V as Validators
    participant SC as Scripts (orchestration)
    participant D as Director
    U->>SK: trigger phrase
    SK->>SK: classify → criteria.md (S/M/L)
    SK->>L: handoff to lead
    L->>SP: dispatch tasks
    SP->>L: executor-reports/
    L->>V: dispatch validators
    V->>L: validation-outputs/*.json
    L->>SC: handoff-precheck.py
    SC-->>L: exit 0 / fail
    SC->>SC: adversary.py --consilium {M|L}
    SC->>SC: consilium-synth.py
    SC->>U: consilium-present.py (chat summary)
    U->>SC: PROCEED / REJECT / DIRECTED
    SC->>SC: human-directive.py
    SC->>D: invoke director (judge mode)
    D->>D: write acceptance-log.md per directive
    D->>SC: director-verdict-check.py
    SC-->>U: ACCEPT — engagement-archive.py
```
S-tier skips the adversary, consilium, and director phases: the producer self-attests, the mechanical checks gate, and the human accepts directly.
| Category | Count | Roles |
|---|---|---|
| Directors | 3 | dev-director, design-director, marketing-director — judge mode on M/L |
| Leads | 11 | 3 top-leads (dev/design/marketing) + 8 mid-leads (product, engineering, quality, brand, product-design, traffic, content, analytics) |
| Specialists | 20 | backend, frontend, fullstack, devops, qa, tech-architect, product-analyst, technical-writer; ux, ui, visual, brand-strategist, presentation; copywriter, banner-designer, seo, ppc, keyword-researcher, web-analyst, ai-visibility |
| Validators | 26 | code-reviewer, security-auditor, accessibility, performance, migration, test-reviewer, reality-checker, skeptic, completeness, task/tech-spec/user-spec validators, infra/deploy reviewers, pre/post-deploy QA, anti-pattern detector, ux-review, skill-checker, etc. |
| Category | Count | What's in it |
|---|---|---|
| Agency protocol | 6 | agency-intake, engagement-protocol, director-acceptance-protocol, validation-pipeline, docs-pipeline, codex-bridge |
| Dev methodology | 18 | TDD, code review, spec planning (user/tech), task decomposition, deploy, security, infrastructure, prompt engineering, persistent tasks, pre/post-deploy QA |
| Design methodology | 8 | brand, design system, UI/UX, presentation, banner, design tokens |
| Marketing methodology | 4 | SEO auditing, semantic drift, AI visibility, task decomposition |
| Regional SEO/PPC stack | 6 | API integrations for Russian-market analytics platforms (Webmaster, Metrika, Direct, Wordstat, Search) |
| Skill development | 3 | skill authoring, test design, testing |
Frontmatter tags for the router: [PROTOCOL], [METHODOLOGY], [TOOL].
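A router reading those tags needs little more than a regex over the frontmatter. A minimal sketch (the function and its return shape are hypothetical):

```python
import re

# The three router tags named in the skill frontmatter convention.
TAG = re.compile(r"\[(PROTOCOL|METHODOLOGY|TOOL)\]")


def skill_tags(frontmatter: str) -> set[str]:
    """Extract router tags from a skill's frontmatter text."""
    return set(TAG.findall(frontmatter))
```
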
12 main scripts:

- `adversary.py` — bridge for 5 reviewer roles with two-pass curated-view isolation
- `consilium-synth.py` — adversary output aggregation, two-stage dedup
- `consilium-present.py` — chat-ready format with decision menu
- `director-verdict-check.py` — mechanical adjudication completeness
- `handoff-precheck.py` — hard-gate tier dispatch (S=6 / M=13 / L=21 checks)
- `human-directive.py` — scaffold human-directive.md from CLI args
- `preflight.py` — tools availability check
- `danger-scan.py` — registry of dangerous operations
- `handoff-paths-check.py` — phantom path detection
- `cross-val-check.py` — verbatim quote verification
- `trace-schema-check.py` — trace JSON schema + staleness
- `engagement-archive.py` — idempotent archival
Plus optional/ — opt-in utilities outside the core protocol
(see scripts/optional/README.md).
- Claude Code
- Codex
- Python 3.10+
- (Optional) Yandex API tokens — for marketing skills (Webmaster, Metrika, Direct, Wordstat, Search)
1. Clone the repository:

   ```bash
   git clone https://github.com/ElPinus/agentic-workflow.git
   cd agentic-workflow
   ```

2. Copy contents to `~/.claude/`:

   ```bash
   cp -r agents/* ~/.claude/agents/
   cp -r skills/* ~/.claude/skills/
   cp -r scripts/* ~/.claude/scripts/
   ```

   (On Windows, use the corresponding paths under `%USERPROFILE%\.claude\`.)

3. Configure Codex MCP:

   ```bash
   cp .mcp.json.example .mcp.json
   ```

   Set the absolute path to the `codex` CLI.

4. (Optional) Configure Yandex API:

   ```bash
   cp .env.example .env
   ```

   Fill in tokens if you use marketing skills.

5. Restart Claude Code and verify that MCP tools are visible.
Entry point — trigger phrase in chat. Both English and Russian are recognized out of the box:
```
new task <description>
```

or

```
мне надо сделать задачу <description>
```
Add or adjust phrasings in the agency-intake skill's `Use when:` list to match your team's vocabulary.
The system then autonomously runs the engagement through all layers. On M/L you get a chat summary with a decision menu — respond with a short verdict.
Detailed flow and role of each layer: [ARCHITECTURE.md](ARCHITECTURE.md).
MIT (see LICENSE)