
# agentic-workflow

Multi-agent framework for Claude Code: 60 specialized agents, 45 methodology skills, 21 Python orchestration scripts, tier-aware acceptance (S/M/L), filesystem-isolated adversary review, cross-family second opinion via Codex MCP, human as supreme judge at critical transitions.

## Why this exists

Multi-agent pipelines on a single model family suffer from three systemic failure modes:

| Problem | What goes wrong | How the system handles it |
|---------|-----------------|----------------------------|
| Framing contamination | The same Claude across multiple roles shares the same blind spots | Adversary runs in a fresh subprocess with a filesystem-curated view — sees only what an external process places there |
| Goodhart on validators | Validators degenerate into format-gates, checking fields instead of thinking | Tier-aware dispatch + cross-family second opinion via Codex (different model lineage = different blind spots) |
| Undifferentiated rigour | A button tweak and a landing redesign go through the same pipeline | S — light human glance; M — adversary + judge; L — consilium of 5 reviewers + cross-family adjudication |

## Architecture — five layers

```mermaid
flowchart TB
    H["Human layer<br/>Trigger phrase + supreme judge on M/L acceptance"]
    A["Agents layer · 60 agents<br/>directors / leads / specialists / validators"]
    S["Skills layer · 45 skills<br/>methodologies, protocols, tool guides"]
    O["Orchestration layer · 21 Python scripts<br/>mechanical gates, adversary, consilium, archival"]
    St["State layer<br/>engagement/ directory · whitelist · append-only logs"]

    H <--> A
    A <--> S
    A <--> O
    O <--> St
    A <--> St

    classDef human fill:#fef3c7,stroke:#d97706,color:#000
    classDef agents fill:#dbeafe,stroke:#2563eb,color:#000
    classDef skills fill:#dcfce7,stroke:#16a34a,color:#000
    classDef orch fill:#fce7f3,stroke:#db2777,color:#000
    classDef state fill:#e9d5ff,stroke:#9333ea,color:#000

    class H human
    class A agents
    class S skills
    class O orch
    class St state
```

Each layer has a clear scope of responsibility. Layers don't substitute for each other: agents don't write scripts, scripts don't make judgments, humans don't do routine validation.

For a detailed description of each layer and their interactions, see ARCHITECTURE.md.

## Key mechanisms

**Tier-aware acceptance.** Each engagement is classified at intake into one of three tiers:

| Tier | Use case | Adversary | Director | Mechanical checks |
|------|----------|-----------|----------|-------------------|
| S | Hotfix, button tweak, single deliverable | None — human glance | None | 6 |
| M | Feature, landing, dashboard, multi-specialist | 1× peer-opus | Judge mode | 13 |
| L | Rebrand, multi-wave, cross-domain | 5× consilium | Judge + adjudication | 21 |
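
The table above amounts to a per-tier lookup. A minimal sketch of that dispatch, with entirely hypothetical names (the real logic lives in agency-intake and `handoff-precheck.py`):

```python
# Illustrative tier lookup; all names here are assumptions, not the repo's API.
TIER_PIPELINE = {
    "S": {"adversary": None,          "director": None,                 "checks": 6},
    "M": {"adversary": "peer-opus",   "director": "judge",              "checks": 13},
    "L": {"adversary": "consilium-5", "director": "judge+adjudication", "checks": 21},
}

def pipeline_for(tier: str) -> dict:
    """Return the acceptance-pipeline configuration for a classified tier."""
    if tier not in TIER_PIPELINE:
        raise ValueError(f"unknown tier: {tier!r}")
    return TIER_PIPELINE[tier]
```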

**Adversary in a filesystem-isolated subprocess.** Two-pass design against framing contamination:

- **Pass 1 (Blind).** The adversary sees a curated copy of `engagement/` without `handoff.md`, the acceptance log, or other reviewers' output. It forms preliminary findings without contamination.
- **Pass 2 (Informed).** The adversary receives the full state plus its own preliminary findings injected via prompt, then confirms, refines, or retracts them. The delta between preliminary and final findings is a contamination signal.
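
The Pass 1 curated view can be sketched as a filtered directory copy. This is an assumption about the mechanism, not the actual code of `adversary.py`, whose exclusion list may differ:

```python
import shutil
from pathlib import Path

# Hypothetical exclusion list for the blind pass; the real one may differ.
BLIND_EXCLUDE = {"handoff.md", "acceptance-log.md"}

def curate_view(engagement: Path, dest: Path) -> None:
    """Copy the engagement directory to dest, omitting contaminating files."""
    shutil.copytree(
        engagement,
        dest,
        ignore=lambda _dir, names: [n for n in names if n in BLIND_EXCLUDE],
    )
```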

**L-tier consilium.** 5 reviewers run in parallel: Anthropic Opus + 2× OpenAI GPT-5 (Codex) + Anthropic Sonnet + Anthropic Haiku. Cross-family disagreements are detected automatically and flagged for manual review.
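
One way the cross-family flag could be computed during synthesis; reviewer keys and verdict values below are illustrative assumptions, not the output format of `consilium-synth.py`:

```python
# Hypothetical family map for the 5 consilium reviewers.
FAMILY = {
    "opus": "anthropic", "sonnet": "anthropic", "haiku": "anthropic",
    "gpt5-a": "openai", "gpt5-b": "openai",
}

def cross_family_disagreement(verdicts: dict[str, str]) -> bool:
    """Flag a finding when the model families do not return the same verdict set."""
    by_family: dict[str, set[str]] = {}
    for reviewer, verdict in verdicts.items():
        by_family.setdefault(FAMILY[reviewer], set()).add(verdict)
    return len({frozenset(v) for v in by_family.values()}) > 1
```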

**Director as judge, not sweep-runner.** On M/L the director issues a verdict per directive, with explicit adjudication of every disagreement between adversary and author. It doesn't dispatch, edit content, or re-run validators. Adjudication completeness is enforced mechanically: every finding must have a decision marker.
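
The mechanical completeness check can be as simple as counting findings against decision markers. The heading and marker formats below are assumptions; `director-verdict-check.py` defines the real ones:

```python
import re

# Hypothetical acceptance-log conventions: "### Finding" headings and
# "Decision: ..." markers. The repository's actual format may differ.
FINDING = re.compile(r"^### Finding", re.MULTILINE)
DECISION = re.compile(r"^Decision: (UPHELD|OVERRULED|DEFERRED)$", re.MULTILINE)

def adjudication_complete(acceptance_log: str) -> bool:
    """True only when every finding heading has a matching decision marker."""
    return len(FINDING.findall(acceptance_log)) == len(DECISION.findall(acceptance_log))
```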

**Human as supreme judge.** Between consilium synthesis and the director's verdict, the human gets a chat-ready summary (≤2 minutes to read) and responds in one of three forms: `PROCEED` / `REJECT: <reason>` / `DIRECTED: <what to change>`. No 200 lines of markdown — the system formats and expands the reply.
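
A sketch of how the three-form reply might be parsed before `human-directive.py` expands it into `human-directive.md` (the function and its return shape are assumptions):

```python
# Hypothetical parser for the three-form human verdict.
def parse_verdict(reply: str) -> tuple[str, str]:
    """Split a chat reply into (verdict, free-text payload)."""
    reply = reply.strip()
    if reply == "PROCEED":
        return ("PROCEED", "")
    for prefix in ("REJECT:", "DIRECTED:"):
        if reply.startswith(prefix):
            return (prefix.rstrip(":"), reply[len(prefix):].strip())
    raise ValueError(f"unrecognized verdict: {reply!r}")
```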

**Mechanical safety baseline.** Exit-code gates run at every transition: `danger-scan` (DROP / force-push / prod-deploy registry), `handoff-precheck` (tier-aware structural verification), `handoff-paths-check` (phantom path detection), `director-verdict-check` (adjudication completeness), `preflight` (tool availability).
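
An exit-code gate chain reduces to running commands in order and aborting on the first nonzero status. A minimal sketch, assuming each gate is invoked as an external command (the invocation details are not from the repository):

```python
import subprocess

def run_gates(gates: list[list[str]]) -> bool:
    """Run gates in order; stop at the first nonzero exit code.

    Each entry is a command list, e.g. ["python", "danger-scan.py", "engagement/"].
    """
    for cmd in gates:
        if subprocess.run(cmd).returncode != 0:
            return False
    return True
```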

**Audit trail as filesystem state.** An engagement is a directory. State is read from files: `iteration`, `validation-log.md`, `validation-outputs/*.json`, `consilium-summary.md`, `human-directive.md`, `acceptance-log.md`. No databases, no external logs — `cat` reconstructs the picture completely.
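
Because state is just files, reconstructing "where is this engagement?" is a directory scan. A sketch using the milestone file names listed above (the function itself is hypothetical):

```python
from pathlib import Path

# Milestone artifacts named in the section above.
MILESTONES = ("validation-log.md", "consilium-summary.md",
              "human-directive.md", "acceptance-log.md")

def engagement_state(root: Path) -> dict[str, bool]:
    """Report which milestone artifacts exist; the directory is the audit trail."""
    return {name: (root / name).exists() for name in MILESTONES}
```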

## Engagement flow

```mermaid
sequenceDiagram
    autonumber
    participant U as Human
    participant SK as agency-intake (skill)
    participant L as Domain Lead
    participant SP as Specialists
    participant V as Validators
    participant SC as Scripts (orchestration)
    participant D as Director

    U->>SK: trigger phrase
    SK->>SK: classify → criteria.md (S/M/L)
    SK->>L: handoff to lead
    L->>SP: dispatch tasks
    SP->>L: executor-reports/
    L->>V: dispatch validators
    V->>L: validation-outputs/*.json
    L->>SC: handoff-precheck.py
    SC-->>L: exit 0 / fail
    SC->>SC: adversary.py --consilium {M|L}
    SC->>SC: consilium-synth.py
    SC->>U: consilium-present.py (chat summary)
    U->>SC: PROCEED / REJECT / DIRECTED
    SC->>SC: human-directive.py
    SC->>D: invoke director (judge mode)
    D->>D: write acceptance-log.md per directive
    D->>SC: director-verdict-check.py
    SC-->>U: ACCEPT — engagement-archive.py
```

S-tier skips the adversary, consilium, and director phases: the producer self-attests, mechanical checks gate the transition, and the human accepts directly.

## What's inside

### Agents (60)

| Category | Count | Roles |
|----------|-------|-------|
| Directors | 3 | dev-director, design-director, marketing-director — judge mode on M/L |
| Leads | 11 | 3 top-leads (dev/design/marketing) + 8 mid-leads (product, engineering, quality, brand, product-design, traffic, content, analytics) |
| Specialists | 20 | backend, frontend, fullstack, devops, qa, tech-architect, product-analyst, technical-writer; ux, ui, visual, brand-strategist, presentation; copywriter, banner-designer, seo, ppc, keyword-researcher, web-analyst, ai-visibility |
| Validators | 26 | code-reviewer, security-auditor, accessibility, performance, migration, test-reviewer, reality-checker, skeptic, completeness, task/tech-spec/user-spec validators, infra/deploy reviewers, pre/post-deploy QA, anti-pattern detector, ux-review, skill-checker, etc. |

### Skills (45)

| Category | Count | What's in it |
|----------|-------|--------------|
| Agency protocol | 6 | agency-intake, engagement-protocol, director-acceptance-protocol, validation-pipeline, docs-pipeline, codex-bridge |
| Dev methodology | 18 | TDD, code review, spec planning (user/tech), task decomposition, deploy, security, infrastructure, prompt engineering, persistent tasks, pre/post-deploy QA |
| Design methodology | 8 | brand, design system, UI/UX, presentation, banner, design tokens |
| Marketing methodology | 4 | SEO auditing, semantic drift, AI visibility, task decomposition |
| Regional SEO/PPC stack | 6 | API integrations for Russian-market analytics platforms (Webmaster, Metrika, Direct, Wordstat, Search) |
| Skill development | 3 | skill authoring, test design, testing |

Frontmatter tags for the router: `[PROTOCOL]`, `[METHODOLOGY]`, `[TOOL]`.

### Scripts (21)

The 12 core scripts:

- `adversary.py` — bridge for 5 reviewer roles with two-pass curated-view isolation
- `consilium-synth.py` — adversary output aggregation, two-stage dedup
- `consilium-present.py` — chat-ready format with decision menu
- `director-verdict-check.py` — mechanical adjudication-completeness check
- `handoff-precheck.py` — hard-gate tier dispatch (S=6 / M=13 / L=21 checks)
- `human-directive.py` — scaffolds human-directive.md from CLI args
- `preflight.py` — tool-availability check
- `danger-scan.py` — registry of dangerous operations
- `handoff-paths-check.py` — phantom path detection
- `cross-val-check.py` — verbatim quote verification
- `trace-schema-check.py` — trace JSON schema + staleness check
- `engagement-archive.py` — idempotent archival

Plus `optional/` — opt-in utilities outside the core protocol (see `scripts/optional/README.md`).

## Setup

### Requirements

- Claude Code
- Codex
- Python 3.10+
- (Optional) Yandex API tokens — for marketing skills (Webmaster, Metrika, Direct, Wordstat, Search)

### Installation

1. Clone the repository:

   ```shell
   git clone https://github.com/ElPinus/agentic-workflow.git
   cd agentic-workflow
   ```

2. Copy the contents to `~/.claude/`:

   ```shell
   cp -r agents/* ~/.claude/agents/
   cp -r skills/* ~/.claude/skills/
   cp -r scripts/* ~/.claude/scripts/
   ```

   On Windows, use the corresponding paths under `%USERPROFILE%\.claude\`.

3. Configure the Codex MCP server:

   ```shell
   cp .mcp.json.example .mcp.json
   ```

   Set the absolute path to the `codex` CLI.

4. (Optional) Configure the Yandex API:

   ```shell
   cp .env.example .env
   ```

   Fill in the tokens if you use the marketing skills.

5. Restart Claude Code and verify that the MCP tools are visible.

## Quickstart

The entry point is a trigger phrase in chat. Both English and Russian are recognized out of the box:

    new task <description>

or

    мне надо сделать задачу <description>

(The Russian phrase means "I need to do a task <description>".)

Add or adjust phrasings in the agency-intake skill's `Use when:` list to match your team's vocabulary.

The system then autonomously runs the engagement through all layers. On M/L you get a chat summary with a decision menu — respond with a short verdict.

For the detailed flow and the role of each layer, see ARCHITECTURE.md.

## License

MIT (see LICENSE)
