Dispatch-protocol hardening: rename Task→Agent + dispatch_gate + task_lifecycle_gate + bootstrap_gate F24/F25 (#662) by michael-wojcik · Pull Request #663 · Synaptic-Labs-AI/PACT-Plugin

michael-wojcik · 2026-05-06T23:20:47Z

Summary

Closes #662. Single PR, 5 atomic commits, hardens the PACT specialist-dispatch protocol against the silent-fail-open class that produced #662 itself.

| SHA | Subject |
| --- | --- |
| `585bd20` | fix(dispatch): correct `4c286c1` rename + harden bootstrap_gate (F24/F25) |
| `cff3697` | feat(dispatch_gate): PreToolUse Agent gate (F1-F7+F14+F15+F21+F23+F26) |
| `bfd8009` | feat(task_lifecycle_gate): PostToolUse TaskCreate\|TaskUpdate gate (F8-F13+F23) |
| `13e4662` | docs+chore: F22 runbook + PACT_DISPATCH_F7_MODE shadow-mode + v4.2.0 |
| `c6f95d6` | test: comprehensive coverage (+87 tests) |

Plugin version: 4.1.2 → 4.2.0 (minor — new gate capabilities).

What this fixes

The orchestrator persona authoritatively documented Task(...) as the specialist-spawn tool, but Claude Code's actual platform tool is Agent. When a --agent-flag session reads the persona, finds no Task tool in its surface, and falls back to Agent without name=/team_name=, every spawned agent silently runs without Agent-Teams coordination — and the orchestrator rationalizes the missing tools as "degraded mode" rather than treating it as a HARD STOP.

This PR closes 27 silent-failure paths (F1-F27) plus the bootstrap-marker bypass class.

Cat-1 vs Cat-2 rename discipline

Cat-1 (renamed): spawn-tool token Task→Agent across persona, commands, skills, protocols, hooks.json L66/L187 matchers, bootstrap_gate _BLOCKED_TOOLS. Pre-edit: 4 hits in agents/commands/skills/protocols. Post-edit: 0 hits.
Cat-2 (preserved): task-management tools TaskCreate/TaskUpdate/TaskList/TaskGet/TaskStop/TaskOutput — these are NOT spawn-tool references. Baseline 551 → post-edit 596 (grew from new test code; zero Cat-2 names corrupted). hooks.json L196 TaskCreate|TaskUpdate matcher UNCHANGED — regression-prevention assertion in test_hooks_json.py.
Refresh-system carve-out: transcript_parser.py + patterns.py parametrized over Task|Agent to read historical session transcripts. Dispatch code itself is clean rename (no dual-naming). TASK_TOOL_PATTERN renamed to SPAWN_TOOL_PATTERN.
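
The Cat-1 / Cat-2 distinction can be illustrated with a minimal sketch. It assumes the renamed `SPAWN_TOOL_PATTERN` is an anchored regex over the tool name; the real pattern in patterns.py may differ, and `is_spawn_tool` is a hypothetical helper used only for illustration.

```python
import re

# Assumed shape of the transcript-reading pattern: it accepts the spawn tool
# under its historical name (Task) and its current name (Agent).
SPAWN_TOOL_PATTERN = re.compile(r"(?:Task|Agent)")

# Cat-2 task-management tools listed in the PR; none of them is a spawn tool.
TASK_MANAGEMENT_TOOLS = frozenset(
    {"TaskCreate", "TaskUpdate", "TaskList", "TaskGet", "TaskStop", "TaskOutput"}
)


def is_spawn_tool(tool_name: str) -> bool:
    """Hypothetical helper: True only for an exact spawn-tool name."""
    # fullmatch keeps "TaskCreate" from matching on its "Task" prefix.
    return SPAWN_TOOL_PATTERN.fullmatch(tool_name) is not None
```

Anchoring on the whole name is what keeps a prefix match on `Task` from corrupting the Cat-2 names.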

bootstrap_gate hardening

F24 marker provenance: bootstrap.md now writes a JSON marker {v, sid, sig=SHA256(session_id|plugin_root|plugin_version|version)}. is_marker_set now adds a size cap, JSON parse, key-set check, version match, sid==session_dir.name check, and hmac.compare_digest signature verification. This closes the Bash("touch bootstrap-complete") bypass surface: an attacker LLM could previously bypass bootstrap_gate by directly creating an empty marker; now the touch produces an empty file → F24 verification fails → the gate continues to deny.
F25 fail-closed wrapper: retrofit on bootstrap_gate.py (and sibling bootstrap_prompt_gate.py) mirrors the _emit_load_failure_deny pattern from PR #660 (fix: add hookEventName to hookSpecificOutput emit sites (#658)): stdlib-only helper before wrapped imports; cross-package imports in try/except BaseException; module-load failure → exit 2 + permissionDecision=deny + structured hookEventName=PreToolUse. Same defect class as #658 (merge_guard_pre.py emits hookSpecificOutput without hookEventName; deny path silently fails open).
Docstring rewrite: bootstrap_gate.py L19-24 + L57-62 docstring rewritten — the load-bearing propagation vector that misled commit 4c286c1f's incorrect rename direction. Future reviewers reading these lines now see accurate spawn-tool semantics.
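
The marker check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in bootstrap_gate.py: the constant values, the `|`-joined payload encoding, and the names `compute_sig` and `is_marker_valid` are hypothetical.

```python
import hashlib
import hmac
import json
from pathlib import Path

MARKER_VERSION = 1        # assumed schema version value
MARKER_MAX_BYTES = 1024   # assumed size cap
MARKER_KEYS = {"v", "sid", "sig"}


def compute_sig(session_id: str, plugin_root: str, plugin_version: str, version: int) -> str:
    """SHA256 over the four '|'-joined inputs (the joining is an assumption)."""
    payload = "|".join([session_id, plugin_root, plugin_version, str(version)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_marker_valid(marker_path: Path, plugin_root: str, plugin_version: str) -> bool:
    """Hypothetical verifier: any failed check means the marker is not set."""
    try:
        if marker_path.stat().st_size > MARKER_MAX_BYTES:
            return False
        data = json.loads(marker_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing file, unreadable file, or an empty / non-JSON body.
        return False
    if not isinstance(data, dict) or set(data) != MARKER_KEYS:
        return False
    if data["v"] != MARKER_VERSION:
        return False
    session_id = marker_path.parent.name  # sid must equal the session dir name
    if data["sid"] != session_id:
        return False
    sig = data["sig"]
    if not isinstance(sig, str):
        return False
    expected = compute_sig(session_id, plugin_root, plugin_version, MARKER_VERSION)
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
```

An empty file created by `touch` fails at the JSON parse step, so the gate keeps denying.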

New gates

dispatch_gate.py (PreToolUse, matcher='Agent')

Single evaluate_dispatch composition (anti-sprawl, ~250 LOC budget; verified by parametrized introspection test that no per-F-row functions snuck in):

| F-row | Behavior |
| --- | --- |
| F1 | `name=` empty → DENY |
| F2 | `team_name=` empty → DENY (catches adversarial team_name='' before F5) |
| F3 | NFKC-normalize → regex `^[a-z0-9-]+$` → length cap 64 → reserved-token ban {team-lead, lead, user, external, peer, unknown, solo} → DENY (marker-spoofing prevention) |
| F4 | `subagent_type` not in cached FS-glob of `agents/pact-*.md` → DENY |
| F5 | `team_name` doesn't match `pact_context.get_team_name()` (or empty source) → DENY |
| F14 | `name=` already live in team `config.json` `members[]` → DENY (uniqueness) |
| F15 | team config.json doesn't exist → DENY |
| F6 | no Task assigned to `owner==name` → DENY |
| F7 | prompt > 800 chars + mission-keywords + no TaskList reference → WARN (configurable: `PACT_DISPATCH_F7_MODE` warn\|deny\|shadow) |
| Carve-outs | SOLO_EXEMPT {general-purpose, Explore, Plan} and non-pact-* subagent_type → ALLOW |
| F23 | every gate decision (ALLOW + WARN + DENY) emits journal `dispatch_decision` event |
| F26 | prompt redaction at journal-write boundary strips sk-/xoxb-/ghp_/AKIA + JWT-shape tokens |
| F21 | module-load failure → fail-closed deny |
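
The F1 and F3 rows can be sketched as a single name check. This is a minimal illustration that follows the order given in the table (NFKC, regex, length cap, reserved tokens); the function name and the reason strings are hypothetical, not the gate's actual messages.

```python
import re
import unicodedata
from typing import Optional

NAME_RE = re.compile(r"[a-z0-9-]+")
NAME_MAX_LEN = 64
RESERVED_NAMES = frozenset(
    {"team-lead", "lead", "user", "external", "peer", "unknown", "solo"}
)


def name_deny_reason(raw_name: str) -> Optional[str]:
    """Hypothetical helper: return a deny reason, or None when the name passes."""
    if not raw_name:
        return "name is required"
    # NFKC folds look-alike forms (for example fullwidth letters) to ASCII
    # before the regex and reserved-token checks run.
    name = unicodedata.normalize("NFKC", raw_name)
    if not NAME_RE.fullmatch(name):
        return "name must match ^[a-z0-9-]+$"
    if len(name) > NAME_MAX_LEN:
        return "name exceeds 64 characters"
    if name in RESERVED_NAMES:
        return "name is a reserved token"
    return None
```

Normalizing before the reserved-token comparison is what stops a fullwidth spelling of `lead` from slipping past the ban.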

task_lifecycle_gate.py (PostToolUse, matcher='TaskCreate|TaskUpdate')

Single evaluate_lifecycle composition; PostToolUse cannot DENY, all output is advisory additionalContext:

| F-row | Behavior |
| --- | --- |
| F8 | TEACHBACK Task without `addBlocks=[B_id]` → advisory |
| F9 | pact-* owned non-TEACHBACK Task without `addBlockedBy=[A_id]` → advisory |
| F10 | team-lead marks pact-*-owned task completed without paired SendMessage to that owner within 120s → advisory |
| F11 | pact-*-owned Task B completed with empty/missing `metadata.handoff` → advisory |
| F12 | teammate self-completes task (and not in `is_self_complete_exempt()` carve-outs) → advisory + `metadata.completion_disputed=true` writeback to disk + `metadata.gate_writeback=true` recursion-marker self-skip |
| F13 | `metadata.handoff` schema validation (required fields); disjoint from F11 (F13 fires only when payload exists but malformed) |
| F23 | journal `lifecycle_decision` event |
| F21 | module-load failure → fail-closed advisory (PostToolUse cannot DENY; exit 0) |
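
The F11 / F13 disjointness can be sketched as below. The required field names are the ones listed in the task_lifecycle_gate commit message further down; treating only an absent, null, or empty-dict handoff as "missing" is an assumption, and `handoff_advisories` is a hypothetical name.

```python
REQUIRED_HANDOFF_FIELDS = (
    "produced",
    "decisions",
    "reasoning_chain",
    "uncertainty",
    "integration",
    "open_questions",
)


def handoff_advisories(metadata: dict) -> list:
    """Hypothetical check: return (rule, message) pairs for a completed task."""
    handoff = metadata.get("handoff")
    # F11: no payload at all. Returning here keeps F13 from also firing.
    if handoff is None or handoff == {}:
        return [("handoff_missing", "completed without metadata.handoff")]
    # F13: a payload exists but is malformed.
    if not isinstance(handoff, dict):
        return [("handoff_schema_invalid", "metadata.handoff must be an object")]
    missing = [field for field in REQUIRED_HANDOFF_FIELDS if field not in handoff]
    if missing:
        return [("handoff_schema_invalid", "missing fields: " + ", ".join(missing))]
    return []
```

Because the missing-payload branch returns early, at most one of the two rules fires for any given task.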

F12 actor identity uses trustworthy_actor_name() from shared/dispatch_helpers.py — agent_id-derived only (harness-trustworthy paths 2 + 3 per resolve_agent_name 5-step chain); does NOT fall back to teammate-spoofable tool_input fields.

Persona body additions

First-spawn-verification step in bootstrap skill (verify team membership after first specialist dispatch).
HARD-STOP framing for "missing tools" reports: when a teammate reports TaskList/SendMessage/TaskUpdate not loaded → HARD STOP, dispatch protocol violation, NOT degraded mode.
WARN-means-STOP-and-re-dispatch reinforcement near the canonical dispatch form so the calling LLM doesn't rationalize past F7 advisories.

F22 post-merge validation runbook

pact-plugin/tests/runbooks/662-dispatch-gate.md (NEW) documents:

Matcher-mutation counter-test (mutate hooks.json matcher to 'WrongName' → gate doesn't fire → revert; proves matcher is load-bearing; same defect class as #658, where merge_guard_pre.py emits hookSpecificOutput without hookEventName and the deny path silently fails open)
F18 Bash-marker-bypass closure: Bash("touch bootstrap-complete") produces empty file → F24 verification fails → gate continues to deny
F7 advisory injection empirical observation (informs future warn → deny upgrade decision)
F25 sabotaged-import fail-closed counter-test
Pass/fail criteria + rollback procedure
RUNBOOK_RUN_DATES.md log entry (denominator /8)

Per pinned memory: hooks cannot be smoke-tested in-session (loaded at session start, not on file change). Validation is a manual post-merge step in a fresh session.

Tests

Test cardinality: 7244 → 7331 (+87 tests). 0 regressions.
pyright clean on all new files (CLI; IDE-side stale-cache shows benign "unresolved import" warnings that don't affect runtime or CI).
Test discipline per PR #660 (fix: add hookEventName to hookSpecificOutput emit sites (#658)) R2: sys.modules pop only the module under test, never shared.*; snapshot+restore on teardown.
Counter-test cardinality discipline per #638 (fix: extend _extract_task_id to probe nested tool_response.task.id (#620)): mutate matcher → assert no fire.

Test plan

All existing tests pass (7331/7331)
pyright clean on new code
Cat-2 preservation grep audit (596 ≥ 551 baseline)
hooks.json L196 unchanged (regression-prevention assertion)
Post-merge fresh-session validation per tests/runbooks/662-dispatch-gate.md — REQUIRED before declaring 4.2.0 production-stable

Architectural deviation flagged for follow-up

Backend-coder-3 implemented F12-on-unresolvable-actor as skip (no advisory) when trustworthy_actor_name returns None; architect §5.3 specified advisory-emit. Encoded in test_f12_skips_when_actor_unresolvable_documents_architect_5_3_deviation for visibility. Follow-up issue to be filed post-merge.

Cross-references

#658, "merge_guard_pre.py emits hookSpecificOutput without hookEventName; deny path silently fails open" (defect class: silent fail-open in hookEventName)
PR #660, "fix: add hookEventName to hookSpecificOutput emit sites (#658)" (defense-in-depth precedent: _emit_load_failure_deny pattern)
#621, "v4.0.0: --agent-flag orchestrator + lazy-load protocols (Option F)" / Phase 2 follow-ups (this is Phase 2.5)
#638, "fix: extend _extract_task_id to probe nested tool_response.task.id (#620)" (hooks-not-smoke-testable-in-session pinned memory)
Commit 4c286c1f ("fix(security): rename Agent→Task in bootstrap_gate._BLOCKED_TOOLS", 2026-05-05) — corrected by 585bd20

Spawn-tool token in Claude Code is `Agent`, not `Task`. Commit 4c286c1 swapped the rename direction in bootstrap_gate._BLOCKED_TOOLS (Agent→Task) based on misread cross-evidence; this commit restores `Agent`.

Cat-1 rename Task→Agent across persona, commands, skills, protocols, and hooks.json L66/L187 spawn-tool matchers. L196 `TaskCreate|TaskUpdate` preserved (Cat-2 task-management tools). Cat-2 baseline ≥551 verified by new test_hooks_json regression-prevention assertions.

bootstrap_gate.py changes:
- _BLOCKED_TOOLS swapped Task→Agent
- L19-24 + L57-62 docstring rewritten (the propagation vector that misled 4c286c1)
- F25 fail-closed wrapper retrofit: stdlib-only _emit_load_failure_deny defined before wrapped imports; mirrors PR #660 merge_guard_pre.py
- F24 marker provenance: is_marker_set extends with size cap, JSON parse, key-set, version match, sid==session_dir.name, and hmac.compare_digest signature verification; closes the Bash("touch bootstrap-complete") bypass

bootstrap_prompt_gate.py: F25 sibling retrofit; UserPromptSubmit cannot DENY so emits advisory additionalContext on load failure.

commands/bootstrap.md: now produces F24-stamped marker JSON {v, sid, sig=SHA256(session_id|plugin_root|plugin_version|version)}.

Refresh transcript-parser parametrized over Task|Agent (carve-out from clean rename: historical session transcripts contain Task literals). TASK_TOOL_PATTERN renamed to SPAWN_TOOL_PATTERN.

Persona body (pact-orchestrator.md): first-spawn-verification step; HARD-STOP framing for "missing tools" reports (no degraded-mode rationalization); WARN-means-STOP-and-re-dispatch reinforcement.

Tests: 113 passed in smoke; 7232 passed in full suite. F24 cardinality 8 cases; F25 fail-closed counter-test. F20 frontmatter audit with pact-orchestrator in CARVE_OUT_FILES (--agent-loaded, not Agent-Teams-spawned).

Adds dispatch_gate.py, a PreToolUse hook on Agent spawn that enforces F1-F7, F14, F15, F21, F23, F26 against pact-* specialist dispatches. Closes the silent fail-open class where the orchestrator persona's dispatch instructions could diverge from actual spawn-tool surface and degrade into "missing tools, proceeding anyway" rationalization.

F-row enforcement (single evaluate_dispatch composition, anti-sprawl):
- F1: name= empty -> DENY
- F2: team_name= empty -> DENY (catches adversarial team_name='' before F5)
- F3: NFKC-normalize -> regex ^[a-z0-9-]+$ -> length cap 64 -> reserved-token ban {team-lead, lead, user, external, peer, unknown, solo} -> DENY (marker-spoofing prevention)
- F4: subagent_type not in cached FS-glob of agents/pact-*.md -> DENY
- F5: team_name doesn't match pact_context.get_team_name() (or empty source) -> DENY
- F14: name= already live in team config.json members[] -> DENY
- F15: team_name's config.json doesn't exist -> DENY
- F6: no Task assigned to owner==name in team task files -> DENY
- F7: prompt > 800 chars + mission-keywords + no TaskList reference -> WARN (advisory; persona body reinforces "WARN means STOP and re-dispatch correctly")
- Carve-outs: SOLO_EXEMPT {general-purpose, Explore, Plan} and non-pact-* subagent_type -> ALLOW

F21 fail-closed wrapper mirrors PR #660 _emit_load_failure_deny: stdlib-only helper before wrapped imports; cross-package imports in try/except BaseException; exit 2 + permissionDecision=deny on any module-load failure.

F23 emits a session-journal dispatch_decision event on every gate verdict so denies are auditable, not visible only to the calling LLM.

F26 prompt redaction strips sk-/xoxb-/ghp_/AKIA literal-prefix tokens + JWT-shape regex before append_event.

shared/dispatch_helpers.py extracts helpers reused by task_lifecycle_gate (Commit 3): is_registered_pact_specialist, has_task_assigned, trustworthy_actor_name, SOLO_EXEMPT, F24_MARKER_VERSION.

Smoke tests (7): happy-path ALLOW, F1/F2/F3 DENY, SOLO_EXEMPT carve-out, F21 fail-closed counter-test via subprocess + PYTHONSAFEPATH=1 + sabotaged dispatch_helpers, F26 redaction verification.

Test cardinality: 7 smoke pass; 7245 full-suite pass / 17 skip / 0 fail. Cat-2 token preservation 592 (>=551 baseline).
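
The fail-closed wrapper shape can be sketched as below, assuming the deny payload is printed as JSON before exiting with code 2. `build_load_failure_deny` and `import_or_deny` are hypothetical names (the source's helper is `_emit_load_failure_deny`), and the output stream and reason text are assumptions.

```python
import importlib
import json
import sys


def build_load_failure_deny(module_name: str, error: BaseException) -> dict:
    """Stdlib-only payload builder, defined before any wrapped import runs."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": (
                f"gate failed to load {module_name}: {type(error).__name__}"
            ),
        }
    }


def import_or_deny(module_name: str):
    """Import a dependency; on any failure emit a deny and exit 2."""
    try:
        return importlib.import_module(module_name)
    except BaseException as exc:  # deliberately broad: fail closed, never open
        print(json.dumps(build_load_failure_deny(module_name, exc)))
        sys.exit(2)
```

Keeping the helper free of cross-package imports matters because it must still work when those imports are the thing that failed.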

Adds task_lifecycle_gate.py, a PostToolUse hook on TaskCreate|TaskUpdate that emits advisory output for F8-F13 violations and writes back metadata.completion_disputed=true on F12 self-completions, with a metadata.gate_writeback recursion-marker self-skip. PostToolUse cannot DENY; output is hookSpecificOutput.additionalContext advisory; exit 0 always.

F-row enforcement:
- F8: TEACHBACK Task created without addBlocks=[B_id] -> advisory
- F9: pact-* owned non-TEACHBACK Task created without addBlockedBy -> advisory
- F10: team-lead marks pact-*-owned task completed without paired SendMessage to that owner within last 120s -> advisory
- F11: pact-*-owned Task B completed with empty/missing metadata.handoff -> advisory
- F12: pact-*-owned task transitions to completed AND actor (via trustworthy_actor_name from agent_id, harness-trustworthy paths only) is the owner AND owner not in is_self_complete_exempt() carve-outs -> advisory + direct FS writeback metadata.completion_disputed=true, gate_writeback=true via atomic .tmp+os.replace
- F13: completion-time metadata.handoff schema validation (required fields produced, decisions, reasoning_chain, uncertainty, integration, open_questions); disjoint from F11 (F13 fires only when payload exists but malformed)

Recursion guard at evaluate_lifecycle entry: tool_input.metadata.gate_writeback=True -> silent skip, prevents F12 self-trigger on the gate's own writeback.

F12 actor identity uses trustworthy_actor_name from shared/dispatch_helpers.py: agent_id-derived only (harness-trustworthy paths 2 and 3 per PREPARE inventory); does NOT fall back to teammate-spoofable tool_input fields.

F21 fail-closed wrapper mirrors bootstrap_gate.py pattern: stdlib-only _emit_load_failure_advisory helper before wrapped imports; advisory output (cannot DENY) on module-load failure; exit 0.

F23 emits session-journal lifecycle_decision events for advisories.

Hook co-located with wake_lifecycle_emitter under existing PostToolUse matcher='TaskCreate|TaskUpdate' (architect §13 Q1 single-matcher-two-hooks). wake_lifecycle_emitter fields unchanged.

Smoke tests (6): F8/F9/F11+F13-disjoint advisories, F12 writeback to disk verification, recursion-marker self-skip counter-test, F21 fail-closed counter-test.

Test cardinality: 6 smoke pass; 7238 full-suite pass / 17 skip / 0 fail (now 7245+6 ~= 7251 with both new smoke suites).
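
The F12 writeback and its recursion guard can be sketched as below. The task-file layout (one JSON object with a `metadata` key) and both function names are assumptions made for illustration; only the two flags and the .tmp + os.replace technique come from the commit message.

```python
import json
import os
from pathlib import Path


def is_gate_writeback(tool_input: dict) -> bool:
    """Recursion guard: True when the update was produced by the gate itself."""
    metadata = tool_input.get("metadata")
    return isinstance(metadata, dict) and metadata.get("gate_writeback") is True


def write_completion_disputed(task_path: Path) -> None:
    """Flag a self-completed task on disk using an atomic replace."""
    task = json.loads(task_path.read_text(encoding="utf-8"))
    metadata = task.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}
    metadata["completion_disputed"] = True
    metadata["gate_writeback"] = True  # lets the guard above skip this write
    task["metadata"] = metadata
    # Write a sibling temp file, then swap it in, so readers never observe a
    # half-written task file.
    tmp_path = task_path.with_name(task_path.name + ".tmp")
    tmp_path.write_text(json.dumps(task, indent=2), encoding="utf-8")
    os.replace(tmp_path, task_path)
```

Checking `is_gate_writeback` at gate entry prevents the gate from re-triggering on the update it just wrote.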

Adds the post-merge fresh-session validation runbook for the dispatch-protocol hardening, makes F7 mode runtime-configurable via PACT_DISPATCH_F7_MODE, and bumps the plugin version 4.1.2 → 4.2.0.

Runbook (tests/runbooks/662-dispatch-gate.md) documents:
- F22 matcher mutation counter-test (mutate hooks.json matcher to 'WrongName' -> gate doesn't fire -> revert; proves matcher is load-bearing)
- F18 Bash-marker-bypass closure: Bash("touch bootstrap-complete") produces empty file -> F24 marker-provenance verification rejects -> bootstrap_gate continues to deny
- F7 advisory injection empirical observation (informs future warn -> deny upgrade decision)
- F25 sabotaged-import fail-closed counter-test
- Pass/fail criteria + rollback procedure
- RUNBOOK_RUN_DATES.md gets a 662-dispatch-gate stub entry (denominator /8 per existing runbook §5 convention)

dispatch_gate.py F7_MODE constant replaced with module-load read of os.environ.get("PACT_DISPATCH_F7_MODE", "warn"). Allowed values:
- "warn" (default, advisory output, behavior unchanged)
- "deny" (future calibration upgrade: DENY when F7 conditions match)
- "shadow" (silent ALLOW; journals as WARN_SHADOWED for calibration data collection without user-visible advisory)

Unknown values fall back to warn. README.md (plugin) gains a Configuration section documenting the env-var.

4-file version dance:
- pact-plugin/.claude-plugin/plugin.json (authoritative)
- .claude-plugin/marketplace.json
- README.md (root): plugin-cache path reference
- pact-plugin/README.md

`rg -n '4\.1\.2'` returns 0 hits.

Tests: 32 pass (test_hooks_json + test_dispatch_gate_smoke); pyright on dispatch_gate.py: 0 errors. F7_MODE env-var sanity verified manually (=shadow -> shadow; =bogus -> warn fallback).

Closes #662.
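
The tri-state mode read can be sketched as below, using the env-var name as of this commit. `read_f7_mode` and `f7_outcome` are hypothetical helpers, and the journal labels other than WARN_SHADOWED are assumptions.

```python
import os

ALLOWED_F7_MODES = ("warn", "deny", "shadow")


def read_f7_mode(environ=None) -> str:
    """Read the mode from the environment; unknown values fall back to warn."""
    env = os.environ if environ is None else environ
    value = env.get("PACT_DISPATCH_F7_MODE", "warn")
    return value if value in ALLOWED_F7_MODES else "warn"


def f7_outcome(mode: str) -> tuple:
    """Map a mode to (decision shown to the caller, label written to the journal)."""
    if mode == "deny":
        return ("DENY", "DENY")
    if mode == "shadow":
        # Silent ALLOW for the caller; the journal still records the hit.
        return ("ALLOW", "WARN_SHADOWED")
    return ("WARN", "WARN")
```

Passing a plain dict as `environ` makes the fallback easy to exercise without touching the real process environment.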

Adds two new test files and extends F20 frontmatter audit:
- tests/test_dispatch_gate.py (NEW, 51 parametrized tests): F1, F2, F3 (NFKC corpus + length-cap-64 boundary + 7 reserved tokens), F4, F5 (mismatch + empty-source per architect §7(h)), F6, F7 (all 3 PACT_DISPATCH_F7_MODE modes warn|deny|shadow including journal-only ALLOW), F14 (uniqueness), F15, F21 (subprocess + PYTHONSAFEPATH=1 fail-closed counter-test), F23 (journal emit on every verdict), F26 (5 credential patterns + JWT-shape with adjacent-string-literal-concat to bypass pre-commit secret-scanner false-positives), SOLO_EXEMPT carve-outs, non-pact-* pass-through, defensive (malformed stdin / non-target tool), anti-sprawl invariant via inspect introspection.
- tests/test_task_lifecycle_gate.py (NEW, 23 tests): F8, F9, F10 (119s vs 121s SendMessage-recency boundary), F11, F12 (writeback + carve-outs for secretary + signal-task, recursion-marker self-skip), F12-on-unresolvable-actor (encodes CURRENT skip behavior with deviation-documenting test name; follow-up issue post-merge for architect §5.3 reconciliation), F13 (6 missing-required-field params + non-dict + F11/F13 disjointness), F21 (PostToolUse advisory fail-closed), anti-sprawl.
- tests/test_skills_structure.py (extended): F20 parametrized audit walking pact-plugin/agents/pact-*.md asserting `pact-agent-teams` in skills frontmatter, F20_CARVE_OUT_FILES = {"pact-orchestrator"} (orchestrator is --agent-loaded, not Agent-Teams-spawned).

Test cardinality: 7244 -> 7331 (+87 tests). 0 regressions. pyright clean on new files (CLI; IDE-side stale-cache shows benign import warnings that don't affect runtime or CI).

Smoke tests retained intact: subprocess+PYTHONSAFEPATH F21 mechanism is unique there. F22 fresh-session validation deferred to post-merge runbook tests/runbooks/662-dispatch-gate.md per hooks-cannot-be-smoke-tested-in-session discipline.

Auditor YELLOW notes addressed:
(1) LOC overshoot: anti-sprawl invariant verified via parametrized introspection of single evaluate_dispatch / evaluate_lifecycle composition; no per-F-row sprawl.
(2) PACT_DISPATCH_F7_MODE: tri-state tested across all 3 modes.

Closes #662.

has_task_assigned read `~/.claude/teams/{team_name}/tasks/` but the canonical task store is `~/.claude/tasks/{team_name}/` (per shared/task_utils.py L49). On main this caused every legitimate pact-* specialist dispatch to F6-DENY in production; the bug was masked because tests/_seed_team wrote to the same wrong path.

Fix:
- shared/dispatch_helpers.py L130: path corrected to canonical store
- tests/_seed_team helpers in test_dispatch_gate.py and test_dispatch_gate_smoke.py write tasks at the canonical path; team config.json stays under teams/{team_name}/
- 3 new regression tests (test_dispatch_gate.py): canonical-only path satisfies has_task_assigned; legacy-only path does not; cross-references task_utils.py to lock the path against future drift

Counter-test cardinality verified per #638 discipline: temp-revert of the path fix → 3/3 new tests fail; revert restored → 61/61 dispatch tests pass.

Test cardinality: 7331 → 7334 (+3). Zero regressions. pyright clean on changed files.

The bootstrap_gate.is_marker_set verifier docstring previously framed the SHA256-stamped marker contents as cryptographic provenance backed by "would-be secrets" the attacker cannot forge. That overstates the defense: all four signature inputs (session_id, plugin_root, plugin_version, marker_version) are readable from the same-user filesystem, so a same-user attacker with Python execution can recompute the digest. Rewritten to accurately describe the check as a marker-content fingerprint (not a MAC) that closes the trivial Bash-touch bypass and raises attacker effort + creates a detection surface. Also tightens the corresponding producer comment in commands/bootstrap.md so the human-facing description matches the verifier. No code change; documentation accuracy only. Full test suite unchanged at 7334 passed / 18 skipped.

Adjusts the plugin version from 4.2.0 to 4.1.3 across the four canonical version sites plus runbook references. The dispatch-protocol changes in this branch enforce a contract that was already documented in the orchestrator persona; the new gates complete an existing protocol's implementation rather than introduce a new user-facing capability. A patch bump matches the conservative read. Files updated: pact-plugin/.claude-plugin/plugin.json (authoritative), .claude-plugin/marketplace.json, README.md, pact-plugin/README.md, plus runbook prerequisites and the run-dates table.

Renames the user-facing dispatch-gate env-var that controls the inline-mission heuristic (long prompt + mission keywords + missing TaskList reference) to describe what the gate actually checks rather than carrying a planning-index label.

Old: PACT_DISPATCH_F7_MODE
New: PACT_DISPATCH_INLINE_MISSION_MODE

Allowed values unchanged (warn|deny|shadow); default unchanged (warn); unknown-fallback unchanged (warn). Updated module docstring, README configuration table, and runbook references.

Internal Python constant and source comments referencing the planning index remain pending in a follow-up purge along with the deny-message text, journal field, and remaining cross-surface cleanup. Test cardinality unchanged at 7334/18.

Replaces planning-index labels in user-facing dispatch-gate and task-lifecycle-gate output with descriptions of what each check actually verifies. Two surfaces touched:

1. Deny / advisory message strings (visible to the calling LLM via permissionDecisionReason / additionalContext): each gate rule now describes the violation behaviorally rather than naming a label. Example shape change:
   - before: "PACT dispatch_gate F3: name 'foo bar' violates ^[a-z0-9-]+$"
   - after: "PACT dispatch_gate: name 'foo bar' must match ^[a-z0-9-]+$ (lowercase alphanumerics + hyphens)"
2. Journal event field renamed from `f_row` to `rule`, values changed from labels to behavioral identifier strings:
   - dispatch_gate: name_required, team_name_required, name_too_long, name_invalid_regex, name_reserved_token, specialist_not_registered, team_name_mismatch, team_name_unavailable, no_task_assigned, long_inline_mission, name_not_unique, plugin_agents_missing. Length / regex / reserved checks were a single label before; they are now three separate rules.
   - task_lifecycle_gate: teachback_addblocks_missing, work_addblockedby_missing, completion_no_paired_send, handoff_missing, self_completion, handoff_schema_invalid.

Lifecycle gate return type changed from list[str] (messages only) to list[tuple[rule, message]] so the journal records both the structured rule and the human-readable advisory.

Test fixtures updated: assertions on the field name, rule values, and message substrings now match the behavioral phrasing. Test cardinality unchanged at 7334 passed / 18 skipped.

sections in behavioral terms

Replaces the planning-index labels still embedded in the dispatch-gate, task-lifecycle-gate, and bootstrap-marker code paths with names that describe what each piece does. Surfaces touched:
- Module docstrings on the two new gate files describe each rule by what it checks (e.g., "long inline mission heuristic") rather than by an index label.
- Inline comments and section dividers across the gate code, the bootstrap-marker producer / verifier, and the helpers module use behavioral phrasing.
- Test names rewritten across six gate test files: every `test_f<n>_*` now describes the behavior under verification (e.g., `test_deny_when_name_empty`, `test_deny_when_name_invalid_regex`, `test_skips_when_actor_unresolvable`). Test docstrings carry any rationale that was previously baked into the name.
- Runbook section headers and body prose rewritten: "Matcher registration fidelity (counter-test by mutation)", "Bootstrap-marker provenance check", "Module-load fail-closed", "Inline-mission advisory observation" replace the old index-labeled headings.
- The runbook's "Run Dates" log entries reference the runbook by its filename only.

Internal Python identifiers with no user-visible counterpart and the schema-version constant for the marker remain pending in a small follow-up. Test cardinality unchanged at 7334 passed / 18 skipped. pyright clean on the five gate source files.

Renames the three path-alignment regression tests from index-prefixed labels to descriptions of what each test verifies:
- test_canonical_path_satisfies_no_task_assigned
- test_legacy_path_alone_does_not_satisfy_no_task_assigned
- test_canonical_path_aligns_with_task_utils

Section header comment dropped its label and the docstring rationale reads as ahistorical commentary about an implementation that previously read the legacy path. 3 tests pass; full suite unchanged at 7334 / 18.

Removes the backwards-compat alias that was retained so older monkeypatch sites continued to work. The five test fixture sites are now updated to use the canonical INLINE_MISSION_MODE name directly, and the alias line in dispatch_gate is gone. Also drops a few historical provenance phrases left in test docstrings ("R2-B1 / commit 5b12f80", "Pre-R2-B1", and a similar phrase referencing a prior PR-cycle label in the path-alignment fixture's docstring). Test cardinality unchanged at 7334 / 18; pyright clean.

Renames the bootstrap-marker schema-version constant from a planning-index name to one that describes its role. Producer and verifier are updated in lockstep so the marker-stamp script in the bootstrap command and the content-fingerprint verification in the bootstrap-gate module remain bound by the integer schema value.

Renames:
- bootstrap_gate marker schema constant -> MARKER_SCHEMA_VERSION
- bootstrap_gate marker size-cap constant -> _MARKER_MAX_BYTES

Sites updated: marker-schema constant references in hooks/bootstrap_gate.py (declaration + docstring + two verifier sites), hooks/shared/dispatch_helpers.py (re-export comment + coupling cross-reference), commands/bootstrap.md (producer-coupling comment), tests/test_bootstrap_gate.py (import + assertion + docstring), and the runbook section bodies.

Test cardinality unchanged at 7334 passed / 18 skipped. pyright clean.

Closes a confused-deputy bypass: the task-lifecycle gate's lead-only-completion advisory was suppressed when the owning agent's name matched the self-completion-exempt set. The dispatch gate did not reserve those same names, so a spawn could choose one as its name and defeat the central completion-authority invariant the gates exist to enforce.

Reserved names now include `secretary` and `pact-secretary` (the two self-completion-exempt agents). A subset-invariant test mechanically prevents future drift: any addition to the exempt set without a matching addition to the reserved-name set will fail test_self_complete_exempt_agents_are_all_reserved.

Two follow-ups folded in:
- Smoke helper return-type narrowed to match the comprehensive helper (1-line `int()` coercion fix surfaced by pyright).
- Two surviving s-prefixed smoke test names renamed to behavioral identifiers per the no-planning-artifacts rule.

Test cardinality 7334 -> 7337 (+3 = two reserved-name parametrize cases + the subset-invariant test). Pyright clean across the gate sources and smoke test.
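
The subset invariant can be sketched as a plain set-difference assertion. The two sets are written inline here for illustration; in the plugin they live in shared modules, and the exact set contents beyond the names quoted in this PR are an assumption.

```python
# Illustrative copies of the two sets; the real ones are imported from the
# plugin's shared modules.
RESERVED_NAMES = frozenset(
    {
        "team-lead", "lead", "user", "external", "peer", "unknown", "solo",
        "secretary", "pact-secretary",
    }
)
SELF_COMPLETE_EXEMPT_AGENTS = frozenset({"secretary", "pact-secretary"})


def test_self_complete_exempt_agents_are_all_reserved():
    """Every name that may self-complete must be unusable as a spawn name."""
    unreserved = SELF_COMPLETE_EXEMPT_AGENTS - RESERVED_NAMES
    assert not unreserved, f"exempt but not reserved: {sorted(unreserved)}"
```

Adding a name to the exempt set without also reserving it makes the difference non-empty, so the test fails before the bypass can reopen.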

redaction, and consolidate sources of truth

Several gate-correctness improvements that close real defects without expanding architectural scope:
- Spawn-name regex now requires at least one alphanumeric character, rejecting degenerate forms (single hyphen, only hyphens, leading or trailing hyphen) that the previous looser pattern accepted.
- Session team-name normalization happens once at gate entry (strip + lowercase) and the normalized form flows through every rule. The earlier code lowercased only at the session-equality comparison, so the registry / member-uniqueness / task-assignment lookups could see a different casing than the comparison did, producing inconsistent verdicts for mixed-case team-name input.
- The lifecycle gate's local copy of the self-completion-exempt agent set has been removed; the canonical set is now imported from the shared intentional-wait module. This eliminates a drift surface that could re-open the dispatch / lifecycle bypass closed by the earlier reserved-name extension and the cross-module subset invariant test.
- The has_task_assigned helper now delegates path construction and per-file reading to the canonical task-utils helper, removing the path-layout duplication that previously caused a divergence between the helper and the harness.
- Journal-write redaction now covers Anthropic api keys (sk-ant- and api03 variants), GitHub OAuth / user / server / refresh tokens (gho_, ghu_, ghs_, ghr_), Google api keys (AIza prefix), and PEM private-key blocks (multi-line non-greedy).
- The subset-assertion test docstring documents the categorical pattern: any future privilege class keyed on owner-name must live in a shared module and carry its own subset assertion against the reserved-name set, so the same defect class cannot recur.
- Comprehensive test for the missing-handoff rule on lifecycle completion (parametrized over absent, empty-dict, and null shapes, with paired assertions that the schema-invalid rule does not also fire, pinning the disjointness invariant).

Test cardinality: 7337 -> 7352 (+15). Pyright clean across the gate sources.
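
Journal-write redaction of the kind listed above can be sketched with a small pattern table. The prefixes come from this PR; the minimum lengths, character classes, the `[REDACTED]` placeholder, and the `redact` name are assumptions, not the plugin's actual patterns.

```python
import re

_REDACTION_PATTERNS = [
    # PEM private-key blocks: multi-line, non-greedy up to the END line.
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
    # JWT shape: three dot-separated base64url segments starting with eyJ.
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    re.compile(r"\bsk-[A-Za-z0-9_-]{8,}"),        # sk- keys, including sk-ant-
    re.compile(r"\bxoxb-[A-Za-z0-9-]{8,}"),       # Slack bot tokens
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{8,}"),   # ghp_/gho_/ghu_/ghs_/ghr_
    re.compile(r"\bAKIA[A-Z0-9]{16}"),            # AWS access key ids
    re.compile(r"\bAIza[A-Za-z0-9_-]{8,}"),       # Google api keys
]


def redact(text: str) -> str:
    """Replace every credential-shaped span before the text is journaled."""
    for pattern in _REDACTION_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

The word-boundary anchors keep ordinary prose such as "task-lifecycle" from being caught by the `sk-` rule.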

… for BOTH flag-walks

Closes the authorization-mismatch bypass surfaced in PR #697 review (F-5). The original PR's `_GH_PR_NUMBER_RE` used the broad `_GH_PREFIX` (with `_GH_GLOBAL_FLAGS`) for the pre-subcommand walk, which re-anchored at the SECOND `gh pr merge {N}` occurrence in commands containing heredoc bodies with embedded merge-command literals.

Concrete attack scenario:

    gh pr merge 663 --body "$(cat <<EOF
    Fixes #999. See related: gh pr merge 999 --admin
    EOF
    )" --squash

OLD: regex captured 999 (from heredoc body) → AskUserQuestion prompted operator for PR #999 → operator approves (legitimate cross-link in body) → token written for #999 → PostToolUse consumes #999 token → actual command merges #663. Operator authorized the wrong PR.

NEW: regex uses `_GH_FLAG_TOKENS` for BOTH the pre-subcommand walk AND the post-subcommand walk. `_GH_FLAG_TOKENS` matches only flag-shaped tokens (`-x`, `--long`, optionally `--flag value`), so it cannot re-anchor at heredoc-embedded `gh pr` literals. The first `gh pr merge` match wins; subsequent occurrences in body content are ignored.

Test changes:
- `test_heredoc_body_with_embedded_gh_pr_merge` converted from strict-xfail to passing test in new TestGH_PR_NumberRE_AuthorizationBypassFixed class
- New `test_authorization_mismatch_attack` pins the end-to-end attack shape that this fix prevents
- Other xfail (`test_branch_name_with_digit_prefix_suffix_match`, the 7352-tests case) remains xfail-strict: different root cause (Python `\b` boundary semantics at digit-to-hyphen). Tracked separately for follow-up.

Verification: 13 tests in test_merge_guard_pre.py, 12 passing + 1 xfailed (was 11 + 2). Full suite: 7573 passed (up from 7561). Empirical probe script /tmp/probe_s1.py confirms heredoc case now captures 663 not 999; all 9 prior TRUE GAINS still pass.
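
The re-anchoring failure mode can be reproduced with two simplified regexes. These are hypothetical stand-ins, not the plugin's `_GH_PREFIX` or `_GH_FLAG_TOKENS`: the broad variant walks arbitrary tokens, the strict variant walks only flag-shaped tokens (value-taking flags are omitted for brevity).

```python
import re

# Broad walk: any whitespace-separated token may precede "pr merge", so the
# greedy walk can run into the heredoc body and anchor on the literal there.
BROAD_PR_NUMBER_RE = re.compile(r"gh\s+(?:\S+\s+)*pr\s+merge\s+(\d+)")

# Strict walk: only flag-shaped tokens (-x, --long) may be skipped, both
# before and after the subcommand.
FLAG_TOKENS = r"(?:--?[A-Za-z][\w-]*\s+)*"
STRICT_PR_NUMBER_RE = re.compile(
    rf"gh\s+{FLAG_TOKENS}pr\s+merge\s+{FLAG_TOKENS}(\d+)"
)

# The attack command from the commit message, with the heredoc body restored.
COMMAND = (
    'gh pr merge 663 --body "$(cat <<EOF\n'
    "Fixes #999. See related: gh pr merge 999 --admin\n"
    "EOF\n"
    ')" --squash'
)

broad_capture = BROAD_PR_NUMBER_RE.search(COMMAND).group(1)
strict_capture = STRICT_PR_NUMBER_RE.search(COMMAND).group(1)
print(broad_capture, strict_capture)  # prints "999 663"
```

Restricting the walk to flag-shaped tokens means the match cannot extend past the real subcommand, so the number that gets authorized is the one that actually gets merged.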

michael-wojcik added 16 commits May 6, 2026 18:22

michael-wojcik merged commit 96329a3 into main May 7, 2026

michael-wojcik merged 16 commits into main from 662-dispatch-protocol
