Dispatch-protocol hardening: rename Task→Agent + dispatch_gate + task_lifecycle_gate + bootstrap_gate F24/F25 (#662) #663
Merged
Spawn-tool token in Claude Code is `Agent`, not `Task`. Commit 4c286c1
swapped the rename direction in bootstrap_gate._BLOCKED_TOOLS
(Agent→Task) based on misread cross-evidence; this commit restores
`Agent`.
Cat-1 rename Task→Agent across persona, commands, skills, protocols,
and hooks.json L66/L187 spawn-tool matchers. L196
`TaskCreate|TaskUpdate` preserved (Cat-2 task-management tools). Cat-2
baseline ≥551 verified by new test_hooks_json regression-prevention
assertions.
bootstrap_gate.py changes:
- _BLOCKED_TOOLS swapped Task→Agent
- L19-24 + L57-62 docstring rewritten (the propagation vector that
misled 4c286c1)
- F25 fail-closed wrapper retrofit: stdlib-only _emit_load_failure_deny
defined before wrapped imports; mirrors PR #660 merge_guard_pre.py
- F24 marker provenance: is_marker_set extends with size cap, JSON
parse, key-set, version match, sid==session_dir.name, and
hmac.compare_digest signature verification; closes the
Bash("touch bootstrap-complete") bypass
bootstrap_prompt_gate.py: F25 sibling retrofit; UserPromptSubmit cannot
DENY so emits advisory additionalContext on load failure.
commands/bootstrap.md: now produces F24-stamped marker JSON {v, sid,
sig=SHA256(session_id|plugin_root|plugin_version|version)}.
Refresh transcript-parser parametrized over Task|Agent (carve-out from
clean rename: historical session transcripts contain Task literals).
TASK_TOOL_PATTERN renamed to SPAWN_TOOL_PATTERN.
Persona body (pact-orchestrator.md): first-spawn-verification step;
HARD-STOP framing for "missing tools" reports (no degraded-mode
rationalization); WARN-means-STOP-and-re-dispatch reinforcement.
Tests: 113 passed in smoke; 7232 passed in full suite. F24 cardinality
8 cases; F25 fail-closed counter-test. F20 frontmatter audit with
pact-orchestrator in CARVE_OUT_FILES (--agent-loaded, not
Agent-Teams-spawned).
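The F24 marker check described above can be sketched roughly as follows. This is a minimal illustration, not the code in bootstrap_gate.py: the constant values, the `expected_sig` helper, the function signature, and the assumption that the marker file sits directly inside the session directory are all hypothetical. Only the sequence of checks (size cap, JSON parse, key-set, version match, sid match, `hmac.compare_digest`) comes from the commit message.

```python
import hashlib
import hmac
import json
from pathlib import Path

MARKER_VERSION = 1        # assumed schema version value
MARKER_MAX_BYTES = 1024   # assumed size cap
MARKER_KEYS = {"v", "sid", "sig"}


def expected_sig(session_id: str, plugin_root: str, plugin_version: str, version: int) -> str:
    """SHA256 over the pipe-joined inputs named in the commit message."""
    payload = f"{session_id}|{plugin_root}|{plugin_version}|{version}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_marker_set(marker: Path, plugin_root: str, plugin_version: str) -> bool:
    """Return True only when the marker passes every content check."""
    try:
        raw = marker.read_bytes()
    except OSError:
        return False
    # Size cap; an empty file (e.g. from `touch`) is rejected here.
    if not raw or len(raw) > MARKER_MAX_BYTES:
        return False
    try:
        data = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
    if not isinstance(data, dict) or set(data) != MARKER_KEYS:
        return False
    if data["v"] != MARKER_VERSION:
        return False
    session_id = marker.parent.name  # assumption: marker lives in the session dir
    if data["sid"] != session_id:
        return False
    sig = data["sig"]
    if not isinstance(sig, str):
        return False
    want = expected_sig(session_id, plugin_root, plugin_version, MARKER_VERSION)
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(sig.encode("utf-8", "replace"), want.encode("utf-8"))
```

As a later commit in this PR points out, every input to the digest is readable by the same user, so a check of this shape is a content fingerprint rather than a MAC: it rejects the empty-file bypass but does not stop an attacker who can recompute the digest.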
Adds dispatch_gate.py — a PreToolUse hook on Agent spawn that enforces
F1-F7, F14, F15, F21, F23, F26 against pact-* specialist dispatches.
Closes the silent fail-open class where the orchestrator persona's
dispatch instructions could diverge from actual spawn-tool surface and
degrade into "missing tools, proceeding anyway" rationalization.
F-row enforcement (single evaluate_dispatch composition, anti-sprawl):
- F1: name= empty -> DENY
- F2: team_name= empty -> DENY (catches adversarial team_name='' before
F5)
- F3: NFKC-normalize -> regex ^[a-z0-9-]+$ -> length cap 64 ->
reserved-token ban {team-lead, lead, user, external, peer, unknown,
solo} -> DENY (marker-spoofing prevention)
- F4: subagent_type not in cached FS-glob of agents/pact-*.md -> DENY
- F5: team_name doesn't match pact_context.get_team_name() (or empty
source) -> DENY
- F14: name= already live in team config.json members[] -> DENY
- F15: team_name's config.json doesn't exist -> DENY
- F6: no Task assigned to owner==name in team task files -> DENY
- F7: prompt > 800 chars + mission-keywords + no TaskList reference ->
WARN (advisory; persona body reinforces "WARN means STOP and
re-dispatch correctly")
- Carve-outs: SOLO_EXEMPT {general-purpose, Explore, Plan} and
non-pact-* subagent_type -> ALLOW
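The F1/F3 name checks listed above could be composed along these lines. This is a sketch under stated assumptions: `check_spawn_name` and the constant names are hypothetical, and the deny strings are illustrative; only the order of checks (NFKC-normalize, regex, length cap 64, reserved-token ban) and the reserved tokens come from the commit message.

```python
import re
import unicodedata
from typing import Optional

# Values taken from the commit message; the names themselves are hypothetical.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")
NAME_MAX_LEN = 64
RESERVED_NAMES = frozenset(
    {"team-lead", "lead", "user", "external", "peer", "unknown", "solo"}
)


def check_spawn_name(name: str) -> Optional[str]:
    """Return a deny reason for a bad spawn name, or None when it passes."""
    if not name:
        return "name is required"
    # NFKC folds compatibility forms (e.g. fullwidth letters) to ASCII, so a
    # look-alike of a reserved token is caught by the reserved-token check.
    normalized = unicodedata.normalize("NFKC", name)
    # fullmatch is used rather than ^...$ because `$` also matches just
    # before a trailing newline.
    if NAME_PATTERN.fullmatch(normalized) is None:
        return "name must match ^[a-z0-9-]+$"
    if len(normalized) > NAME_MAX_LEN:
        return f"name exceeds {NAME_MAX_LEN} characters"
    if normalized in RESERVED_NAMES:
        return f"name '{normalized}' is reserved"
    return None
```

Returning the first failing reason keeps the checks in one function, in the spirit of the single-composition constraint the commit describes.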
F21 fail-closed wrapper mirrors PR #660 _emit_load_failure_deny:
stdlib-only helper before wrapped imports; cross-package imports in
try/except BaseException; exit 2 + permissionDecision=deny on any
module-load failure.
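A fail-closed wrapper of the kind F21 describes might look like the sketch below. The helper names (`build_load_failure_deny`, `emit_load_failure_deny`, `guarded_import`) and the reason text are hypothetical; the real hook defines `_emit_load_failure_deny` before its wrapped imports. The payload keys follow the `hookSpecificOutput` shape named elsewhere in this PR, and writing the reason to stderr as well is an extra assumption of this sketch.

```python
import importlib
import json
import sys
from types import ModuleType
from typing import NoReturn


def build_load_failure_deny(error: BaseException) -> dict:
    """Build the deny payload using only the standard library."""
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": (
                f"gate failed to load ({type(error).__name__}: {error}); denying"
            ),
        }
    }


def emit_load_failure_deny(error: BaseException) -> NoReturn:
    """Print the deny payload and exit 2 so the failure cannot fail open."""
    payload = build_load_failure_deny(error)
    print(json.dumps(payload))
    # Assumption of this sketch: also surface the reason on stderr.
    print(payload["hookSpecificOutput"]["permissionDecisionReason"], file=sys.stderr)
    sys.exit(2)


def guarded_import(module_name: str) -> ModuleType:
    """Import a module; any failure at all becomes a deny plus exit 2."""
    try:
        return importlib.import_module(module_name)
    except BaseException as exc:  # deliberately broad, per the commit message
        emit_load_failure_deny(exc)
```

The point of defining the emitter before any non-stdlib import is that the emitter itself cannot be the thing that fails to load.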
F23 emits a session-journal dispatch_decision event on every gate
verdict so denies are auditable, not visible only to the calling LLM.
F26 prompt redaction strips sk-/xoxb-/ghp_/AKIA literal-prefix tokens
+ JWT-shape regex before append_event.
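The F26 redaction step could be approximated as below. The patterns are illustrative guesses at "literal-prefix tokens + JWT-shape regex" and are not the patterns in dispatch_gate.py; `redact` and the `[REDACTED]` placeholder are hypothetical names.

```python
import re

# Literal prefixes named in the commit message, followed by token-like characters.
_PREFIX_TOKEN_RE = re.compile(r"\b(?:sk-|xoxb-|ghp_|AKIA)[A-Za-z0-9_-]+")
# JWT shape (assumed): three base64url segments, the first starting with "eyJ".
_JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")

PLACEHOLDER = "[REDACTED]"


def redact(text: str) -> str:
    """Replace credential-shaped substrings before the text is journaled."""
    text = _JWT_RE.sub(PLACEHOLDER, text)
    return _PREFIX_TOKEN_RE.sub(PLACEHOLDER, text)
```

The leading `\b` keeps ordinary words such as "task-list" from being treated as an `sk-` token, since the `sk` there is not at a word boundary.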
shared/dispatch_helpers.py extracts helpers reused by
task_lifecycle_gate (Commit 3): is_registered_pact_specialist,
has_task_assigned, trustworthy_actor_name, SOLO_EXEMPT,
F24_MARKER_VERSION.
Smoke tests (7): happy-path ALLOW, F1/F2/F3 DENY, SOLO_EXEMPT carve-out,
F21 fail-closed counter-test via subprocess + PYTHONSAFEPATH=1 +
sabotaged dispatch_helpers, F26 redaction verification.
Test cardinality: 7 smoke pass; 7245 full-suite pass / 17 skip / 0 fail.
Cat-2 token preservation 592 (>=551 baseline).
Adds task_lifecycle_gate.py, a PostToolUse hook on TaskCreate|TaskUpdate
that emits advisory output for F8-F13 violations and writes back
metadata.completion_disputed=true on F12 self-completions, with a
metadata.gate_writeback recursion-marker self-skip. PostToolUse cannot
DENY; output is hookSpecificOutput.additionalContext advisory; exit 0
always.
F-row enforcement:
- F8: TEACHBACK Task created without addBlocks=[B_id] -> advisory
- F9: pact-* owned non-TEACHBACK Task created without addBlockedBy ->
advisory
- F10: team-lead marks pact-*-owned task completed without paired
SendMessage to that owner within last 120s -> advisory
- F11: pact-*-owned Task B completed with empty/missing
metadata.handoff -> advisory
- F12: pact-*-owned task transitions to completed AND actor (via
trustworthy_actor_name from agent_id, harness-trustworthy paths only)
is the owner AND owner not in is_self_complete_exempt() carve-outs ->
advisory + direct FS writeback metadata.completion_disputed=true,
gate_writeback=true via atomic .tmp+os.replace
- F13: completion-time metadata.handoff schema validation (required
fields produced, decisions, reasoning_chain, uncertainty, integration,
open_questions); disjoint from F11 (F13 fires only when payload exists
but malformed)
Recursion guard at evaluate_lifecycle entry:
tool_input.metadata.gate_writeback=True -> silent skip, prevents F12
self-trigger on the gate's own writeback.
F12 actor identity uses trustworthy_actor_name from
shared/dispatch_helpers.py: agent_id-derived only (harness-trustworthy
paths 2 and 3 per PREPARE inventory); does NOT fall back to
teammate-spoofable tool_input fields.
F21 fail-closed wrapper mirrors bootstrap_gate.py pattern: stdlib-only
_emit_load_failure_advisory helper before wrapped imports; advisory
output (cannot DENY) on module-load failure; exit 0.
F23 emits session-journal lifecycle_decision events for advisories.
Hook co-located with wake_lifecycle_emitter under existing PostToolUse
matcher='TaskCreate|TaskUpdate' (architect §13 Q1
single-matcher-two-hooks). wake_lifecycle_emitter fields unchanged.
Smoke tests (6): F8/F9/F11+F13-disjoint advisories, F12 writeback to
disk verification, recursion-marker self-skip counter-test, F21
fail-closed counter-test.
Test cardinality: 6 smoke pass; 7238 full-suite pass / 17 skip / 0 fail
(now 7245+6 ~= 7251 with both new smoke suites).
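The F12 writeback and its recursion guard can be pictured with the sketch below. It assumes each task is a single JSON file with an optional `metadata` object, which the commit message does not state; `is_gate_writeback` and `write_completion_disputed` are hypothetical names. The `.tmp` plus `os.replace` step and the two metadata keys are from the commit message.

```python
import json
import os
from pathlib import Path


def is_gate_writeback(tool_input: dict) -> bool:
    """Recursion guard: True when the update carries the gate's own marker."""
    metadata = tool_input.get("metadata")
    return isinstance(metadata, dict) and metadata.get("gate_writeback") is True


def write_completion_disputed(task_file: Path) -> None:
    """Flag a task as disputed, replacing the file atomically."""
    task = json.loads(task_file.read_text(encoding="utf-8"))
    metadata = task.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}
        task["metadata"] = metadata
    metadata["completion_disputed"] = True
    metadata["gate_writeback"] = True
    # Write to a sibling temp file, then swap it in with os.replace so a
    # reader never observes a half-written task file.
    tmp = task_file.with_name(task_file.name + ".tmp")
    tmp.write_text(json.dumps(task, indent=2), encoding="utf-8")
    os.replace(tmp, task_file)
```

A gate built this way would call `is_gate_writeback` first and return without advisories when it is true, so its own write never re-triggers the self-completion rule.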
Adds the post-merge fresh-session validation runbook for the
dispatch-protocol hardening, makes F7 mode runtime-configurable via
PACT_DISPATCH_F7_MODE, and bumps the plugin version 4.1.2 → 4.2.0.
Runbook (tests/runbooks/662-dispatch-gate.md) documents:
- F22 matcher mutation counter-test (mutate hooks.json matcher to
'WrongName' -> gate doesn't fire -> revert; proves matcher is
load-bearing)
- F18 Bash-marker-bypass closure: Bash("touch bootstrap-complete")
produces empty file -> F24 marker-provenance verification rejects ->
bootstrap_gate continues to deny
- F7 advisory injection empirical observation (informs future warn ->
deny upgrade decision)
- F25 sabotaged-import fail-closed counter-test
- Pass/fail criteria + rollback procedure
- RUNBOOK_RUN_DATES.md gets a 662-dispatch-gate stub entry (denominator
/8 per existing runbook §5 convention)
dispatch_gate.py F7_MODE constant replaced with module-load read of
os.environ.get("PACT_DISPATCH_F7_MODE", "warn"). Allowed values:
- "warn" (default, advisory output, behavior unchanged)
- "deny" (future calibration upgrade — DENY when F7 conditions match)
- "shadow" (silent ALLOW; journals as WARN_SHADOWED for calibration
data collection without user-visible advisory)
Unknown values fall back to warn. README.md (plugin) gains a
Configuration section documenting the env-var.
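The tri-state mode read with its fallback could look like this. It is a sketch: `read_f7_mode` is a hypothetical helper that takes an optional mapping so it can be exercised without touching the real environment, whereas the commit describes a module-load read. Exact-match comparison (no case folding) is an assumption.

```python
import os
from typing import Mapping, Optional

ALLOWED_F7_MODES = frozenset({"warn", "deny", "shadow"})
DEFAULT_F7_MODE = "warn"


def read_f7_mode(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured mode, falling back to warn on unknown values."""
    env = os.environ if environ is None else environ
    value = env.get("PACT_DISPATCH_F7_MODE", DEFAULT_F7_MODE)
    return value if value in ALLOWED_F7_MODES else DEFAULT_F7_MODE
```

For example, `read_f7_mode({"PACT_DISPATCH_F7_MODE": "bogus"})` yields `"warn"`, matching the manual sanity check noted in the commit.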
4-file version dance:
- pact-plugin/.claude-plugin/plugin.json (authoritative)
- .claude-plugin/marketplace.json
- README.md (root) — plugin-cache path reference
- pact-plugin/README.md
`rg -n '4\.1\.2'` returns 0 hits.
Tests: 32 pass (test_hooks_json + test_dispatch_gate_smoke); pyright on
dispatch_gate.py: 0 errors. F7_MODE env-var sanity verified manually
(=shadow -> shadow; =bogus -> warn fallback).
Closes #662.
Adds two new test files and extends F20 frontmatter audit:
- tests/test_dispatch_gate.py (NEW, 51 parametrized tests) — F1, F2,
F3 (NFKC corpus + length-cap-64 boundary + 7 reserved tokens), F4,
F5 (mismatch + empty-source per architect §7(h)), F6, F7 (all 3
PACT_DISPATCH_F7_MODE modes warn|deny|shadow including journal-only
ALLOW), F14 (uniqueness), F15, F21 (subprocess + PYTHONSAFEPATH=1
fail-closed counter-test), F23 (journal emit on every verdict), F26
(5 credential patterns + JWT-shape with adjacent-string-literal-
concat to bypass pre-commit secret-scanner false-positives),
SOLO_EXEMPT carve-outs, non-pact-* pass-through, defensive
(malformed stdin / non-target tool), anti-sprawl invariant via
inspect introspection.
- tests/test_task_lifecycle_gate.py (NEW, 23 tests) — F8, F9, F10
(119s vs 121s SendMessage-recency boundary), F11, F12 (writeback
+ carve-outs for secretary + signal-task, recursion-marker
self-skip), F12-on-unresolvable-actor (encodes CURRENT skip
behavior with deviation-documenting test name; follow-up issue
post-merge for architect §5.3 reconciliation), F13 (6 missing-
required-field params + non-dict + F11/F13 disjointness), F21
(PostToolUse advisory fail-closed), anti-sprawl.
- tests/test_skills_structure.py (extended) — F20 parametrized audit
walking pact-plugin/agents/pact-*.md asserting `pact-agent-teams`
in skills frontmatter, F20_CARVE_OUT_FILES = {"pact-orchestrator"}
(orchestrator is --agent-loaded, not Agent-Teams-spawned).
Test cardinality: 7244 -> 7331 (+87 tests). 0 regressions. pyright
clean on new files (CLI; IDE-side stale-cache shows benign import
warnings that don't affect runtime or CI).
Smoke tests retained intact — subprocess+PYTHONSAFEPATH F21 mechanism
is unique there.
F22 fresh-session validation deferred to post-merge runbook
tests/runbooks/662-dispatch-gate.md per hooks-cannot-be-smoke-tested-
in-session discipline.
Auditor YELLOW notes addressed: (1) LOC overshoot — anti-sprawl
invariant verified via parametrized introspection of single
evaluate_dispatch / evaluate_lifecycle composition; no per-F-row
sprawl. (2) PACT_DISPATCH_F7_MODE — tri-state tested across all 3
modes.
Closes #662.
has_task_assigned read `~/.claude/teams/{team_name}/tasks/` but the
canonical task store is `~/.claude/tasks/{team_name}/` (per
shared/task_utils.py L49). On main this caused every legitimate pact-*
specialist dispatch to F6-DENY in production; the bug was masked
because tests/_seed_team wrote to the same wrong path.
Fix:
- shared/dispatch_helpers.py L130: path corrected to canonical store
- tests/_seed_team helpers in test_dispatch_gate.py and
test_dispatch_gate_smoke.py write tasks at the canonical path; team
config.json stays under teams/{team_name}/
- 3 new regression tests (test_dispatch_gate.py): canonical-only path
satisfies has_task_assigned; legacy-only path does not; cross-
references task_utils.py to lock the path against future drift
Counter-test cardinality verified per #638 discipline: temp-revert of
the path fix → 3/3 new tests fail; revert restored → 61/61 dispatch
tests pass.
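The path distinction behind this fix can be illustrated with a small sketch. The on-disk task format is an assumption here (one JSON file per task with an `owner` key), as is the explicit `home` parameter; only the two directory layouts come from the commit message.

```python
import json
from pathlib import Path


def canonical_tasks_dir(team_name: str, home: Path) -> Path:
    """Canonical task store: ~/.claude/tasks/{team_name}/."""
    return home / ".claude" / "tasks" / team_name


def has_task_assigned(team_name: str, name: str, home: Path) -> bool:
    """True when some task file in the canonical store is owned by `name`."""
    directory = canonical_tasks_dir(team_name, home)
    if not directory.is_dir():
        return False
    for path in sorted(directory.glob("*.json")):
        try:
            task = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed task files are skipped
        if isinstance(task, dict) and task.get("owner") == name:
            return True
    return False
```

A task seeded only under the legacy `~/.claude/teams/{team_name}/tasks/` layout is invisible to this lookup, which is the behavior the new regression tests pin down.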
Test cardinality: 7331 → 7334 (+3). Zero regressions. pyright clean on
changed files.
The bootstrap_gate.is_marker_set verifier docstring previously framed the SHA256-stamped marker contents as cryptographic provenance backed by "would-be secrets" the attacker cannot forge. That overstates the defense: all four signature inputs (session_id, plugin_root, plugin_version, marker_version) are readable from the same-user filesystem, so a same-user attacker with Python execution can recompute the digest. Rewritten to accurately describe the check as a marker-content fingerprint (not a MAC) that closes the trivial Bash-touch bypass and raises attacker effort + creates a detection surface. Also tightens the corresponding producer comment in commands/bootstrap.md so the human-facing description matches the verifier. No code change; documentation accuracy only. Full test suite unchanged at 7334 passed / 18 skipped.
Adjusts the plugin version from 4.2.0 to 4.1.3 across the four canonical version sites plus runbook references. The dispatch-protocol changes in this branch enforce a contract that was already documented in the orchestrator persona; the new gates complete an existing protocol's implementation rather than introduce a new user-facing capability. A patch bump matches the conservative read. Files updated: pact-plugin/.claude-plugin/plugin.json (authoritative), .claude-plugin/marketplace.json, README.md, pact-plugin/README.md, plus runbook prerequisites and the run-dates table.
Renames the user-facing dispatch-gate env-var that controls the
inline-mission heuristic (long prompt + mission keywords + missing
TaskList reference) to describe what the gate actually checks rather
than carrying a planning-index label.
- Old: PACT_DISPATCH_F7_MODE
- New: PACT_DISPATCH_INLINE_MISSION_MODE
Allowed values unchanged (warn|deny|shadow); default unchanged (warn);
unknown-fallback unchanged (warn). Updated module docstring, README
configuration table, and runbook references. Internal Python constant
and source comments referencing the planning index remain pending in a
follow-up purge along with the deny-message text, journal field, and
remaining cross-surface cleanup.
Test cardinality unchanged at 7334/18.
Replaces planning-index labels in user-facing dispatch-gate and
task-lifecycle-gate output with descriptions of what each check
actually verifies. Two surfaces touched:
1. Deny / advisory message strings (visible to the calling LLM via
permissionDecisionReason / additionalContext): each gate rule now
describes the violation behaviorally rather than naming a label.
Example shape change:
- before: "PACT dispatch_gate F3: name 'foo bar' violates ^[a-z0-9-]+$"
- after: "PACT dispatch_gate: name 'foo bar' must match
^[a-z0-9-]+$ (lowercase alphanumerics + hyphens)"
2. Journal event field renamed from `f_row` to `rule`, values changed
from labels to behavioral identifier strings:
- dispatch_gate: name_required, team_name_required, name_too_long,
name_invalid_regex, name_reserved_token, specialist_not_registered,
team_name_mismatch, team_name_unavailable, no_task_assigned,
long_inline_mission, name_not_unique, plugin_agents_missing.
Length / regex / reserved checks were a single label before;
they are now three separate rules.
- task_lifecycle_gate: teachback_addblocks_missing,
work_addblockedby_missing, completion_no_paired_send,
handoff_missing, self_completion, handoff_schema_invalid.
Lifecycle gate return type changed from list[str] (messages only) to
list[tuple[rule, message]] so the journal records both the
structured rule and the human-readable advisory.
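The new `list[tuple[rule, message]]` return shape, together with the disjointness of the two handoff rules, might be sketched as follows. `check_handoff` and the message strings are hypothetical; the rule identifiers and the required field names come from the commit messages. Treating "required" as "key present" is an assumption.

```python
from typing import Any, Dict, List, Tuple

REQUIRED_HANDOFF_FIELDS = (
    "produced",
    "decisions",
    "reasoning_chain",
    "uncertainty",
    "integration",
    "open_questions",
)


def check_handoff(metadata: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (rule, message) pairs; at most one handoff rule fires."""
    handoff = metadata.get("handoff")
    if handoff is None or handoff == {}:
        return [("handoff_missing", "completed task has no metadata.handoff")]
    if not isinstance(handoff, dict):
        return [("handoff_schema_invalid", "metadata.handoff must be an object")]
    missing = [field for field in REQUIRED_HANDOFF_FIELDS if field not in handoff]
    if missing:
        return [
            ("handoff_schema_invalid",
             "metadata.handoff is missing: " + ", ".join(missing))
        ]
    return []
```

Because each result carries its rule identifier, a journal writer can record the structured `rule` while the calling LLM sees only the human-readable message.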
Test fixtures updated: assertions on the field name, rule values, and
message substrings now match the behavioral phrasing.
Test cardinality unchanged at 7334 passed / 18 skipped.
… sections in behavioral terms
Replaces the planning-index labels still embedded in the dispatch-gate,
task-lifecycle-gate, and bootstrap-marker code paths with names that
describe what each piece does. Surfaces touched:
- Module docstrings on the two new gate files describe each rule by
what it checks (e.g., "long inline mission heuristic") rather than by
an index label.
- Inline comments and section dividers across the gate code, the
bootstrap-marker producer / verifier, and the helpers module use
behavioral phrasing.
- Test names rewritten across six gate test files: every `test_f<n>_*`
now describes the behavior under verification (e.g.,
`test_deny_when_name_empty`, `test_deny_when_name_invalid_regex`,
`test_skips_when_actor_unresolvable`). Test docstrings carry any
rationale that was previously baked into the name.
- Runbook section headers and body prose rewritten: "Matcher
registration fidelity (counter-test by mutation)", "Bootstrap-marker
provenance check", "Module-load fail-closed", "Inline-mission advisory
observation" replace the old index-labeled headings.
- The runbook's "Run Dates" log entries reference the runbook by its
filename only.
Internal Python identifiers with no user-visible counterpart and the
schema-version constant for the marker remain pending in a small
follow-up.
Test cardinality unchanged at 7334 passed / 18 skipped. pyright clean on
the five gate source files.
Renames the three path-alignment regression tests from index-prefixed
labels to descriptions of what each test verifies:
- test_canonical_path_satisfies_no_task_assigned
- test_legacy_path_alone_does_not_satisfy_no_task_assigned
- test_canonical_path_aligns_with_task_utils
Section header comment dropped its label and the docstring rationale
reads as ahistorical commentary about an implementation that previously
read the legacy path.
3 tests pass; full suite unchanged at 7334 / 18.
Removes the backwards-compat alias that was retained so older
monkeypatch sites continued to work. The five test fixture sites are
now updated to use the canonical INLINE_MISSION_MODE name directly,
and the alias line in dispatch_gate is gone.
Also drops a few historical provenance phrases left in test
docstrings ("R2-B1 / commit 5b12f80", "Pre-R2-B1", and a similar
phrase referencing a prior PR-cycle label in the path-alignment
fixture's docstring).
Test cardinality unchanged at 7334 / 18; pyright clean.
Renames the bootstrap-marker schema-version constant from a
planning-index name to one that describes its role. Producer and
verifier are updated in lockstep so the marker-stamp script in the
bootstrap command and the content-fingerprint verification in the
bootstrap-gate module remain bound by the integer schema value.
Renames:
- bootstrap_gate marker schema constant -> MARKER_SCHEMA_VERSION
- bootstrap_gate marker size-cap constant -> _MARKER_MAX_BYTES
Sites updated: marker-schema constant references in
hooks/bootstrap_gate.py (declaration + docstring + two verifier sites),
hooks/shared/dispatch_helpers.py (re-export comment + coupling
cross-reference), commands/bootstrap.md (producer-coupling comment),
tests/test_bootstrap_gate.py (import + assertion + docstring), and the
runbook section bodies.
Test cardinality unchanged at 7334 passed / 18 skipped. pyright clean.
Closes a confused-deputy bypass: the task-lifecycle gate's
lead-only-completion advisory was suppressed when the owning agent's
name matched the self-completion-exempt set. The dispatch gate did not
reserve those same names, so a spawn could choose one as its name and
defeat the central completion-authority invariant the gates exist to
enforce.
Reserved names now include `secretary` and `pact-secretary` (the two
self-completion-exempt agents). A subset-invariant test mechanically
prevents future drift: any addition to the exempt set without a matching
addition to the reserved-name set will fail
test_self_complete_exempt_agents_are_all_reserved.
Two follow-ups folded in:
- Smoke helper return-type narrowed to match the comprehensive helper
(1-line `int()` coercion fix surfaced by pyright).
- Two surviving s-prefixed smoke test names renamed to behavioral
identifiers per the no-planning-artifacts rule.
Test cardinality 7334 -> 7337 (+3 = two reserved-name parametrize cases
+ the subset-invariant test). Pyright clean across the gate sources and
smoke test.
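The subset invariant described above reduces to a set difference. In this sketch the two set names and the `unreserved_exempt_agents` helper are hypothetical stand-ins for whatever the gate modules export; the test name and the member values are taken from the commit messages.

```python
from typing import FrozenSet, Iterable

# Hypothetical stand-ins for the sets the two gates actually use.
RESERVED_NAMES = frozenset(
    {"team-lead", "lead", "user", "external", "peer", "unknown", "solo",
     "secretary", "pact-secretary"}
)
SELF_COMPLETE_EXEMPT_AGENTS = frozenset({"secretary", "pact-secretary"})


def unreserved_exempt_agents(
    exempt: Iterable[str], reserved: Iterable[str]
) -> FrozenSet[str]:
    """Exempt names a spawn could still claim because they are not reserved."""
    return frozenset(exempt) - frozenset(reserved)


def test_self_complete_exempt_agents_are_all_reserved() -> None:
    leaked = unreserved_exempt_agents(SELF_COMPLETE_EXEMPT_AGENTS, RESERVED_NAMES)
    assert not leaked, f"exempt agents missing from reserved names: {sorted(leaked)}"
```

Adding a name to the exempt set without reserving it makes the difference non-empty, so the test fails before the bypass can reappear.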
… redaction, and consolidate sources of truth
Several gate-correctness improvements that close real defects without
expanding architectural scope:
- Spawn-name regex now requires at least one alphanumeric character,
rejecting degenerate forms (single hyphen, only hyphens, leading or
trailing hyphen) that the previous looser pattern accepted.
- Session team-name normalization happens once at gate entry (strip +
lowercase) and the normalized form flows through every rule. The
earlier code lowercased only at the session-equality comparison, so the
registry / member-uniqueness / task-assignment lookups could see a
different casing than the comparison did, producing inconsistent
verdicts for mixed-case team-name input.
- The lifecycle gate's local copy of the self-completion-exempt agent
set has been removed; the canonical set is now imported from the shared
intentional-wait module. This eliminates a drift surface that could
re-open the dispatch / lifecycle bypass closed by the earlier
reserved-name extension and the cross-module subset invariant test.
- The has_task_assigned helper now delegates path construction and
per-file reading to the canonical task-utils helper, removing the
path-layout duplication that previously caused a divergence between the
helper and the harness.
- Journal-write redaction now covers Anthropic api keys (sk-ant- and
api03 variants), GitHub OAuth / user / server / refresh tokens (gho_,
ghu_, ghs_, ghr_), Google api keys (AIza prefix), and PEM private-key
blocks (multi-line non-greedy).
- The subset-assertion test docstring documents the categorical
pattern: any future privilege class keyed on owner-name must live in a
shared module and carry its own subset assertion against the
reserved-name set, so the same defect class cannot recur.
- Comprehensive test for the missing-handoff rule on lifecycle
completion (parametrized over absent, empty-dict, and null shapes, with
paired assertions that the schema-invalid rule does not also fire,
pinning the disjointness invariant).
Test cardinality: 7337 -> 7352 (+15). Pyright clean across the gate
sources.
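One pattern that satisfies the tightened spawn-name rule, together with the normalize-once step for team names, is sketched below. The exact regex in the gate is not shown in this PR text, so this pattern is an assumption that merely meets the stated constraints (at least one alphanumeric, no leading or trailing hyphen); both function names are hypothetical.

```python
import re

# Assumed pattern: starts and ends with an alphanumeric; hyphens only inside.
SPAWN_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")


def is_valid_spawn_name(name: str) -> bool:
    """True for names with no leading/trailing hyphen and >= 1 alphanumeric."""
    return SPAWN_NAME_RE.fullmatch(name) is not None


def normalize_team_name(raw: str) -> str:
    """Normalize once at gate entry so every later rule sees the same form."""
    return raw.strip().lower()
```

Normalizing before any lookup means a mixed-case input such as `" Alpha-Team "` is compared, registered, and searched as `"alpha-team"` everywhere, rather than only at the equality check.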
This was referenced May 7, 2026
michael-wojcik added a commit that referenced this pull request on May 10, 2026
… for BOTH flag-walks
Closes the authorization-mismatch bypass surfaced in PR #697 review
(F-5). The original PR's `_GH_PR_NUMBER_RE` used the broad `_GH_PREFIX`
(with `_GH_GLOBAL_FLAGS`) for the pre-subcommand walk, which re-anchored
at the SECOND `gh pr merge {N}` occurrence in commands containing
heredoc bodies with embedded merge-command literals.
Concrete attack scenario:
  gh pr merge 663 --body "$(cat <<EOF
  Fixes #999. See related: gh pr merge 999 --admin
  EOF
  )" --squash
OLD: regex captured 999 (from heredoc body) → AskUserQuestion prompted
operator for PR #999 → operator approves (legitimate cross-link in
body) → token written for #999 → PostToolUse consumes #999 token →
actual command merges #663. Operator authorized the wrong PR.
NEW: regex uses `_GH_FLAG_TOKENS` for BOTH the pre-subcommand walk AND
the post-subcommand walk. `_GH_FLAG_TOKENS` matches only flag-shaped
tokens (`-x`, `--long`, optionally `--flag value`), so it cannot
re-anchor at heredoc-embedded `gh pr` literals. The first `gh pr merge`
match wins; subsequent occurrences in body content are ignored.
Test changes:
- `test_heredoc_body_with_embedded_gh_pr_merge` converted from
strict-xfail to passing test in new
TestGH_PR_NumberRE_AuthorizationBypassFixed class
- New `test_authorization_mismatch_attack` pins the end-to-end attack
shape that this fix prevents
- Other xfail (`test_branch_name_with_digit_prefix_suffix_match`, the
7352-tests case) remains xfail-strict; different root cause (Python
`\b` boundary semantics at digit-to-hyphen). Tracked separately for
follow-up.
Verification: 13 tests in test_merge_guard_pre.py: 12 passing + 1
xfailed (was 11 + 2). Full suite: 7573 passed (up from 7561). Empirical
probe script /tmp/probe_s1.py confirms heredoc case now captures 663 not
999; all 9 prior TRUE GAINS still pass.
Summary
Closes #662. Single PR, 5 atomic commits, hardens the PACT specialist-dispatch protocol against the silent-fail-open class that produced #662 itself.
Plugin version: 4.1.2 → 4.2.0 (minor — new gate capabilities).
What this fixes
The orchestrator persona authoritatively documented `Task(...)` as the specialist-spawn tool, but Claude Code's actual platform tool is `Agent`. When a `--agent`-flag session reads the persona, finds no `Task` tool in its surface, and falls back to `Agent` without `name=`/`team_name=`, every spawned agent silently runs without Agent-Teams coordination, and the orchestrator rationalizes the missing tools as "degraded mode" rather than treating it as a HARD STOP.
This PR closes 27 silent-failure paths (F1-F27) plus the bootstrap-marker bypass class.
Cat-1 vs Cat-2 rename discipline
- Cat-1 (spawn-tool rename): `Task`→`Agent` across persona, commands, skills, protocols, hooks.json L66/L187 matchers, bootstrap_gate `_BLOCKED_TOOLS`. Pre-edit: 4 hits in agents/commands/skills/protocols. Post-edit: 0 hits.
- Cat-2 (task-management tools, preserved): `TaskCreate`/`TaskUpdate`/`TaskList`/`TaskGet`/`TaskStop`/`TaskOutput`; these are NOT spawn-tool references. Baseline 551 → post-edit 596 (grew from new test code; zero Cat-2 names corrupted). hooks.json L196 `TaskCreate|TaskUpdate` matcher UNCHANGED; regression-prevention assertion in test_hooks_json.py.
- Carve-out: `transcript_parser.py` + `patterns.py` parametrized over `Task|Agent` to read historical session transcripts. Dispatch code itself is clean rename (no dual-naming). `TASK_TOOL_PATTERN` renamed to `SPAWN_TOOL_PATTERN`.
bootstrap_gate hardening
- F24 marker provenance: `bootstrap.md` now writes JSON marker `{v, sid, sig=SHA256(session_id|plugin_root|plugin_version|version)}`. `is_marker_set` extends with size cap + JSON parse + key-set + version match + sid==session_dir.name + `hmac.compare_digest` signature verification. Closes the `Bash("touch bootstrap-complete")` bypass surface (an attacker LLM could previously bypass `bootstrap_gate` by directly creating an empty marker; under F24 the touch produces an empty file → verification fails → gate continues to deny).
- F25 fail-closed wrapper: `bootstrap_gate.py` (and sibling `bootstrap_prompt_gate.py`) mirrors the PR #660 ("fix: add hookEventName to hookSpecificOutput emit sites (#658)") `_emit_load_failure_deny` pattern: stdlib-only helper before wrapped imports; cross-package imports in `try/except BaseException`; module-load failure → exit 2 + `permissionDecision=deny` + structured `hookEventName=PreToolUse`. Same defect class as #658 ("merge_guard_pre.py emits hookSpecificOutput without hookEventName; deny path silently fails open").
- Docstring rewrite: bootstrap_gate.py L19-24 + L57-62 were the propagation vector for `4c286c1f`'s incorrect rename direction. Future reviewers reading these lines now see accurate spawn-tool semantics.
New gates
dispatch_gate.py (PreToolUse, matcher='Agent')
Single `evaluate_dispatch` composition (anti-sprawl, ~250 LOC budget; verified by parametrized introspection test that no per-F-row functions snuck in):
- F1: `name=` empty → DENY
- F2: `team_name=` empty → DENY (catches adversarial team_name='' before F5)
- F3: NFKC-normalize → regex `^[a-z0-9-]+$` → length cap 64 → reserved-token ban {team-lead, lead, user, external, peer, unknown, solo} → DENY (marker-spoofing prevention)
- F4: `subagent_type` not in cached FS-glob of `agents/pact-*.md` → DENY
- F5: `team_name` doesn't match `pact_context.get_team_name()` (or empty source) → DENY
- F14: `name=` already live in team `config.json` `members[]` → DENY (uniqueness)
- F6: no Task assigned to `owner==name` → DENY
- F7: prompt > 800 chars + mission-keywords + no TaskList reference → WARN (`PACT_DISPATCH_F7_MODE` warn|deny|shadow)
- F23: every gate verdict emits a session-journal `dispatch_decision` event
task_lifecycle_gate.py (PostToolUse, matcher='TaskCreate|TaskUpdate')
Single `evaluate_lifecycle` composition; PostToolUse cannot DENY, all output is advisory `additionalContext`:
- F8: TEACHBACK Task created without `addBlocks=[B_id]` → advisory
- F9: pact-* owned non-TEACHBACK Task created without `addBlockedBy=[A_id]` → advisory
- F11: pact-*-owned Task B completed with empty/missing `metadata.handoff` → advisory
- F12: pact-*-owned task completed by its own owner (outside `is_self_complete_exempt()` carve-outs) → advisory + `metadata.completion_disputed=true` writeback to disk + `metadata.gate_writeback=true` recursion-marker self-skip
- F13: completion-time `metadata.handoff` schema validation (required fields); disjoint from F11 (F13 fires only when payload exists but malformed)
- F23: advisories emit a session-journal `lifecycle_decision` event
F12 actor identity uses `trustworthy_actor_name()` from `shared/dispatch_helpers.py`: agent_id-derived only (harness-trustworthy paths 2 + 3 per resolve_agent_name 5-step chain); does NOT fall back to teammate-spoofable tool_input fields.
Persona body additions
- `TaskList`/`SendMessage`/`TaskUpdate` not loaded → HARD STOP, dispatch protocol violation, NOT degraded mode.
F22 post-merge validation runbook
`pact-plugin/tests/runbooks/662-dispatch-gate.md` (NEW) documents:
- `Bash("touch bootstrap-complete")` produces empty file → F24 verification fails → gate continues to deny
- `RUNBOOK_RUN_DATES.md` log entry (denominator /8)
Per pinned memory: hooks cannot be smoke-tested in-session (loaded at session start, not on file change). Validation is a manual post-merge step in a fresh session.
Tests
- `sys.modules` pop only the module under test, never `shared.*`; snapshot+restore on teardown.
Test plan
- `tests/runbooks/662-dispatch-gate.md`: REQUIRED before declaring 4.2.0 production-stable
Architectural deviation flagged for follow-up
Backend-coder-3 implemented F12-on-unresolvable-actor as skip (no advisory) when `trustworthy_actor_name` returns None; architect §5.3 specified advisory-emit. Encoded in `test_f12_skips_when_actor_unresolvable_documents_architect_5_3_deviation` for visibility. Follow-up issue to be filed post-merge.
Cross-references
- PR #660 (`_emit_load_failure_deny` pattern)
- `4c286c1f` ("fix(security): rename Agent→Task in bootstrap_gate._BLOCKED_TOOLS", 2026-05-05), corrected by 585bd20