
Dispatch-protocol hardening: rename Task→Agent + dispatch_gate + task_lifecycle_gate + bootstrap_gate F24/F25 (#662) #663

Merged
michael-wojcik merged 16 commits into main from 662-dispatch-protocol
May 7, 2026

@michael-wojcik (Collaborator)

Summary

Closes #662. This single PR (5 atomic commits) hardens the PACT specialist-dispatch protocol against the silent fail-open class that produced #662 itself.

| SHA | Subject |
|---|---|
| 585bd20 | fix(dispatch): correct 4c286c1 rename + harden bootstrap_gate (F24/F25) |
| cff3697 | feat(dispatch_gate): PreToolUse Agent gate (F1-F7+F14+F15+F21+F23+F26) |
| bfd8009 | feat(task_lifecycle_gate): PostToolUse TaskCreate\|TaskUpdate gate (F8-F13+F23) |
| 13e4662 | docs+chore: F22 runbook + PACT_DISPATCH_F7_MODE shadow-mode + v4.2.0 |
| c6f95d6 | test: comprehensive coverage (+87 tests) |

Plugin version: 4.1.2 → 4.2.0 (minor: new gate capabilities). A later commit on this branch revised the bump to a patch release, 4.1.3.

What this fixes

The orchestrator persona authoritatively documented Task(...) as the specialist-spawn tool, but Claude Code's actual platform tool is Agent. When a --agent-flag session reads the persona, finds no Task tool in its surface, and falls back to Agent without name=/team_name=, every spawned agent silently runs without Agent-Teams coordination — and the orchestrator rationalizes the missing tools as "degraded mode" rather than treating it as a HARD STOP.

This PR closes 27 silent-failure paths (F1-F27) plus the bootstrap-marker bypass class.

Cat-1 vs Cat-2 rename discipline

  • Cat-1 (renamed): spawn-tool token Task→Agent across persona, commands, skills, protocols, hooks.json L66/L187 matchers, bootstrap_gate _BLOCKED_TOOLS. Pre-edit: 4 hits in agents/commands/skills/protocols. Post-edit: 0 hits.
  • Cat-2 (preserved): task-management tools TaskCreate/TaskUpdate/TaskList/TaskGet/TaskStop/TaskOutput — these are NOT spawn-tool references. Baseline 551 → post-edit 596 (grew from new test code; zero Cat-2 names corrupted). hooks.json L196 TaskCreate|TaskUpdate matcher UNCHANGED — regression-prevention assertion in test_hooks_json.py.
  • Refresh-system carve-out: transcript_parser.py + patterns.py parametrized over Task|Agent to read historical session transcripts. Dispatch code itself is clean rename (no dual-naming). TASK_TOOL_PATTERN renamed to SPAWN_TOOL_PATTERN.
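Below is a minimal sketch of the refresh-system carve-out. A single pattern accepts both spawn-tool spellings found in historical transcripts and leaves the Cat-2 task-management tools alone. The sketch assumes transcript lines serialize tool-use blocks as JSON with a `"name"` key. The pattern and the `spawn_tool_in` helper are illustrative only; they are not the plugin's actual SPAWN_TOOL_PATTERN.

```python
import re
from typing import Optional

# Illustrative stand-in for SPAWN_TOOL_PATTERN (assumes a JSON "name" key).
# Older transcripts record the spawn tool as "Task" and current ones as "Agent".
# Requiring the closing quote right after the token keeps Cat-2 names such as
# "TaskCreate" or "TaskUpdate" from matching.
SPAWN_TOOL_PATTERN = re.compile(r'"name"\s*:\s*"(Task|Agent)"')


def spawn_tool_in(line: str) -> Optional[str]:
    """Return the spawn-tool token named in a transcript line, if any."""
    match = SPAWN_TOOL_PATTERN.search(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    for line in (
        '{"type": "tool_use", "name": "Task", "input": {}}',
        '{"type": "tool_use", "name": "Agent", "input": {}}',
        '{"type": "tool_use", "name": "TaskCreate", "input": {}}',
    ):
        print(spawn_tool_in(line))  # Task, then Agent, then None
```

The parser reads both spellings, and the dispatch code matches only `Agent`. Keeping the two roles separate preserves the "clean rename, no dual-naming" rule outside the transcript reader.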

bootstrap_gate hardening

  • F24 marker provenance: bootstrap.md now writes a JSON marker {v, sid, sig=SHA256(session_id|plugin_root|plugin_version|version)}. is_marker_set is extended with a size cap, JSON parse, key-set check, version match, sid==session_dir.name, and hmac.compare_digest signature verification. This closes the Bash("touch bootstrap-complete") bypass surface. Previously, an attacker LLM could bypass bootstrap_gate by directly creating an empty marker. Under F24, touch produces an empty file, verification fails, and the gate continues to deny.
  • F25 fail-closed wrapper: the retrofit on bootstrap_gate.py (and sibling bootstrap_prompt_gate.py) mirrors the _emit_load_failure_deny pattern from PR #660 (fix: add hookEventName to hookSpecificOutput emit sites, #658). It consists of a stdlib-only helper defined before the wrapped imports and cross-package imports in try/except BaseException. A module-load failure → exit 2 + permissionDecision=deny + structured hookEventName=PreToolUse. This is the same defect class as #658, where merge_guard_pre.py emitted hookSpecificOutput without hookEventName and the deny path silently failed open.
  • Docstring rewrite: the bootstrap_gate.py L19-24 + L57-62 docstrings are rewritten. They were the load-bearing propagation vector that misled commit 4c286c1f into reversing the rename direction. Future reviewers reading these lines now see accurate spawn-tool semantics.
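The F24 producer/verifier pair can be sketched as follows. This is a minimal illustration under several assumptions:

  • the marker file is named `bootstrap-complete` and sits in a session directory whose name is the session id;
  • `MARKER_SCHEMA_VERSION = 1` and the 4096-byte cap are assumed values;
  • `write_marker` is a hypothetical Python stand-in for the script in commands/bootstrap.md.

As a later commit in this PR notes, the digest is a content fingerprint, not a MAC. Every input is readable by a same-user process.

```python
import hashlib
import hmac
import json
from pathlib import Path

MARKER_SCHEMA_VERSION = 1  # assumed value
_MARKER_MAX_BYTES = 4096   # assumed size cap
_MARKER_KEYS = {"v", "sid", "sig"}


def marker_signature(session_id: str, plugin_root: str, plugin_version: str) -> str:
    """SHA256(session_id|plugin_root|plugin_version|version) as a hex string."""
    payload = f"{session_id}|{plugin_root}|{plugin_version}|{MARKER_SCHEMA_VERSION}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def write_marker(session_dir: Path, plugin_root: str, plugin_version: str) -> Path:
    """Producer side: stamp schema version, session id, and fingerprint."""
    sid = session_dir.name
    marker = {"v": MARKER_SCHEMA_VERSION, "sid": sid,
              "sig": marker_signature(sid, plugin_root, plugin_version)}
    path = session_dir / "bootstrap-complete"
    path.write_text(json.dumps(marker), encoding="utf-8")
    return path


def is_marker_set(session_dir: Path, plugin_root: str, plugin_version: str) -> bool:
    """Verifier side: every check must pass, otherwise the gate keeps denying."""
    path = session_dir / "bootstrap-complete"
    try:
        if path.stat().st_size > _MARKER_MAX_BYTES:          # size cap
            return False
        data = json.loads(path.read_text(encoding="utf-8"))  # a touched file fails here
    except (OSError, ValueError):
        return False
    if not isinstance(data, dict) or set(data) != _MARKER_KEYS:  # exact key set
        return False
    version = data["v"]
    if type(version) is not int or version != MARKER_SCHEMA_VERSION:  # version match
        return False
    if data["sid"] != session_dir.name:  # marker is bound to this session
        return False
    sig = data["sig"]
    expected = marker_signature(session_dir.name, plugin_root, plugin_version)
    # Constant-time comparison on bytes (compare_digest rejects non-ASCII str).
    return isinstance(sig, str) and hmac.compare_digest(
        sig.encode("utf-8"), expected.encode("utf-8"))
```

In this sketch, a bare `touch` fails the JSON parse. A valid marker copied into a different session directory fails the sid check. A marker stamped under another plugin version fails the fingerprint comparison.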

New gates

dispatch_gate.py (PreToolUse, matcher='Agent')

Single evaluate_dispatch composition (anti-sprawl, ~250 LOC budget; verified by parametrized introspection test that no per-F-row functions snuck in):

| F-row | Behavior |
|---|---|
| F1 | name= empty → DENY |
| F2 | team_name= empty → DENY (catches adversarial team_name='' before F5) |
| F3 | NFKC-normalize → regex ^[a-z0-9-]+$ → length cap 64 → reserved-token ban {team-lead, lead, user, external, peer, unknown, solo} → DENY (marker-spoofing prevention) |
| F4 | subagent_type not in cached FS-glob of agents/pact-*.md → DENY |
| F5 | team_name doesn't match pact_context.get_team_name() (or empty source) → DENY |
| F14 | name= already live in team config.json members[] → DENY (uniqueness) |
| F15 | team config.json doesn't exist → DENY |
| F6 | no Task assigned to owner==name → DENY |
| F7 | prompt > 800 chars + mission keywords + no TaskList reference → WARN (configurable via PACT_DISPATCH_F7_MODE: warn, deny, or shadow) |
| Carve-outs | SOLO_EXEMPT {general-purpose, Explore, Plan} and non-pact-* subagent_type → ALLOW |
| F23 | every gate decision (ALLOW + WARN + DENY) emits a journal dispatch_decision event |
| F26 | prompt redaction at the journal-write boundary strips sk-/xoxb-/ghp_/AKIA + JWT-shape tokens |
| F21 | module-load failure → fail-closed deny |
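The F1 and F3 name checks can be sketched as one function. It runs the checks in the table's order: NFKC, then regex, then length cap, then reserved tokens. It returns the behavioral rule identifiers introduced by a later commit in this PR. This is a sketch, not the gate's code, and two details are assumptions:

  • the regex reflects the later tightening that requires an alphanumeric and forbids leading or trailing hyphens, but its exact form here is assumed;
  • the reserved set includes `secretary` and `pact-secretary`, which a later commit reserved.

```python
import re
import unicodedata
from typing import Optional

# Assumed form of the tightened spawn-name regex: lowercase alphanumerics and
# hyphens, at least one alphanumeric, no leading or trailing hyphen.
_NAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?")
_NAME_MAX_LEN = 64
RESERVED_NAMES = frozenset({
    "team-lead", "lead", "user", "external", "peer", "unknown", "solo",
    "secretary", "pact-secretary",
})


def check_spawn_name(raw_name: Optional[str]) -> Optional[str]:
    """Return the violated rule identifier, or None if the name is acceptable."""
    if not raw_name:
        return "name_required"
    # NFKC folds compatibility forms (e.g. fullwidth letters) to plain ASCII,
    # so a lookalike spelling cannot slip past the reserved-token ban.
    name = unicodedata.normalize("NFKC", raw_name)
    if not _NAME_RE.fullmatch(name):
        return "name_invalid_regex"
    if len(name) > _NAME_MAX_LEN:
        return "name_too_long"
    if name in RESERVED_NAMES:
        return "name_reserved_token"
    return None
```

For example, a fullwidth spelling of `lead` normalizes to `lead` and is rejected as reserved. `backend-coder-1` passes every check.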

task_lifecycle_gate.py (PostToolUse, matcher='TaskCreate|TaskUpdate')

Single evaluate_lifecycle composition. PostToolUse cannot DENY, so all output is advisory additionalContext:

| F-row | Behavior |
|---|---|
| F8 | TEACHBACK Task without addBlocks=[B_id] → advisory |
| F9 | pact-* owned non-TEACHBACK Task without addBlockedBy=[A_id] → advisory |
| F10 | team-lead marks a pact-*-owned task completed without a paired SendMessage to that owner within 120s → advisory |
| F11 | pact-*-owned Task B completed with empty/missing metadata.handoff → advisory |
| F12 | teammate self-completes a task (and is not in the is_self_complete_exempt() carve-outs) → advisory + metadata.completion_disputed=true writeback to disk + metadata.gate_writeback=true recursion-marker self-skip |
| F13 | metadata.handoff schema validation (required fields), disjoint from F11 (F13 fires only when the payload exists but is malformed) |
| F23 | journal lifecycle_decision event |
| F21 | module-load failure → fail-closed advisory (PostToolUse cannot DENY; exit 0) |
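The F10 pairing rule reduces to a recency check. Below is a minimal sketch, assuming the gate can see team-lead SendMessage records that carry a recipient and a timestamp. `SentMessage` and `completion_has_paired_send` are hypothetical names. Treating the 120s bound as inclusive is also an assumption; the 119s and 121s cases used by the tests fall on opposite sides of it either way.

```python
from dataclasses import dataclass

PAIRING_WINDOW_SECONDS = 120


@dataclass(frozen=True)
class SentMessage:
    """Hypothetical record of a team-lead SendMessage."""
    recipient: str
    sent_at: float  # epoch seconds


def completion_has_paired_send(owner: str, completed_at: float,
                               sends: list[SentMessage],
                               window: float = PAIRING_WINDOW_SECONDS) -> bool:
    """True if a message to `owner` went out at most `window` seconds before completion."""
    return any(
        msg.recipient == owner and 0 <= completed_at - msg.sent_at <= window
        for msg in sends
    )
```

When this returns False for a team-lead completion of a pact-*-owned task, the gate would emit the completion_no_paired_send advisory.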

F12 actor identity uses trustworthy_actor_name() from shared/dispatch_helpers.py — agent_id-derived only (harness-trustworthy paths 2 + 3 per resolve_agent_name 5-step chain); does NOT fall back to teammate-spoofable tool_input fields.

Persona body additions

  • First-spawn-verification step in bootstrap skill (verify team membership after first specialist dispatch).
  • HARD-STOP framing for "missing tools" reports: when a teammate reports TaskList/SendMessage/TaskUpdate not loaded → HARD STOP, dispatch protocol violation, NOT degraded mode.
  • WARN-means-STOP-and-re-dispatch reinforcement near the canonical dispatch form so the calling LLM doesn't rationalize past F7 advisories.

F22 post-merge validation runbook

pact-plugin/tests/runbooks/662-dispatch-gate.md (NEW) documents:

  • Matcher-mutation counter-test (mutate the hooks.json matcher to 'WrongName' → gate doesn't fire → revert). This proves the matcher is load-bearing; the failure it guards against is the same defect class as #658, where merge_guard_pre.py's deny path silently failed open.
  • F18 Bash-marker-bypass closure: Bash("touch bootstrap-complete") produces an empty file → F24 verification fails → gate continues to deny
  • F7 advisory injection empirical observation (informs future warn → deny upgrade decision)
  • F25 sabotaged-import fail-closed counter-test
  • Pass/fail criteria + rollback procedure
  • RUNBOOK_RUN_DATES.md log entry (denominator /8)

Per pinned memory: hooks cannot be smoke-tested in-session (loaded at session start, not on file change). Validation is a manual post-merge step in a fresh session.

Tests

Test plan

  • All existing tests pass (7331/7331)
  • pyright clean on new code
  • Cat-2 preservation grep audit (596 ≥ 551 baseline)
  • hooks.json L196 unchanged (regression-prevention assertion)
  • Post-merge fresh-session validation per tests/runbooks/662-dispatch-gate.md — REQUIRED before declaring 4.2.0 production-stable

Architectural deviation flagged for follow-up

Backend-coder-3 implemented F12-on-unresolvable-actor as skip (no advisory) when trustworthy_actor_name returns None; architect §5.3 specified advisory-emit. Encoded in test_f12_skips_when_actor_unresolvable_documents_architect_5_3_deviation for visibility. Follow-up issue to be filed post-merge.

Cross-references

585bd20 fix(dispatch): correct 4c286c1 rename + harden bootstrap_gate (F24/F25)

Spawn-tool token in Claude Code is `Agent`, not `Task`. Commit 4c286c1
swapped the rename direction in bootstrap_gate._BLOCKED_TOOLS
(Agent→Task) based on misread cross-evidence; this commit restores
`Agent`.

Cat-1 rename Task→Agent across persona, commands, skills, protocols, and
hooks.json L66/L187 spawn-tool matchers. L196 `TaskCreate|TaskUpdate`
preserved (Cat-2 task-management tools). Cat-2 baseline ≥551 verified
by new test_hooks_json regression-prevention assertions.

bootstrap_gate.py changes:
- _BLOCKED_TOOLS swapped Task→Agent
- L19-24 + L57-62 docstring rewritten (the propagation vector that
  misled 4c286c1)
- F25 fail-closed wrapper retrofit: stdlib-only _emit_load_failure_deny
  defined before wrapped imports; mirrors PR #660 merge_guard_pre.py
- F24 marker provenance: is_marker_set extends with size cap, JSON
  parse, key-set, version match, sid==session_dir.name, and
  hmac.compare_digest signature verification — closes the
  Bash("touch bootstrap-complete") bypass

bootstrap_prompt_gate.py: F25 sibling retrofit; UserPromptSubmit cannot
DENY so emits advisory additionalContext on load failure.

commands/bootstrap.md: now produces F24-stamped marker JSON
{v, sid, sig=SHA256(session_id|plugin_root|plugin_version|version)}.

Refresh transcript-parser parametrized over Task|Agent (carve-out from
clean rename — historical session transcripts contain Task literals).
TASK_TOOL_PATTERN renamed to SPAWN_TOOL_PATTERN.

Persona body (pact-orchestrator.md): first-spawn-verification step;
HARD-STOP framing for "missing tools" reports (no degraded-mode
rationalization); WARN-means-STOP-and-re-dispatch reinforcement.

Tests: 113 passed in smoke; 7232 passed in full suite. F24 cardinality
8 cases; F25 fail-closed counter-test. F20 frontmatter audit with
pact-orchestrator in CARVE_OUT_FILES (--agent-loaded, not Agent-Teams-spawned).

cff3697 feat(dispatch_gate): PreToolUse Agent gate (F1-F7+F14+F15+F21+F23+F26)

Adds dispatch_gate.py, a PreToolUse hook on Agent spawn that enforces
F1-F7, F14, F15, F21, F23, F26 against pact-* specialist dispatches.
Closes the silent fail-open class where the orchestrator persona's
dispatch instructions could diverge from actual spawn-tool surface and
degrade into "missing tools, proceeding anyway" rationalization.

F-row enforcement (single evaluate_dispatch composition, anti-sprawl):
- F1: name= empty -> DENY
- F2: team_name= empty -> DENY (catches adversarial team_name='' before
  F5)
- F3: NFKC-normalize -> regex ^[a-z0-9-]+$ -> length cap 64 ->
  reserved-token ban {team-lead, lead, user, external, peer, unknown,
  solo} -> DENY (marker-spoofing prevention)
- F4: subagent_type not in cached FS-glob of agents/pact-*.md -> DENY
- F5: team_name doesn't match pact_context.get_team_name() (or empty
  source) -> DENY
- F14: name= already live in team config.json members[] -> DENY
- F15: team_name's config.json doesn't exist -> DENY
- F6: no Task assigned to owner==name in team task files -> DENY
- F7: prompt > 800 chars + mission-keywords + no TaskList reference ->
  WARN (advisory; persona body reinforces "WARN means STOP and
  re-dispatch correctly")
- Carve-outs: SOLO_EXEMPT {general-purpose, Explore, Plan} and
  non-pact-* subagent_type -> ALLOW
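The F4 registry lookup and the carve-outs might look like the following sketch. It assumes agent definitions live as `pact-*.md` files in one directory and that carve-outs are checked before the registry. That evaluation order, the helper names, and the returned labels other than `specialist_not_registered` are assumptions.

```python
import functools
from pathlib import Path

SOLO_EXEMPT = frozenset({"general-purpose", "Explore", "Plan"})


@functools.lru_cache(maxsize=None)
def registered_specialists(agents_dir: str) -> frozenset[str]:
    """Cached glob of agents/pact-*.md; specialist names are the file stems."""
    return frozenset(path.stem for path in Path(agents_dir).glob("pact-*.md"))


def classify_subagent(subagent_type: str, agents_dir: str) -> str:
    if subagent_type in SOLO_EXEMPT or not subagent_type.startswith("pact-"):
        return "allow_carve_out"            # remaining rules do not apply
    if subagent_type not in registered_specialists(agents_dir):
        return "specialist_not_registered"  # DENY
    return "continue"                       # go on to the name/team/task rules
```

Because of the cache, an agent file added after the first lookup is not seen until the process restarts. That trade-off suits a hook process that starts fresh on each invocation.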

F21 fail-closed wrapper mirrors PR #660 _emit_load_failure_deny:
stdlib-only helper before wrapped imports; cross-package imports in
try/except BaseException; exit 2 + permissionDecision=deny on any
module-load failure.
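A minimal sketch of the fail-closed shape follows. The deny helper uses only the standard library, so it still works when the plugin's own modules cannot be imported. The hook described here wraps module-level imports directly; the function form below (`load_gate_dependency`, a hypothetical name) exists only to make the behavior testable. The JSON fields mirror the PreToolUse hookSpecificOutput shape referenced in this PR.

```python
import importlib
import json
import sys
from types import ModuleType
from typing import NoReturn


def _emit_load_failure_deny(exc: BaseException) -> NoReturn:
    """Stdlib-only deny path: usable even when plugin modules fail to import."""
    reason = ("PACT dispatch gate could not load its dependencies "
              f"({type(exc).__name__}); denying instead of failing open.")
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            "permissionDecisionReason": reason,
        }
    }
    print(json.dumps(payload))
    print(reason, file=sys.stderr)
    sys.exit(2)  # exit code 2 blocks the PreToolUse call on its own


def load_gate_dependency(module_name: str) -> ModuleType:
    """Import a cross-package dependency; any failure becomes a deny."""
    try:
        return importlib.import_module(module_name)
    except BaseException as exc:  # deliberately broad, as in the described wrapper
        _emit_load_failure_deny(exc)
```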

F23 emits a session-journal dispatch_decision event on every gate
verdict so denies are auditable, not visible only to the calling LLM.

F26 prompt redaction strips sk-/xoxb-/ghp_/AKIA literal-prefix tokens
+ JWT-shape regex before append_event.
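Below is a sketch of redaction at the journal-write boundary, applied to the prompt before the event is appended. The prefixes come from this PR, including the later extension to sk-ant-, gho_/ghu_/ghs_/ghr_, AIza, and PEM blocks. The exact regexes, the length thresholds, and the `[REDACTED]` placeholder are assumptions. Word boundaries keep ordinary text such as `task-...` from matching the `sk-` rule.

```python
import re

REDACTED = "[REDACTED]"

# Illustrative patterns only; the gate's real expressions may differ.
_SECRET_PATTERNS = [
    # PEM private-key blocks, matched non-greedily across lines.
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
               re.DOTALL),
    re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),       # sk- keys, including sk-ant-api03-
    re.compile(r"\bxoxb-[A-Za-z0-9-]{10,}"),      # Slack bot tokens
    re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}"),  # GitHub ghp_/gho_/ghu_/ghs_/ghr_
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),          # AWS access key IDs
    re.compile(r"\bAIza[0-9A-Za-z_-]{35}"),       # Google API keys
    re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWT shape
]


def redact(prompt: str) -> str:
    """Replace every credential-shaped substring before the journal write."""
    for pattern in _SECRET_PATTERNS:
        prompt = pattern.sub(REDACTED, prompt)
    return prompt
```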

shared/dispatch_helpers.py extracts helpers reused by
task_lifecycle_gate (Commit 3): is_registered_pact_specialist,
has_task_assigned, trustworthy_actor_name, SOLO_EXEMPT,
F24_MARKER_VERSION.

Smoke tests (7): happy-path ALLOW, F1/F2/F3 DENY, SOLO_EXEMPT carve-out,
F21 fail-closed counter-test via subprocess + PYTHONSAFEPATH=1 +
sabotaged dispatch_helpers, F26 redaction verification.

Test cardinality: 7 smoke pass; 7245 full-suite pass / 17 skip / 0 fail.
Cat-2 token preservation 592 (>=551 baseline).

bfd8009 feat(task_lifecycle_gate): PostToolUse TaskCreate|TaskUpdate gate (F8-F13+F23)

Adds task_lifecycle_gate.py, a PostToolUse hook on TaskCreate|TaskUpdate
that emits advisory output for F8-F13 violations and writes back
metadata.completion_disputed=true on F12 self-completions, with a
metadata.gate_writeback recursion-marker self-skip.

PostToolUse cannot DENY; output is hookSpecificOutput.additionalContext
advisory; exit 0 always.

F-row enforcement:
- F8: TEACHBACK Task created without addBlocks=[B_id] -> advisory
- F9: pact-* owned non-TEACHBACK Task created without addBlockedBy ->
  advisory
- F10: team-lead marks pact-*-owned task completed without paired
  SendMessage to that owner within last 120s -> advisory
- F11: pact-*-owned Task B completed with empty/missing
  metadata.handoff -> advisory
- F12: pact-*-owned task transitions to completed AND actor (via
  trustworthy_actor_name from agent_id, harness-trustworthy paths only)
  is the owner AND owner not in is_self_complete_exempt() carve-outs ->
  advisory + direct FS writeback metadata.completion_disputed=true,
  gate_writeback=true via atomic .tmp+os.replace
- F13: completion-time metadata.handoff schema validation (required
  fields produced, decisions, reasoning_chain, uncertainty, integration,
  open_questions); disjoint from F11 (F13 fires only when payload exists
  but malformed)
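The F11/F13 split can be expressed as one classifier whose outcomes are mutually exclusive, which is the disjointness the tests pin. This sketch uses the required fields and rule identifiers listed in this PR; `classify_handoff` is a hypothetical name. Treating absent, null, and empty-dict handoffs as "missing" follows the later parametrized test description.

```python
from typing import Any, Optional

REQUIRED_HANDOFF_FIELDS = (
    "produced", "decisions", "reasoning_chain",
    "uncertainty", "integration", "open_questions",
)


def classify_handoff(metadata: Optional[dict[str, Any]]) -> Optional[str]:
    """Return 'handoff_missing', 'handoff_schema_invalid', or None if valid.

    Exactly one outcome applies: an absent, null, or empty handoff is only
    'missing'; a present but malformed one is only 'schema_invalid'.
    """
    handoff = (metadata or {}).get("handoff")
    if handoff is None or handoff == {}:
        return "handoff_missing"
    if not isinstance(handoff, dict):
        return "handoff_schema_invalid"
    if any(field not in handoff for field in REQUIRED_HANDOFF_FIELDS):
        return "handoff_schema_invalid"
    return None
```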

Recursion guard at evaluate_lifecycle entry:
tool_input.metadata.gate_writeback=True -> silent skip, which prevents
F12 from self-triggering on the gate's own writeback.
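The F12 writeback and its recursion guard are sketched below. The sketch assumes each task is stored as one JSON object with an optional `metadata` dict; the helper names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Any


def is_gate_writeback(tool_input: dict[str, Any]) -> bool:
    """Recursion guard: True for updates carrying the gate's own marker."""
    metadata = tool_input.get("metadata")
    return isinstance(metadata, dict) and metadata.get("gate_writeback") is True


def mark_completion_disputed(task_path: Path) -> None:
    """Set the dispute flags in a task file: write a .tmp, then os.replace."""
    task = json.loads(task_path.read_text(encoding="utf-8"))
    metadata = task.setdefault("metadata", {})
    metadata["completion_disputed"] = True
    metadata["gate_writeback"] = True
    tmp_path = task_path.with_name(task_path.name + ".tmp")
    tmp_path.write_text(json.dumps(task, indent=2), encoding="utf-8")
    # Atomic on the same filesystem: readers see the old file or the new one.
    os.replace(tmp_path, task_path)
```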

F12 actor identity uses trustworthy_actor_name from
shared/dispatch_helpers.py — agent_id-derived only (harness-trustworthy
paths 2 and 3 per PREPARE inventory); does NOT fall back to
teammate-spoofable tool_input fields.

F21 fail-closed wrapper mirrors bootstrap_gate.py pattern: stdlib-only
_emit_load_failure_advisory helper before wrapped imports; advisory
output (cannot DENY) on module-load failure; exit 0.

F23 emits session-journal lifecycle_decision events for advisories.

Hook co-located with wake_lifecycle_emitter under existing
PostToolUse matcher='TaskCreate|TaskUpdate' (architect §13 Q1
single-matcher-two-hooks). wake_lifecycle_emitter fields unchanged.

Smoke tests (6): F8/F9/F11+F13-disjoint advisories, F12 writeback to
disk verification, recursion-marker self-skip counter-test, F21
fail-closed counter-test.

Test cardinality: 6 smoke pass; 7238 full-suite pass / 17 skip / 0 fail
(now 7245+6 ~= 7251 with both new smoke suites).

13e4662 docs+chore: F22 runbook + PACT_DISPATCH_F7_MODE shadow-mode + v4.2.0

Adds the post-merge fresh-session validation runbook for the
dispatch-protocol hardening, makes F7 mode runtime-configurable via
PACT_DISPATCH_F7_MODE, and bumps the plugin version 4.1.2 → 4.2.0.

Runbook (tests/runbooks/662-dispatch-gate.md) documents:
- F22 matcher mutation counter-test (mutate hooks.json matcher to
  'WrongName' -> gate doesn't fire -> revert; proves matcher is
  load-bearing)
- F18 Bash-marker-bypass closure: Bash("touch bootstrap-complete")
  produces empty file -> F24 marker-provenance verification rejects ->
  bootstrap_gate continues to deny
- F7 advisory injection empirical observation (informs future warn ->
  deny upgrade decision)
- F25 sabotaged-import fail-closed counter-test
- Pass/fail criteria + rollback procedure
- RUNBOOK_RUN_DATES.md gets a 662-dispatch-gate stub entry (denominator
  /8 per existing runbook §5 convention)

dispatch_gate.py F7_MODE constant replaced with module-load read of
os.environ.get("PACT_DISPATCH_F7_MODE", "warn"). Allowed values:
- "warn"   (default, advisory output, behavior unchanged)
- "deny"   (future calibration upgrade — DENY when F7 conditions match)
- "shadow" (silent ALLOW; journals as WARN_SHADOWED for calibration
            data collection without user-visible advisory)
Unknown values fall back to warn. README.md (plugin) gains a
Configuration section documenting the env-var.
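The mode switch and the inline-mission heuristic can be combined in one sketch. The following come from this PR: the 800-character threshold, the TaskList check, the three modes, the warn fallback, and the WARN_SHADOWED journal value. The keyword list and the function names are hypothetical. The variable appears under its original name, PACT_DISPATCH_F7_MODE, which a later commit renamed to PACT_DISPATCH_INLINE_MISSION_MODE.

```python
import os
from collections.abc import Mapping

_ALLOWED_MODES = ("warn", "deny", "shadow")
_LONG_PROMPT_CHARS = 800
# Hypothetical keyword list; the PR does not enumerate the gate's keywords.
MISSION_KEYWORDS = ("implement", "mission", "deliverable")


def resolve_f7_mode(env: Mapping[str, str] = os.environ) -> str:
    """Unknown values fall back to warn."""
    mode = env.get("PACT_DISPATCH_F7_MODE", "warn")
    return mode if mode in _ALLOWED_MODES else "warn"


F7_MODE = resolve_f7_mode()  # read once, at module load


def looks_like_inline_mission(prompt: str) -> bool:
    lowered = prompt.lower()
    return (len(prompt) > _LONG_PROMPT_CHARS
            and any(word in lowered for word in MISSION_KEYWORDS)
            and "TaskList" not in prompt)


def inline_mission_verdict(prompt: str, mode: str) -> tuple[str, str]:
    """Return (decision shown to the caller, verdict written to the journal)."""
    if not looks_like_inline_mission(prompt):
        return "allow", "ALLOW"
    if mode == "deny":
        return "deny", "DENY"
    if mode == "shadow":
        return "allow", "WARN_SHADOWED"  # silent for the caller, logged for calibration
    return "warn", "WARN"
```

Shadow mode keeps the caller-visible behavior identical to an ALLOW while still recording how often the heuristic would have fired. That record is the calibration data a future warn → deny decision needs.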

4-file version dance:
- pact-plugin/.claude-plugin/plugin.json (authoritative)
- .claude-plugin/marketplace.json
- README.md (root) — plugin-cache path reference
- pact-plugin/README.md

`rg -n '4\.1\.2'` returns 0 hits.

Tests: 32 pass (test_hooks_json + test_dispatch_gate_smoke); pyright on
dispatch_gate.py: 0 errors. F7_MODE env-var sanity verified manually
(=shadow -> shadow; =bogus -> warn fallback).

Closes #662.

c6f95d6 test: comprehensive coverage (+87 tests)

Adds two new test files and extends the F20 frontmatter audit:
- tests/test_dispatch_gate.py (NEW, 51 parametrized tests) — F1, F2,
  F3 (NFKC corpus + length-cap-64 boundary + 7 reserved tokens), F4,
  F5 (mismatch + empty-source per architect §7(h)), F6, F7 (all 3
  PACT_DISPATCH_F7_MODE modes warn|deny|shadow including journal-only
  ALLOW), F14 (uniqueness), F15, F21 (subprocess + PYTHONSAFEPATH=1
  fail-closed counter-test), F23 (journal emit on every verdict), F26
  (5 credential patterns + JWT-shape with adjacent-string-literal-
  concat to bypass pre-commit secret-scanner false-positives),
  SOLO_EXEMPT carve-outs, non-pact-* pass-through, defensive
  (malformed stdin / non-target tool), anti-sprawl invariant via
  inspect introspection.
- tests/test_task_lifecycle_gate.py (NEW, 23 tests) — F8, F9, F10
  (119s vs 121s SendMessage-recency boundary), F11, F12 (writeback
  + carve-outs for secretary + signal-task, recursion-marker
  self-skip), F12-on-unresolvable-actor (encodes CURRENT skip
  behavior with deviation-documenting test name; follow-up issue
  post-merge for architect §5.3 reconciliation), F13 (6 missing-
  required-field params + non-dict + F11/F13 disjointness), F21
  (PostToolUse advisory fail-closed), anti-sprawl.
- tests/test_skills_structure.py (extended) — F20 parametrized audit
  walking pact-plugin/agents/pact-*.md asserting `pact-agent-teams`
  in skills frontmatter, F20_CARVE_OUT_FILES = {"pact-orchestrator"}
  (orchestrator is --agent-loaded, not Agent-Teams-spawned).

Test cardinality: 7244 -> 7331 (+87 tests). 0 regressions. pyright
clean on new files (CLI; IDE-side stale-cache shows benign import
warnings that don't affect runtime or CI).

Smoke tests retained intact — subprocess+PYTHONSAFEPATH F21 mechanism
is unique there.

F22 fresh-session validation deferred to post-merge runbook
tests/runbooks/662-dispatch-gate.md per hooks-cannot-be-smoke-tested-
in-session discipline.

Auditor YELLOW notes addressed: (1) LOC overshoot — anti-sprawl
invariant verified via parametrized introspection of single
evaluate_dispatch / evaluate_lifecycle composition; no per-F-row
sprawl. (2) PACT_DISPATCH_F7_MODE — tri-state tested across all 3
modes.

Closes #662.

has_task_assigned read `~/.claude/teams/{team_name}/tasks/` but the
canonical task store is `~/.claude/tasks/{team_name}/` (per
shared/task_utils.py L49). On main this caused every legitimate pact-*
specialist dispatch to F6-DENY in production; the bug was masked
because tests/_seed_team wrote to the same wrong path.

Fix:
- shared/dispatch_helpers.py L130: path corrected to canonical store
- tests/_seed_team helpers in test_dispatch_gate.py and
  test_dispatch_gate_smoke.py write tasks at the canonical path; team
  config.json stays under teams/{team_name}/
- 3 new regression tests (test_dispatch_gate.py): canonical-only path
  satisfies has_task_assigned; legacy-only path does not; cross-
  references task_utils.py to lock the path against future drift
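Here is a standalone sketch of the corrected lookup. It assumes one JSON file per task with an `owner` field. After a later commit, the real helper delegates to shared/task_utils.py, whose API is not shown here. The `home` parameter exists only so tests can point at a temporary directory.

```python
import json
from pathlib import Path
from typing import Optional


def task_dir(team_name: str, home: Optional[Path] = None) -> Path:
    # Canonical store ~/.claude/tasks/{team_name}/, not the legacy
    # ~/.claude/teams/{team_name}/tasks/ the original helper read.
    base = home if home is not None else Path.home()
    return base / ".claude" / "tasks" / team_name


def has_task_assigned(team_name: str, owner: str, home: Optional[Path] = None) -> bool:
    """True if any task file in the team's store has owner == `owner`."""
    directory = task_dir(team_name, home)
    if not directory.is_dir():
        return False
    for path in directory.glob("*.json"):
        try:
            task = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or partially written files
        if isinstance(task, dict) and task.get("owner") == owner:
            return True
    return False
```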

Counter-test cardinality verified per #638 discipline: temp-revert of
the path fix → 3/3 new tests fail; revert restored → 61/61 dispatch
tests pass.

Test cardinality: 7331 → 7334 (+3). Zero regressions. pyright clean on
changed files.

The bootstrap_gate.is_marker_set verifier docstring previously framed
the SHA256-stamped marker contents as cryptographic provenance backed
by "would-be secrets" the attacker cannot forge. That overstates the
defense: all four signature inputs (session_id, plugin_root,
plugin_version, marker_version) are readable from the same-user
filesystem, so a same-user attacker with Python execution can recompute
the digest.

Rewritten to accurately describe the check as a marker-content
fingerprint (not a MAC) that closes the trivial Bash-touch bypass and
raises attacker effort + creates a detection surface.

Also tightens the corresponding producer comment in commands/bootstrap.md
so the human-facing description matches the verifier.

No code change; documentation accuracy only. Full test suite unchanged
at 7334 passed / 18 skipped.

Adjusts the plugin version from 4.2.0 to 4.1.3 across the four canonical
version sites plus runbook references. The dispatch-protocol changes in
this branch enforce a contract that was already documented in the
orchestrator persona; the new gates complete an existing protocol's
implementation rather than introduce a new user-facing capability. A
patch bump matches the conservative read.

Files updated: pact-plugin/.claude-plugin/plugin.json (authoritative),
.claude-plugin/marketplace.json, README.md, pact-plugin/README.md,
plus runbook prerequisites and the run-dates table.

Renames the user-facing dispatch-gate env-var that controls the
inline-mission heuristic (long prompt + mission keywords + missing
TaskList reference) to describe what the gate actually checks rather
than carrying a planning-index label.

Old: PACT_DISPATCH_F7_MODE
New: PACT_DISPATCH_INLINE_MISSION_MODE

Allowed values unchanged (warn|deny|shadow); default unchanged (warn);
unknown-fallback unchanged (warn). Updated module docstring,
README configuration table, and runbook references.

Internal Python constant and source comments referencing the planning
index remain pending in a follow-up purge along with the deny-message
text, journal field, and remaining cross-surface cleanup.

Test cardinality unchanged at 7334/18.

Replaces planning-index labels in user-facing dispatch-gate and
task-lifecycle-gate output with descriptions of what each check
actually verifies. Two surfaces touched:

1. Deny / advisory message strings (visible to the calling LLM via
   permissionDecisionReason / additionalContext): each gate rule now
   describes the violation behaviorally rather than naming a label.
   Example shape change:
   - before: "PACT dispatch_gate F3: name 'foo bar' violates ^[a-z0-9-]+$"
   - after:  "PACT dispatch_gate: name 'foo bar' must match
              ^[a-z0-9-]+$ (lowercase alphanumerics + hyphens)"

2. Journal event field renamed from `f_row` to `rule`, values changed
   from labels to behavioral identifier strings:
   - dispatch_gate: name_required, team_name_required, name_too_long,
     name_invalid_regex, name_reserved_token, specialist_not_registered,
     team_name_mismatch, team_name_unavailable, no_task_assigned,
     long_inline_mission, name_not_unique, plugin_agents_missing.
     Length / regex / reserved checks were a single label before;
     they are now three separate rules.
   - task_lifecycle_gate: teachback_addblocks_missing,
     work_addblockedby_missing, completion_no_paired_send,
     handoff_missing, self_completion, handoff_schema_invalid.
   Lifecycle gate return type changed from list[str] (messages only) to
   list[tuple[rule, message]] so the journal records both the
   structured rule and the human-readable advisory.

Test fixtures updated: assertions on the field name, rule values, and
message substrings now match the behavioral phrasing.

Test cardinality unchanged at 7334 passed / 18 skipped.

sections in behavioral terms

Replaces the planning-index labels still embedded in the dispatch-gate,
task-lifecycle-gate, and bootstrap-marker code paths with names that
describe what each piece does. Surfaces touched:

- Module docstrings on the two new gate files describe each rule by
  what it checks (e.g., "long inline mission heuristic") rather than
  by an index label.
- Inline comments and section dividers across the gate code, the
  bootstrap-marker producer / verifier, and the helpers module use
  behavioral phrasing.
- Test names rewritten across six gate test files: every
  `test_f<n>_*` now describes the behavior under verification (e.g.,
  `test_deny_when_name_empty`, `test_deny_when_name_invalid_regex`,
  `test_skips_when_actor_unresolvable`). Test docstrings carry any
  rationale that was previously baked into the name.
- Runbook section headers and body prose rewritten:
  "Matcher registration fidelity (counter-test by mutation)",
  "Bootstrap-marker provenance check", "Module-load fail-closed",
  "Inline-mission advisory observation" replace the old
  index-labeled headings.
- The runbook's "Run Dates" log entries reference the runbook by its
  filename only.

Internal Python identifiers with no user-visible counterpart and the
schema-version constant for the marker remain pending in a small
follow-up.

Test cardinality unchanged at 7334 passed / 18 skipped. pyright clean
on the five gate source files.

Renames the three path-alignment regression tests from index-prefixed
labels to descriptions of what each test verifies:
- test_canonical_path_satisfies_no_task_assigned
- test_legacy_path_alone_does_not_satisfy_no_task_assigned
- test_canonical_path_aligns_with_task_utils

The section header comment dropped its label, and the docstring
rationale now reads as ahistorical commentary about an implementation
that previously read the legacy path.

3 tests pass; full suite unchanged at 7334 / 18.

Removes the backwards-compat alias that was retained so older
monkeypatch sites continued to work. The five test fixture sites are
now updated to use the canonical INLINE_MISSION_MODE name directly,
and the alias line in dispatch_gate is gone.

Also drops a few historical provenance phrases left in test
docstrings ("R2-B1 / commit 5b12f80", "Pre-R2-B1", and a similar
phrase referencing a prior PR-cycle label in the path-alignment
fixture's docstring).

Test cardinality unchanged at 7334 / 18; pyright clean.

Renames the bootstrap-marker schema-version constant from a
planning-index name to one that describes its role. Producer and
verifier are updated in lockstep so the marker-stamp script in the
bootstrap command and the content-fingerprint verification in the
bootstrap-gate module remain bound by the integer schema value.

Renames:
- bootstrap_gate marker schema constant -> MARKER_SCHEMA_VERSION
- bootstrap_gate marker size-cap constant -> _MARKER_MAX_BYTES

Sites updated: marker-schema constant references in
hooks/bootstrap_gate.py (declaration + docstring + two verifier
sites), hooks/shared/dispatch_helpers.py (re-export comment +
coupling cross-reference), commands/bootstrap.md (producer-coupling
comment), tests/test_bootstrap_gate.py (import + assertion +
docstring), and the runbook section bodies.

Test cardinality unchanged at 7334 passed / 18 skipped. pyright clean.

Closes a confused-deputy bypass: the task-lifecycle gate's lead-only-completion
advisory was suppressed when the owning agent's name
matched the self-completion-exempt set. The dispatch gate did not
reserve those same names, so a spawn could choose one as its name and
defeat the central completion-authority invariant the gates exist to
enforce.

Reserved names now include `secretary` and `pact-secretary` (the two
self-completion-exempt agents). A subset-invariant test mechanically
prevents future drift: any addition to the exempt set without a
matching addition to the reserved-name set will fail
test_self_complete_exempt_agents_are_all_reserved.
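The subset invariant fits in a few lines. In this sketch both sets are hypothetical literal stand-ins built from the names in this PR. In the plugin they live in separate modules, which is why a mechanical assertion between them is useful.

```python
# Hypothetical stand-ins; the plugin keeps these sets in separate shared modules.
RESERVED_NAMES = frozenset({
    "team-lead", "lead", "user", "external", "peer", "unknown", "solo",
    "secretary", "pact-secretary",
})
SELF_COMPLETE_EXEMPT = frozenset({"secretary", "pact-secretary"})


def test_self_complete_exempt_agents_are_all_reserved() -> None:
    # Every privileged (exempt) owner name must be unspawnable; otherwise a
    # spawn could choose that name and inherit the exemption.
    missing = SELF_COMPLETE_EXEMPT - RESERVED_NAMES
    assert not missing, f"exempt but not reserved: {sorted(missing)}"
```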

Two follow-ups folded in:
- Smoke helper return-type narrowed to match the comprehensive helper
  (1-line `int()` coercion fix surfaced by pyright).
- Two surviving s-prefixed smoke test names renamed to behavioral
  identifiers per the no-planning-artifacts rule.

Test cardinality 7334 -> 7337 (+3 = two reserved-name parametrize
cases + the subset-invariant test). Pyright clean across the gate
sources and smoke test.

redaction, and consolidate sources of truth

Several gate-correctness improvements that close real defects without
expanding architectural scope:

- Spawn-name regex now requires at least one alphanumeric character,
  rejecting degenerate forms (single hyphen, only hyphens, leading
  or trailing hyphen) that the previous looser pattern accepted.

- Session team-name normalization happens once at gate entry
  (strip + lowercase) and the normalized form flows through every
  rule. The earlier code lowercased only at the session-equality
  comparison, so the registry / member-uniqueness / task-assignment
  lookups could see a different casing than the comparison did,
  producing inconsistent verdicts for mixed-case team-name input.

- The lifecycle gate's local copy of the self-completion-exempt
  agent set has been removed; the canonical set is now imported from
  the shared intentional-wait module. This eliminates a drift surface
  that could re-open the dispatch / lifecycle bypass closed by the
  earlier reserved-name extension and the cross-module subset
  invariant test.

- The has_task_assigned helper now delegates path construction and
  per-file reading to the canonical task-utils helper, removing the
  path-layout duplication that previously caused a divergence between
  the helper and the harness.

- Journal-write redaction now covers Anthropic API keys (sk-ant- and
  api03 variants), GitHub OAuth / user / server / refresh tokens
  (gho_, ghu_, ghs_, ghr_), Google API keys (AIza prefix), and PEM
  private-key blocks (multi-line non-greedy).

- The subset-assertion test docstring documents the categorical
  pattern: any future privilege class keyed on owner-name must live
  in a shared module and carry its own subset assertion against the
  reserved-name set, so the same defect class cannot recur.

- Comprehensive test for the missing-handoff rule on lifecycle
  completion (parametrized over absent, empty-dict, and null shapes,
  with paired assertions that the schema-invalid rule does not also
  fire — pinning the disjointness invariant).
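Normalizing once at entry and reusing the result can be sketched as below. `team_name_rule` is hypothetical. Normalizing the requested team_name the same way as the session value is an assumption, since the commit describes normalizing the session team name. The rule identifiers come from this PR.

```python
from typing import Optional


def normalize_team_name(value: Optional[str]) -> str:
    """Strip and lowercase; applied once at gate entry."""
    return (value or "").strip().lower()


def team_name_rule(requested: Optional[str],
                   session_team: Optional[str]) -> tuple[Optional[str], str]:
    """Return (violated rule or None, normalized team key for later lookups)."""
    requested_key = normalize_team_name(requested)
    session_key = normalize_team_name(session_team)
    if not requested_key:
        return "team_name_required", session_key
    if not session_key:
        return "team_name_unavailable", session_key
    if requested_key != session_key:
        return "team_name_mismatch", session_key
    # Registry, member-uniqueness, and task lookups all reuse this single key,
    # so no rule can see a differently cased team name than the comparison did.
    return None, session_key
```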

Test cardinality: 7337 -> 7352 (+15). Pyright clean across the gate
sources.

michael-wojcik merged commit 96329a3 into main May 7, 2026

michael-wojcik added a commit that referenced this pull request May 10, 2026
… for BOTH flag-walks

Closes the authorization-mismatch bypass surfaced in PR #697 review (F-5).
The original PR's `_GH_PR_NUMBER_RE` used the broad `_GH_PREFIX` (with
`_GH_GLOBAL_FLAGS`) for the pre-subcommand walk, which re-anchored at the
SECOND `gh pr merge {N}` occurrence in commands containing heredoc bodies
with embedded merge-command literals.

Concrete attack scenario:

    gh pr merge 663 --body "$(cat <<EOF
    Fixes #999. See related: gh pr merge 999 --admin
    EOF
    )" --squash

OLD: regex captured 999 (from heredoc body) → AskUserQuestion prompted
operator for PR #999 → operator approves (legitimate cross-link in body)
→ token written for #999 → PostToolUse consumes #999 token → actual
command merges #663. Operator authorized the wrong PR.

NEW: regex uses `_GH_FLAG_TOKENS` for BOTH the pre-subcommand walk AND
the post-subcommand walk. `_GH_FLAG_TOKENS` matches only flag-shaped
tokens (`-x`, `--long`, optionally `--flag value`), so it cannot
re-anchor at heredoc-embedded `gh pr` literals. The first `gh pr merge`
match wins; subsequent occurrences in body content are ignored.
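The pattern below is a simplified illustration of the idea behind the fix. It is not the actual `_GH_PR_NUMBER_RE`, and `_FLAG` and `merge_target` are hypothetical names. The pattern anchors at the start of the command and allows only flag-shaped tokens on both sides of `pr merge`. As a result, a number quoted inside a heredoc body can never be captured.

```python
import re
from typing import Optional

# Flag-shaped token: -x or --long, optionally followed by a value that does not
# itself start with "-".
_FLAG = r"--?[A-Za-z][\w-]*(?:[ =](?!-)\S+)?"

# re.match anchors at the start of the command, and only flag tokens may sit
# between "gh" and "pr merge" or between "merge" and the PR number.
_GH_PR_MERGE_NUMBER = re.compile(
    rf"\s*gh(?:\s+{_FLAG})*\s+pr\s+merge(?:\s+{_FLAG})*\s+(\d+)\b"
)


def merge_target(command: str) -> Optional[str]:
    """Return the PR number the command itself merges, or None."""
    match = _GH_PR_MERGE_NUMBER.match(command)
    return match.group(1) if match else None
```

On the attack command above, this sketch captures 663. The `gh pr merge 999` text inside the body is unreachable because the pattern cannot skip over non-flag words.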

Test changes:
- `test_heredoc_body_with_embedded_gh_pr_merge` converted from strict-xfail
  to passing test in new TestGH_PR_NumberRE_AuthorizationBypassFixed class
- New `test_authorization_mismatch_attack` pins the end-to-end attack
  shape that this fix prevents
- Other xfail (`test_branch_name_with_digit_prefix_suffix_match`, the
  7352-tests case) remains xfail-strict — different root cause (Python `\b`
  boundary semantics at digit-to-hyphen). Tracked separately for follow-up.

Verification: 13 tests in test_merge_guard_pre.py — 12 passing + 1 xfailed
(was 11 + 2). Full suite: 7573 passed (up from 7561). Empirical probe
script /tmp/probe_s1.py confirms heredoc case now captures 663 not 999;
all 9 prior TRUE GAINS still pass.

Development

Successfully merging this pull request may close these issues.

Rename Task→Agent throughout PACT plugin (revert #4c286c1f) + enforce name/team_name on dispatch
