From fde4c433da247ca3e0a6241bbf7af920696815e9 Mon Sep 17 00:00:00 2001 From: michael-wojcik <5386199+michael-wojcik@users.noreply.github.com> Date: Wed, 29 Apr 2026 20:58:54 -0400 Subject: [PATCH 1/6] Add inbox-wake skill (D1 Monitor-only, symmetric scope) Canonical SSOT for the wake mechanism at pact-plugin/skills/inbox-wake/SKILL.md. D1 design: Monitor-only with no cron-watchdog (cron-fire was empirically observed to terminate the registering session's Monitor). Symmetric scope: same skill arms both lead and teammate sessions via {agent_name} parameterization. Body: 162 lines. Sections: Overview (alarm-clock framing, signal-not-mailbox + between-tool-calls-not-mid-tool), When to Invoke, Operations (Arm + Teardown only, no Recovery), Monitor/WriteStateFile/Teardown blocks, six failure modes (silent Monitor death, long single-tool blocks wake, malformed STATE_FILE, schema-version mismatch, per-agent independence, concurrent re-arm), Verification, References. Agent-reader-primary audit annotations on every section. --- pact-plugin/skills/inbox-wake/SKILL.md | 162 +++++++++++++++++++++++++ 1 file changed, 162 insertions(+) create mode 100644 pact-plugin/skills/inbox-wake/SKILL.md diff --git a/pact-plugin/skills/inbox-wake/SKILL.md b/pact-plugin/skills/inbox-wake/SKILL.md new file mode 100644 index 00000000..b01f0800 --- /dev/null +++ b/pact-plugin/skills/inbox-wake/SKILL.md @@ -0,0 +1,162 @@ +--- +name: inbox-wake +description: | + Arms a per-agent Monitor that fires a turn on inbox-grow, closing the + poller-gated wake window during long-running operations. One skill, two + invocation sites: lead at SessionStart, every teammate at SubagentStart. + Use when: arming wake at session/subagent start; tearing down at session + end (/wrap-up, /pause, teammate Shutdown). 
+--- + +# Inbox-Wake Skill + +Per-agent wake mechanism for PACT teams: a single Monitor task per agent watches that agent's inbox file via `wc -c` byte-grow and fires a turn on growth, between tool calls. + +## Overview + +> **Monitor is an alarm clock, not a mailbox.** On `INBOX_GREW`, end the turn and return to idle — the platform's idle-delivery is the channel-of-record for content. Never read the inbox file or parse the wake's stdout payload yourself. +> +> **Wake surfaces between tool calls within a turn, not mid-tool.** Monitor's `INBOX_GREW` emit cannot interrupt a single in-flight tool call. The platform queues `INBOX_GREW` events that fire during a long-running tool and delivers them when the tool returns, bundled with the tool's result. The wake mechanism's promise is "messages surface between tool calls within a turn," NOT "instant interrupt anywhere." For multi-tool turns the wake reliably opens the poller-gate between tools; for single long tools (e.g., a 90-second blocking sleep) the agent is effectively unwakeable until the tool returns. + +Problem this solves: the platform's `useInboxPoller` delivers queued `SendMessage` only at idle boundaries, so an agent busy in a long multi-tool turn leaves inbound messages stuck until that turn ends. See [Communication Charter Part I — Delivery Model](../../protocols/pact-communication-charter.md#delivery-model). The Monitor's stdout emit surfaces at the next between-tool-call boundary and prompts the agent to end its turn and go idle, so latency is bounded by the poll interval (2 s) plus any in-flight tool call, rather than by the next opportunistic idle. + +Single-Monitor model, no in-session watchdog. Lifetime is session-scoped per agent. Inbox path is a single JSON file (`inboxes/{agent-name}.json`), not a directory. + +**Audit**: both alarm-clock paragraphs are non-negotiable. The first prevents an editing LLM from writing "parse the wake stdout to extract content" — wake is signal, not content. 
The second prevents an editing LLM from inferring mid-tool interrupt from "wake on inbox grow" — the substrate's actual capability is between-tool, not anywhere. Removing either paragraph silently overpromises the mechanism. + +## When to Invoke + +| Operation | Site | Trigger | +|---|---|---| +| **Arm** | Lead session | SessionStart-emitted directive (`session_init.py` `additionalContext`) | +| **Arm** | Teammate session | SubagentStart-emitted directive (`peer_inject.py` `additionalContext`) | +| **Teardown** | Lead session | `/wrap-up`, `/pause` command bodies | +| **Teardown** | Teammate session | `pact-agent-teams` `## Shutdown` — before approving `shutdown_request` | + +D1 has only Arm and Teardown — no Recovery operation, no in-session watchdog. A silently-dead Monitor is undetectable in-session. Its stale STATE_FILE also makes every later Arm a no-op, so the mechanism degrades to "no wake" until that file is cleared (Teardown or session-end cleanup) and the next Arm cold-starts. + +**Audit**: an editing LLM tempted to add a Recovery operation "for symmetry with prior PACT skills" hits the explicit prohibition above plus the kill-mechanism rationale in `## Failure Modes` (the cron+Monitor watchdog combination kills its own Monitor; D1 deliberately drops the watchdog layer). + +## Operations + +### Arm + +Idempotent. Pass `agent_name` parameter (lead invokers pass `agent_name="team-lead"`; teammate invokers pass their own name). + +1. If STATE_FILE is present and parses with `v=1`: no-op (already armed; cheap on every SessionStart or SubagentStart re-fire). +2. Otherwise cold-start: spawn the Monitor (see `## Monitor Block`); capture the returned `monitor_task_id`; write the STATE_FILE atomically (see `## WriteStateFile Block`). + +### Teardown + +Best-effort. Pass `agent_name`. See `## Teardown Block` for the exact sequence. Tolerates a Monitor that died silently mid-session. + +**Audit**: idempotency lives in the skill (STATE_FILE-presence check), NOT in the directive that invokes it. 
An editing LLM tempted to add an "if not already armed" guard at the directive site would re-introduce LLM-self-diagnosis as the gate, which is the failure mode the unconditional-emit discipline closes. + +## Monitor Block + +Canonical Monitor `cmd` body. Both `{team_name}` and `{agent_name}` placeholders are interpolated by the arming agent at Arm time. For the lead: `{agent_name}` = `team-lead`. For teammates: `{agent_name}` = the spawned teammate's name (e.g., `architect`, `preparer`). + +```bash +INBOX="$HOME/.claude/teams/{team_name}/inboxes/{agent_name}.json" +PREV=0 +while true; do + if [ -f "$INBOX" ]; then + SIZE=$(wc -c 2>/dev/null < "$INBOX" | tr -d ' ') + if [ "$SIZE" -gt "$PREV" ] 2>/dev/null; then + echo "INBOX_GREW size=$SIZE ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)" + PREV=$SIZE + fi + fi + sleep 2 +done +``` + +Discipline: +- **Single-file inbox.** The inbox is `inboxes/{agent-name}.json` — a single JSON file, NOT a directory. Byte-grow detection via `wc -c`, NOT directory inotify. +- **Stdout discipline.** Each stdout line fires a turn. Emit ONLY `INBOX_GREW size=… ts=…` on real grow. Diagnostic and lifecycle output goes to `>&2` (stderr does not turn-fire). +- **Transient-error suppression.** `wc -c 2>/dev/null < "$INBOX"` redirects stderr before the inbox is opened, so both a failed open and a transient read error stay silent and a momentary missing file does not crash the loop. + +Spawn via `Monitor(persistent=true, cmd=...)`, passing the interpolated script above as `cmd`; capture the returned task ID for the STATE_FILE write. + +**Audit**: stdout shape is exactly one line per grow event. An editing LLM "adding diagnostic info" to stdout, whether by appending fields such as `prev=$PREV` to the emit line or by `echo`-ing on every poll, breaks that shape; per-poll echoes additionally turn-fire on every poll cycle, creating a token-cost regression. F1 (single-file inbox) is the load-bearing wake trigger; an editing LLM that confuses inbox-as-directory will silently break wake delivery — the inbox path must remain `inboxes/{agent-name}.json`. + +## WriteStateFile Block + +Atomic-rename JSON write. 
STATE_FILE path is per-agent: `~/.claude/teams/{team_name}/inbox-wake-state-{agent_name}.json`. + +```python +import json +import os +from datetime import datetime, timezone +from pathlib import Path + +# team_name, agent_name, and monitor_task_id (returned by the Monitor spawn) are supplied by Arm. +state_path = Path.home() / ".claude" / "teams" / team_name / f"inbox-wake-state-{agent_name}.json" +payload = { + "v": 1, + "monitor_task_id": monitor_task_id, + "armed_at": datetime.now(timezone.utc).isoformat(timespec="seconds").replace("+00:00", "Z"), +} +tmp = state_path.with_suffix(".json.tmp") +tmp.write_text(json.dumps(payload), encoding="utf-8") +os.replace(tmp, state_path) # atomic rename +``` + +Schema is intentionally minimal — exactly 3 fields: `v`, `monitor_task_id`, `armed_at`. D1 has no watchdog, so no `cron_job_id` and no heartbeat fields are written or read. The per-agent suffix lives in the **filename**, not in the schema. + +**Audit**: D1 has no cron and no heartbeat — STATE_FILE has 3 fields, no more. If you find yourself adding a 4th field, stop and re-read `## Failure Modes` on silent Monitor death. An editing LLM reasoning by analogy with prior cron+Monitor designs might re-add `cron_job_id`; do not. An editing LLM tempted to add `agent_name` as a schema field gets the structural answer: the filename carries that information; the schema stays minimal. + +## Teardown Block + +Order is load-bearing: stop live Monitor before unlinking the registry sidecar. + +1. Read STATE_FILE; if absent or invalid (malformed JSON / `v ≠ 1`), skip step 2 — nothing to stop. +2. `TaskStop(STATE_FILE.monitor_task_id)` — **ignoring not-found errors** (the Monitor may have died silently mid-session). +3. Unlink STATE_FILE — `Path.unlink(missing_ok=True)`. + +Teardown is best-effort. The Monitor may have died silently — `TaskStop` will return a `tool_use_error` in that case. Tolerate not-found and continue to step 3. Do not abort teardown on TaskStop failure; an undeleted STATE_FILE is worse than a failed TaskStop because it leaves a phantom registry entry that confuses the next session's Arm. 
+ +Ordering rationale: the inverse ordering would leave a brief window where a STATE_FILE-less Monitor still runs but Arm sees no STATE_FILE and re-arms — creating an orphan. + +**Audit**: F6 tolerance phrasing ("ignoring not-found errors") is the load-bearing fragment. An editing LLM "tightening up error handling" by removing the phrase silently restores crash-on-stale-ID. The principle anchor — Teardown is best-effort because a torn-down session may have already lost its Monitor — tells the editing LLM why the phrase exists. + +## Failure Modes + +These failure modes apply per-agent — every agent (lead and every teammate) Arms its own Monitor on its own inbox, so each entry below describes the agent's own Monitor and the agent's own idle delivery. + +### Malformed STATE_FILE + +If the STATE_FILE exists but fails to parse as JSON, Arm treats it as not-armed and cold-starts. The pre-existing file is overwritten by the atomic-rename write. + +### Schema-version mismatch + +If `v` is not `1`, Arm treats the STATE_FILE as not-armed and cold-starts, overwriting on write. Future schema bumps must increment `v` and re-arm cleanly. + +### Silent Monitor death + +> **Silent Monitor death**: D1 has no in-session watchdog. If the Monitor task dies silently mid-session (process crash, OOM, harness GC, undisclosed lifetime cap), the wake mechanism degrades to no-wake. The stale STATE_FILE stays on disk, so every later Arm (including SessionStart re-fires) no-ops. Wake returns only after the file is removed (Teardown, or `cleanup_wake_registry` at session end) and the next Arm cold-starts. Detection is not possible from inside the agent's session. If the agent notices missed messages by other means (e.g., a peer complaining about no response), a manual `Skill("PACT:inbox-wake")` Teardown followed by Arm will re-arm; Arm alone no-ops while the stale STATE_FILE is on disk. This is a deliberate trade against the cron-fire-kills-Monitor self-defeating loop empirically observed 2026-04-29 (cron-fire #1 at 22:49:36Z killed `monitor_task_id=bu1pmbva7` ~19 s later; isolated Monitor in same session survived ≥10 min — see PREPARE §C, hypothesis H1 HIGH-confidence). 
Future work tracked separately if/when the platform offers a watchdog primitive that does not kill the watched task. + +**Audit**: this paragraph is the principle anchor for D1's central trade-off (no watchdog). An editing LLM later proposing "let's add a heartbeat + cron watchdog after all" gets the empirical reason in prose: the cron+Monitor combination kills its own Monitor. The verbatim cite of `monitor_task_id=bu1pmbva7` and `2026-04-29T22:49:36Z` makes the rationale unambiguous and traceable to PREPARE §C. + +### Long single-tool calls block wake delivery + +> **Long single-tool calls block wake delivery**: Monitor's `INBOX_GREW` stdout emit fires when the inbox grows, but events that fire during a long-running single tool call (e.g., a 90-second blocking `sleep`) are queued and delivered to the agent only when that tool returns, bundled with the tool's result. The wake mechanism does NOT interrupt a tool mid-call. For multi-tool turns the wake surfaces between tool calls; for single long tools the agent is effectively unwakeable until the tool returns. Verified 2026-04-30T00:00–00:02Z in session pact-5951b31c. Test: 90-second blocking `sleep` running in parallel with a peer-dispatched delayed reply. Peer sent at 00:01:34Z; Monitor `INBOX_GREW` fired at 00:01:43Z and 00:01:45Z (during the sleep); Bash returned at the full 90s (00:02:23Z); teammate-message content delivered in the *next* turn via standard idle-delivery, not mid-tool. + +**Audit**: this paragraph is the principle anchor for D1's scope claim ("between tool calls, not mid-tool"). An editing LLM reading the skill body and seeing "wake on inbox grow" will reasonably infer mid-tool interrupt unless explicitly told otherwise. That inference is wrong. The §Overview's second alarm-clock paragraph + this entry together correctly scope the substrate's actual capability — the wake is a between-tool-call signal. 
The empirical timing tokens (00:01:34Z send, 00:01:43Z + 00:01:45Z `INBOX_GREW` fire-during-tool, 00:02:23Z tool return, next-turn delivery) make the constraint observable and reproducible, not just asserted. + +### Per-agent independence + +> Teammate Monitors and lead Monitors are independent processes. A teammate's Monitor death does not affect the lead's wake; a long-running tool call in one teammate does not block the lead's wake delivery. Each agent's wake is a per-agent guarantee — the dispatch graph as a whole tolerates partial wake degradation gracefully. + +**Audit**: an editing LLM might infer "if one Monitor dies, the whole team's wake is broken" — the per-agent-independence note prevents that incorrect inference. An editing LLM tempted to add a cross-agent watchdog "for resilience" would re-introduce the cron-watchdog pattern PREPARE §C falsified. The "no in-session watchdog" framing applies symmetrically; no per-agent watchdog either. + +### Concurrent re-arm + +If two SessionStart fires race (rare; resume-during-compact edge), both may attempt cold-start. The atomic-rename write prevents a partially written STATE_FILE. The second write wins and the loser's Monitor task is orphaned; the next Teardown's `TaskStop(STATE_FILE.monitor_task_id)` stops only the winner. Orphan accumulation is bounded by the rarity of the race; force-termination cleanup via `cleanup_wake_registry` glob covers the registry sidecar regardless. + +## Verification + +See dogfood runbook `pact-plugin/tests/runbooks/591-inbox-wake.md` for end-to-end verification (fresh-session arm, inbox-grow wake, teardown, force-termination cleanup). Structural-pattern tests in `pact-plugin/tests/test_inbox_wake_skill_structure.py` and siblings verify skill-body invariants (section presence, F1/F6/F7 phrasing, alarm-clock anchor, per-agent symmetry). 
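The Monitor Block's stdout discipline (one line per real grow, silence otherwise) can be exercised without a live session by restating its decision logic in Python. This is a hypothetical sketch, not a file in the plugin. `poll_once` and `watch` are illustrative names, and `os.path.getsize` stands in for `wc -c`, since both report the byte size of a regular file.

```python
import os
import time
from datetime import datetime, timezone
from typing import Optional, Tuple


def poll_once(inbox: str, prev: int) -> Tuple[Optional[str], int]:
    """One poll of the byte-grow check: (stdout line or None, new baseline)."""
    try:
        size = os.path.getsize(inbox)  # byte count, like `wc -c`
    except OSError:
        return None, prev  # missing or unreadable inbox: stay silent
    if size > prev:
        ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return f"INBOX_GREW size={size} ts={ts}", size
    return None, prev  # no growth: no stdout, so no turn fires


def watch(inbox: str, interval: float = 2.0) -> None:
    """Loop forever like the bash Monitor; only grow events reach stdout."""
    prev = 0
    while True:
        line, prev = poll_once(inbox, prev)
        if line is not None:
            print(line, flush=True)
        time.sleep(interval)
```

As in the bash loop, the baseline only advances on growth. A file that shrinks therefore stays silent until it exceeds its previous peak.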
+ +## References + +- [Communication Charter Part I §Wake Mechanism](../../protocols/pact-communication-charter.md#wake-mechanism) — protocol contract surface +- [Communication Charter Part I §Delivery Model](../../protocols/pact-communication-charter.md#delivery-model) — async-at-idle-boundary delivery model +- Approved plan: `docs/plans/inbox-wake-skill-plan.md` +- Authoritative design: `docs/architecture/591-inbox-wake-skill.md` (D1 Monitor-only, symmetric scope) +- PREPARE deliverable: `docs/preparation/591-inbox-wake-skill.md` — §C kill-mechanism investigation; §D alternative wake mechanisms +- Issue #591 (this feature); #594 (skill-body line-count ceiling); #444 (compaction durability + hook-emitted-directives) From 583f192e0d001e61c08dfd5a88322633a59b1e4a Mon Sep 17 00:00:00 2001 From: michael-wojcik <5386199+michael-wojcik@users.noreply.github.com> Date: Wed, 29 Apr 2026 21:09:28 -0400 Subject: [PATCH 2/6] Wire wake-arm directives into session_init and peer_inject hooks Add lead-side wake-arm directive append to session_init.py at line 709 region (after format_plugin_banner) and teammate-side directive via peer_inject.py SubagentStart additionalContext (chain-end after _COMPLETION_AUTHORITY_NOTE). Both directives are unconditional Tier-0 emissions per #444 hook-emitted- directives discipline. Teammate directive uses {agent_name} interpolation through _sanitize_agent_name for marker-spoofing defense. Add cleanup_wake_registry(team_name) helper to session_end.py with path-traversal pattern: is_safe_path_component gate, teams_root.resolve() with relative_to escape check, glob inbox-wake-state-*.json scoped to the validated team_dir, Path.unlink(missing_ok=True) wrapped in try/except OSError. Glob covers both lead and teammate STATE_FILE sidecars in one pass per the symmetric-scope design. 
Pre-mitigation test relaxation in test_peer_inject.py: assertions that previously matched _COMPLETION_AUTHORITY_NOTE as the result tail now match it as immediately preceding the new wake-arm directive. Suffix length arithmetic extended with rendered _WAKE_ARM_TEMPLATE length. 454 tests pass. --- pact-plugin/hooks/peer_inject.py | 33 +++++++++++++--- pact-plugin/hooks/session_end.py | 56 +++++++++++++++++++++++++++ pact-plugin/hooks/session_init.py | 14 +++++++ pact-plugin/tests/test_peer_inject.py | 54 +++++++++++++++++++++----- 4 files changed, 143 insertions(+), 14 deletions(-) diff --git a/pact-plugin/hooks/peer_inject.py b/pact-plugin/hooks/peer_inject.py index 2f174e23..117fc2af 100644 --- a/pact-plugin/hooks/peer_inject.py +++ b/pact-plugin/hooks/peer_inject.py @@ -55,6 +55,25 @@ ) +# Wake-arm directive — Tier-0 additionalContext (#591). Emitted per +# SubagentStart fire alongside the bootstrap prelude, peer list, and +# trailing reminders. {agent_name} interpolates via the existing +# safe_name pattern at the call site so the skill watches the correct +# inbox (inboxes/{agent_name}.json) and writes the correct STATE_FILE +# (inbox-wake-state-{agent_name}.json). Idempotent in the skill — the +# directive emits unconditionally; the skill no-ops if a valid +# STATE_FILE is already on disk for this agent. Per architect §15.2, +# Tier-0 hook delivery is the chosen channel (vs Tier-1 skill body) +# because #444's "hook-emitted directives: unconditional > conditional" +# discipline is non-negotiable for spawn-time directives. +_WAKE_ARM_TEMPLATE = ( + "\n\nArm wake mechanism: invoke Skill(\"PACT:inbox-wake\") and execute the " + "Arm operation before any tool call. Pass agent_name=\"{agent_name}\" so the " + "skill watches the correct inbox. Arm is idempotent — invoke unconditionally; " + "the skill no-ops if a valid STATE_FILE is already on disk for this agent." 
+) + + def _sanitize_agent_name(agent_name: str) -> str: """Strip characters from agent_name that could break out of the PACT ROLE marker format. @@ -167,12 +186,15 @@ def get_peer_context( ) prelude = _BOOTSTRAP_PRELUDE_TEMPLATE.format(agent_name=safe_name) + wake_arm = _WAKE_ARM_TEMPLATE.format(agent_name=safe_name) # Output ordering: prelude → peer_context → "\n\n" → plugin banner → - # _TEACHBACK_REMINDER → _COMPLETION_AUTHORITY_NOTE. The plugin banner - # is a single line with no leading/trailing newlines, so an explicit - # "\n\n" separator goes between peer_context and the banner. - # _TEACHBACK_REMINDER and _COMPLETION_AUTHORITY_NOTE each begin with - # "\n\n", preserving visual spacing through the trailing reminders. + # _TEACHBACK_REMINDER → _COMPLETION_AUTHORITY_NOTE → wake_arm. The + # plugin banner is a single line with no leading/trailing newlines, so + # an explicit "\n\n" separator goes between peer_context and the banner. + # _TEACHBACK_REMINDER, _COMPLETION_AUTHORITY_NOTE, and _WAKE_ARM_TEMPLATE + # each begin with "\n\n", preserving visual spacing through the trailing + # reminders. Wake-arm is chain-end (#591) — additive vs the prelude + # template; future audits find it without searching template internals. return ( prelude + peer_context @@ -180,6 +202,7 @@ def get_peer_context( + format_plugin_banner() + _TEACHBACK_REMINDER + _COMPLETION_AUTHORITY_NOTE + + wake_arm ) diff --git a/pact-plugin/hooks/session_end.py b/pact-plugin/hooks/session_end.py index c9be8027..b58013ac 100644 --- a/pact-plugin/hooks/session_end.py +++ b/pact-plugin/hooks/session_end.py @@ -57,6 +57,52 @@ def get_project_slug() -> str: return "" +def cleanup_wake_registry(team_name: str) -> None: + """Best-effort removal of inbox-wake STATE_FILE sidecars for the given team. + + Belt-and-suspenders for force-termination edge cases (SIGKILL, crash) + where the primary skill-invocation Teardown path didn't run. 
Cannot + stop the orphaned Monitors — those are agent-runtime tools unreachable + from this hook context. Sidecar removal lets the next session's Arm + cold-start cleanly instead of seeing a STATE_FILE pointing at a + long-dead Monitor. + + Per-agent STATE_FILE: every agent (lead AND every teammate spawned in + the team) owns its own `inbox-wake-state-{agent-name}.json`. This helper + globs the entire family and unlinks each — the lead's + `inbox-wake-state-team-lead.json` and every teammate's + `inbox-wake-state-{teammate-name}.json`. + + D1 has no heartbeat sidecar — single STATE_FILE per agent only. + + Path-traversal discipline (#492/#543 risk class): + - team_name validated via is_safe_path_component (existing helper). + - resolved path asserted under teams_root via relative_to(teams_root). + - Glob pattern `inbox-wake-state-*.json` is constrained to the validated + team_dir; a symlinked match is removed as a link (Path.unlink does not + follow symlinks), so no glob result can delete a file outside team_dir. + - Path.unlink wrapped in try/except OSError (missing_ok=True suppresses + FileNotFoundError; other OSError subtypes still raise — caught here + per module-wide fail-open posture). 
+ """ + if not team_name or not is_safe_path_component(team_name): + return # fail-closed on invalid team name + teams_root = (Path.home() / ".claude" / "teams").resolve() + team_dir = (teams_root / team_name).resolve() + try: + team_dir.relative_to(teams_root) + except ValueError: + return # team_dir escaped teams_root (symlink attack defense) + try: + for state_file in team_dir.glob("inbox-wake-state-*.json"): + try: + state_file.unlink(missing_ok=True) + except OSError: + pass # fail-open per module convention + except OSError: + pass # fail-open if glob itself fails (e.g., team_dir vanished) + + def check_unpaused_pr( tasks: list[dict] | None, project_slug: str, @@ -801,6 +847,16 @@ def main(): # Callsite short-circuit on empty team_name is the belt-and-suspenders # layer around the internal fail-closed guard. current_team_name = get_team_name() + + # Wake-registry cleanup (#591). Belt-and-suspenders for force- + # termination paths. Cannot reach TaskStop from hook context; + # only the registry sidecar is removable here. D1 has no + # heartbeat sidecar — single STATE_FILE per agent only. Glob + # `inbox-wake-state-*.json` to catch lead AND every teammate's + # sidecar in one pass (symmetric per-agent arming, §15.4). + if current_team_name: + cleanup_wake_registry(current_team_name) + teams_r, teams_s = 0, 0 tasks_r, tasks_s = 0, 0 teams_reaper_ran = False diff --git a/pact-plugin/hooks/session_init.py b/pact-plugin/hooks/session_init.py index 0725feb1..3e454d2b 100755 --- a/pact-plugin/hooks/session_init.py +++ b/pact-plugin/hooks/session_init.py @@ -708,6 +708,20 @@ def main(): # wrapper at the call site. context_parts.append(format_plugin_banner()) + # 4d. Wake-arm directive — Tier-0 additionalContext (#591). Emits + # unconditionally on every SessionStart fire (startup/resume/clear/compact) + # per "Hook-emitted directives: unconditional > conditional" Working Memory + # entry. 
Arm is idempotent: the skill's Arm operation no-ops if a valid + # STATE_FILE is already on disk, so re-emission is cheap. There is no + # watchdog in D1 — a silently-dead Monitor is undetectable in-session and + # the mechanism degrades to "no wake" until the next SessionStart re-arms. + context_parts.append( + 'Arm wake mechanism: invoke Skill("PACT:inbox-wake") and execute the ' + 'Arm operation before any teammate dispatch. Arm is idempotent — invoke ' + 'unconditionally on every SessionStart (startup, resume, clear, compact); ' + 'the skill no-ops if a valid STATE_FILE is already on disk.' + ) + # 5. Remind orchestrator to create session-unique PACT team (or reuse on resume) team_name = generate_team_name(input_data) diff --git a/pact-plugin/tests/test_peer_inject.py b/pact-plugin/tests/test_peer_inject.py index 23c5b6d3..6165db15 100644 --- a/pact-plugin/tests/test_peer_inject.py +++ b/pact-plugin/tests/test_peer_inject.py @@ -55,7 +55,11 @@ def test_injects_peer_names(self, tmp_path): assert "frontend-coder" in result assert "database-engineer" in result assert "backend-coder" not in result - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. + assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') def test_excludes_spawning_agent(self, tmp_path): from peer_inject import ( @@ -82,7 +86,11 @@ def test_excludes_spawning_agent(self, tmp_path): assert "backend-coder" in result assert "architect" not in result - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. 
+ assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') def test_returns_none_when_no_team_config(self, tmp_path): from peer_inject import get_peer_context @@ -118,7 +126,11 @@ def test_alone_message_when_only_member(self, tmp_path): ) assert "only active teammate" in result.lower() - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. + assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') def test_noop_when_no_team_name(self, tmp_path): from peer_inject import get_peer_context @@ -210,7 +222,11 @@ def test_reminder_appended_when_peers_exist(self, tmp_path): teams_dir=str(tmp_path / "teams") ) - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. + assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') assert "TEACHBACK TIMING" in result def test_reminder_appended_when_alone(self, tmp_path): @@ -236,7 +252,11 @@ def test_reminder_appended_when_alone(self, tmp_path): ) assert "only active teammate" in result.lower() - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. 
+ assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') def test_reminder_contains_key_instructions(self): """The teachback reminder must mention the key instructions: @@ -301,12 +321,24 @@ def test_agent_name_excludes_self_with_reminder(self, tmp_path): ) assert "coder-2" in result - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. + assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') # Slice out the peer-list segment: drop the prelude (everything up to # and including the first blank-line gap before "Active teammates") - # and drop the trailing reminders. - suffix_len = len(_TEACHBACK_REMINDER) + len(_COMPLETION_AUTHORITY_NOTE) + # and drop the trailing reminders. Trailing chain post-#591 is + # _TEACHBACK_REMINDER + _COMPLETION_AUTHORITY_NOTE + _WAKE_ARM_TEMPLATE + # (rendered with agent_name="coder-1"). + from peer_inject import _WAKE_ARM_TEMPLATE + wake_arm_rendered = _WAKE_ARM_TEMPLATE.format(agent_name="coder-1") + suffix_len = ( + len(_TEACHBACK_REMINDER) + + len(_COMPLETION_AUTHORITY_NOTE) + + len(wake_arm_rendered) + ) before_reminder = result[:-suffix_len] peer_list_section = before_reminder.split("Active teammates on your team:", 1)[1] assert "coder-1" not in peer_list_section @@ -1050,7 +1082,11 @@ def test_note_appears_after_teachback_reminder(self, tmp_path): ) assert _COMPLETION_AUTHORITY_NOTE in result - assert result.endswith(_COMPLETION_AUTHORITY_NOTE) + # Chain-end is the wake-arm directive (#591); completion-authority + # note immediately precedes it. Was result.endswith(...) pre-#591. 
+ assert _COMPLETION_AUTHORITY_NOTE in result + assert result.index(_COMPLETION_AUTHORITY_NOTE) + len(_COMPLETION_AUTHORITY_NOTE) \ + == result.index('\n\nArm wake mechanism: invoke Skill("PACT:inbox-wake")') # Teachback reminder precedes completion-authority note. assert result.index(_TEACHBACK_REMINDER) < result.index(_COMPLETION_AUTHORITY_NOTE) From 1d31890974aca799b6ffea6e3ea3072377c816a5 Mon Sep 17 00:00:00 2001 From: michael-wojcik <5386199+michael-wojcik@users.noreply.github.com> Date: Wed, 29 Apr 2026 21:11:22 -0400 Subject: [PATCH 3/6] Wire Teardown invocations, charter contract, and dogfood runbook MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add Skill("PACT:inbox-wake") + Teardown invocations at three callsites: - pact-agent-teams §Shutdown (teammate-side teardown before approving shutdown_request; agent-side TaskStop is the only path that can stop the registered Monitor) - /wrap-up (lead-side teardown before worktree cleanup) - /pause §6 NEW (lead-side teardown before teammate shutdown; resume re-arms via SessionStart unconditional emission) /imPACT deliberately omitted — all six imPACT outcomes either continue work or escalate to user, none warrant lead-side teardown. Add §Wake Mechanism subsection to pact-communication-charter Part I as the protocol contract surface only: contract statement, single-Monitor per agent, per-agent lifetime, alarm-clock framing slug-link, no-watchdog anchor. All implementation mechanics live in the skill body via slug-links to #overview, #teardown-block, etc. — no duplication. Add manual operator runbook at pact-plugin/tests/runbooks/591-inbox-wake.md (7 steps + pre-run + cleanup) with empirical anchors from this session's dogfood: bu1pmbva7 cron-fire kill 22:49:36Z, b0zw6x8bj 43.5-min isolated uptime, long-tool-blocks-wake test 00:01:34Z–00:02:23Z. Negative checks included: turn-fire-on-grow-only, ignoring-not-found tolerance. 
--- pact-plugin/commands/pause.md | 10 +- pact-plugin/commands/wrap-up.md | 1 + .../protocols/pact-communication-charter.md | 12 ++ pact-plugin/skills/pact-agent-teams/SKILL.md | 2 + pact-plugin/tests/runbooks/591-inbox-wake.md | 136 ++++++++++++++++++ 5 files changed, 159 insertions(+), 2 deletions(-) create mode 100644 pact-plugin/tests/runbooks/591-inbox-wake.md diff --git a/pact-plugin/commands/pause.md b/pact-plugin/commands/pause.md index 77b5e9f3..c86ff99c 100644 --- a/pact-plugin/commands/pause.md +++ b/pact-plugin/commands/pause.md @@ -103,7 +103,13 @@ JSON The timestamp (`ts`) is set automatically by `make_event()` and serves the same purpose as the previous `paused_at` field. -### 6. Shut Down Teammates +### 6. Tear Down the Lead's Wake Mechanism + +Invoke `Skill("PACT:inbox-wake")` and execute the Teardown operation with `agent_name="team-lead"`. This stops the lead's Monitor task (`TaskStop`, ignoring not-found errors) and unlinks `inbox-wake-state-team-lead.json`. Teardown is best-effort — see [Teardown Block](../skills/inbox-wake/SKILL.md#teardown-block) for the exact sequence. + +Run BEFORE step 7 (teammate shutdown). Teammates execute their own Teardown as part of approving `shutdown_request` (see `pact-agent-teams` `## Shutdown`). On resume, `session_init.py` re-arms the lead's Monitor at SessionStart; per-teammate Monitors are re-armed at SubagentStart on respawn. + +### 7. Shut Down Teammates Send `shutdown_request` individually to each active teammate **by name** and wait for responses. The secretary must have completed consolidation tasks (steps 1 and 3) before receiving the shutdown request. @@ -114,7 +120,7 @@ For each active teammate: Do NOT delete the team — it will be garbage-collected or reused on resume. -### 7. Report +### 8. Report ``` "Session paused. PR #{N} open at {url}. Resume with `/PACT:peer-review`." 
diff --git a/pact-plugin/commands/wrap-up.md b/pact-plugin/commands/wrap-up.md index 96092f98..cea235be 100644 --- a/pact-plugin/commands/wrap-up.md +++ b/pact-plugin/commands/wrap-up.md @@ -31,6 +31,7 @@ This is the deep-clean pass. Pass 1 (workflow-level HANDOFF review) is the prima - **Identify** any temporary files created during the session (e.g., `temp_test.py`, `debug.log`, `foo.txt`, `test_output.json`). - **Delete** these files to leave the workspace clean. +- **Tear down the lead's wake mechanism**: invoke `Skill("PACT:inbox-wake")` and execute the Teardown operation with `agent_name="team-lead"`. This stops the lead's Monitor task and unlinks the `inbox-wake-state-team-lead.json` sidecar. Teardown is best-effort — tolerate `TaskStop` not-found errors per the skill's [Teardown Block](../skills/inbox-wake/SKILL.md#teardown-block). Run BEFORE step 6 (Worktree Cleanup) so the Monitor is stopped while the worktree state file is still reachable for the unlink. ## 4. Orchestration Retrospective (Second-Order Cybernetics) diff --git a/pact-plugin/protocols/pact-communication-charter.md b/pact-plugin/protocols/pact-communication-charter.md index 73e2348e..9027cade 100644 --- a/pact-plugin/protocols/pact-communication-charter.md +++ b/pact-plugin/protocols/pact-communication-charter.md @@ -116,6 +116,18 @@ Before resending an apparently-unacknowledged message, verify the addressee has - For immediate halt of in-flight teammate work, user-side manual interrupt is required. - The team-lead's responsibility to "surface immediately" means at the team-lead's next idle, not at arbitrary real-time. +### Wake Mechanism + +Inbox-grow events fire a turn on the **addressed agent** (lead or teammate) during poller-gated waits. Both directions of the dispatch graph are covered: teammate→lead replies and lead→teammate dispatches. The wake mechanism is best-effort: when armed, it bounds idle-boundary delivery latency by Monitor's 2-s poll interval. 
There is no in-session watchdog; a silently-dead Monitor degrades the channel to baseline (the agent's existing idle-poll behavior). + +Each agent (lead AND every teammate) arms its own Monitor on its own single-file inbox via `wc -c` byte-grow; the Monitor emits `INBOX_GREW` on stdout to fire a turn on the addressed agent; the agent returns to idle and the platform's `useInboxPoller` delivers the message. Single skill, two invocation sites — see [implementation: Skill("PACT:inbox-wake")](../skills/inbox-wake/SKILL.md) for canonical mechanics. + +Lifetime is session-scoped per agent. The lead's Monitor is armed at SessionStart via `session_init.py`; each teammate's Monitor is armed at SubagentStart via `peer_inject.py` (per-spawn). Re-arm is idempotent — the skill no-ops if a valid STATE_FILE is already on disk for the agent. Teardown fires at session-end paths for the lead (`/wrap-up`, `/pause`) and at `shutdown_request` approval for teammates (see `pact-agent-teams` `## Shutdown`). + +Wake is **signal**, not content. On `INBOX_GREW`, the addressed agent ends the turn and returns to idle — the platform's idle-delivery is the channel-of-record for content. See the skill body's [§Overview alarm-clock framing](../skills/inbox-wake/SKILL.md#overview) for the principle anchor. + +D1 design intentionally has no watchdog. The audit artifact at `docs/architecture/591-inbox-wake-skill-redesign.md` predates the kill-mechanism finding (PREPARE §C) and should not be used as a reference for charter content. 
+ ## Part II — Written Output ## Pillar 1 — Plain English diff --git a/pact-plugin/skills/pact-agent-teams/SKILL.md b/pact-plugin/skills/pact-agent-teams/SKILL.md index 0eedd7ff..eb336416 100644 --- a/pact-plugin/skills/pact-agent-teams/SKILL.md +++ b/pact-plugin/skills/pact-agent-teams/SKILL.md @@ -391,6 +391,8 @@ Before returning your final output: When you receive a `shutdown_request`: +> Before approving `shutdown_request`, invoke `Skill("PACT:inbox-wake")` and execute the Teardown operation with `agent_name=""`. This stops your Monitor task (`TaskStop`, ignoring not-found errors) and unlinks your `inbox-wake-state-.json` sidecar. Teardown is best-effort — if your Monitor died silently mid-session (per §Failure Modes), `TaskStop` returns a tool_use_error that you tolerate and continue to unlink. Approving shutdown_request without prior Teardown leaves your Monitor process orphaned in the harness's runtime store; it harmlessly dies with the session-process termination, but the registry sidecar then requires belt-and-suspenders cleanup via the lead's session_end.py `cleanup_wake_registry` glob. + | Situation | Response | |-----------|----------| | Idle, consultant with no active questions, or domain no longer relevant | Approve | diff --git a/pact-plugin/tests/runbooks/591-inbox-wake.md b/pact-plugin/tests/runbooks/591-inbox-wake.md new file mode 100644 index 00000000..a50ae1ac --- /dev/null +++ b/pact-plugin/tests/runbooks/591-inbox-wake.md @@ -0,0 +1,136 @@ +# Runbook: Inbox-Wake Skill (D1) — Live Dogfood Validation + +**Scope**: end-to-end verification of `Skill("PACT:inbox-wake")` Arm and Teardown across lead and teammate sessions. CI tests cover structural invariants (skill-body sections, hook directive presence, callsite token presence); this runbook covers behaviors that require a live `Monitor` task and a live inbox-grow event. + +**Operator**: a human (or supervised lead session) running each step interactively. 
Mark each step pass/fail with timestamp and observed evidence. + +**Empirical anchors** (from PREPARE / dogfood evidence carried forward): + +- **Kill-mechanism (rejected design)**: cron-fire #1 `2026-04-29T22:49:36Z` killed `monitor_task_id=bu1pmbva7` ~19 s later; isolated Monitor in same session survived ≥10 min. D1 drops cron entirely. See `docs/preparation/591-inbox-wake-skill.md` §C, hypothesis H1 HIGH-confidence. +- **A-test ship-precondition**: isolated Monitor `monitor_task_id=b0zw6x8bj` armed `2026-04-29T23:11:33Z`, survived to `2026-04-29T23:55:09Z` (43.5 min uptime). H4 (undisclosed Monitor wallclock cap) falsified at relevant scale. +- **Long-tool-blocks-wake**: verified `2026-04-30T00:00–00:02Z` in session pact-5951b31c. Peer send `00:01:34Z`; Monitor `INBOX_GREW` fired `00:01:43Z` and `00:01:45Z` during a 90-s blocking sleep; Bash returned `00:02:23Z`; teammate-message content delivered in the *next* turn via standard idle-delivery, not mid-tool. See skill body §Failure Modes. + +--- + +## Pre-Run Checklist + +- [ ] Plugin version is 3.21.0 or later (grep `pact-plugin/.claude-plugin/plugin.json`). +- [ ] `pact-plugin/skills/inbox-wake/SKILL.md` exists with frontmatter `name: inbox-wake`. +- [ ] `~/.claude/teams/{TEAM}/inboxes/team-lead.json` exists for the live team. +- [ ] No stale `~/.claude/teams/{TEAM}/inbox-wake-state-*.json` from a prior aborted session (if present, delete before run). + +Substitute `{TEAM}` with the active team name throughout. + +--- + +## Step 1 — Fresh-Session Arm (Lead) + +**Goal**: confirm `session_init.py` emits the wake-arm directive at SessionStart, the lead invokes the skill, and the STATE_FILE is written. + +1. Start a fresh PACT session (or `/clear`). +2. Confirm the lead's first turn shows the wake-arm directive in additionalContext (lead invokes `Skill("PACT:inbox-wake")` with the Arm operation, `agent_name="team-lead"`). +3. 
Verify the cold-start path: `ls ~/.claude/teams/{TEAM}/inbox-wake-state-team-lead.json` should exist. +4. Inspect the file: `cat ~/.claude/teams/{TEAM}/inbox-wake-state-team-lead.json` — must show exactly 3 fields: `v: 1`, `monitor_task_id: `, `armed_at: `. +5. Confirm `monitor_task_id` is a live task: `TaskGet(monitor_task_id)` reports `status: in_progress`. + +**Pass criteria**: STATE_FILE present, schema valid, Monitor task live. + +--- + +## Step 2 — INBOX_GREW Wake on Inbound SendMessage (Lead) + +**Goal**: confirm Monitor fires `INBOX_GREW` on inbox byte-grow, ending the lead's turn between tool calls and surfacing the message via standard idle-delivery. + +1. From a teammate context (or via a direct file append in a separate terminal), cause an inbox-grow on the lead. Easiest: spawn any teammate via `Task(...)`; the spawn's prelude write grows `inboxes/team-lead.json`. +2. Observe the lead's stdout for an `INBOX_GREW size=… ts=…` line during a poller-gated wait. +3. Confirm the lead's turn ends and the platform's `useInboxPoller` delivers the message. + +**Pass criteria**: `INBOX_GREW` line emitted on stdout (turn-firing); message delivered at the next idle. + +**Negative check**: between grows, stdout must NOT emit any `INBOX_GREW`-shaped line. The Monitor must not turn-fire on every poll cycle. + +--- + +## Step 3 — Long Single-Tool Wake Latency (documented limitation) + +**Goal**: empirically confirm that wake events that fire during a single long-running tool call are queued and delivered with the tool's return, NOT mid-tool. This is documented behavior (skill body §Failure Modes "Long single-tool calls block wake delivery"), not a defect. + +1. Lead initiates a single long-running Bash call (e.g., `sleep 90`) in parallel with a teammate dispatch that will reply during the sleep. +2. Confirm `INBOX_GREW` line(s) appear during the sleep window. +3. Confirm the sleep completes and only AFTER its return does the lead's next turn process the inbox content.
+ +**Pass criteria**: timing matches the empirical anchor (with T = peer send: `INBOX_GREW` at ~T+9–11 s while the tool call is still running; tool return when the 90-s sleep completes, which was T+49 s in the anchor run because the send arrived mid-sleep; teammate-message delivery in the next turn). The wake mechanism does NOT interrupt mid-tool. + +--- + +## Step 4 — Teammate-Side Arm at SubagentStart + +**Goal**: confirm `peer_inject.py` emits the teammate-side wake-arm directive on SubagentStart and the spawned teammate arms its own Monitor on its own inbox. + +1. Spawn a teammate (any pact-* agent type). +2. Confirm the teammate's first turn shows the wake-arm directive in additionalContext with the teammate's `agent_name` interpolated. +3. After the teammate's first non-Read tool call, verify `~/.claude/teams/{TEAM}/inbox-wake-state-{teammate-name}.json` exists with valid 3-field schema. +4. Confirm the teammate's `monitor_task_id` is distinct from the lead's and is live. + +**Pass criteria**: per-teammate STATE_FILE present, schema valid, Monitor live, distinct from lead's Monitor. + +--- + +## Step 5 — Teammate-Side Teardown on shutdown_request + +**Goal**: confirm a teammate executes Teardown before approving `shutdown_request` (per `pact-agent-teams` `## Shutdown`), stopping its Monitor and unlinking its sidecar. + +1. Send `shutdown_request` to the teammate spawned in Step 4. +2. Confirm in the teammate's response that it invoked `Skill("PACT:inbox-wake")` Teardown with its own `agent_name` BEFORE approving. +3. After the teammate process terminates, verify `inbox-wake-state-{teammate-name}.json` is gone. +4. Verify the teammate's `monitor_task_id` is no longer live (`TaskGet` reports completed/stopped or not-found). + +**Pass criteria**: teammate sidecar unlinked, teammate Monitor stopped, teardown ran before approval. + +**Tolerance check**: if the teammate's Monitor died silently mid-session, `TaskStop` returns a `tool_use_error`; teardown must continue and unlink the sidecar regardless.
The skill body's §Teardown Block phrase "ignoring not-found errors" is the load-bearing guard. + +--- + +## Step 6 — Lead-Side Teardown via /wrap-up or /pause + +**Goal**: confirm the lead's `/wrap-up` step 3 (Workspace Cleanup) and `/pause` step 6 (Tear Down the Lead's Wake Mechanism) execute Teardown for the lead, stopping the lead's Monitor and unlinking `inbox-wake-state-team-lead.json`. + +1. Run `/PACT:wrap-up` (or `/PACT:pause`). +2. Observe the Teardown invocation in the command flow. +3. Verify `inbox-wake-state-team-lead.json` is gone after the Teardown step. +4. Verify the lead's `monitor_task_id` is no longer live. + +**Pass criteria**: lead sidecar unlinked, lead Monitor stopped, Teardown ran in the documented step order (wrap-up §3 BEFORE §6 worktree cleanup; pause §6 BEFORE §7 teammate shutdown). + +--- + +## Step 7 — Force-Termination Glob Cleanup + +**Goal**: confirm `session_end.py::cleanup_wake_registry` globs `inbox-wake-state-*.json` and unlinks every per-agent sidecar in one pass when the lead's session terminates without running through `/wrap-up` or `/pause`. + +1. With at least one lead sidecar AND one or more teammate sidecars present (e.g., teammates that were force-terminated without their `## Shutdown` Teardown firing), force-terminate the lead's session. +2. Verify that `ls ~/.claude/teams/{TEAM}/inbox-wake-state-*.json` returns no matches after the SessionEnd hook fires. + +**Pass criteria**: zero `inbox-wake-state-*.json` files remain after force-termination. + +**Note**: orphaned Monitor tasks die with the team's process tree; the registry sidecars are what `cleanup_wake_registry` exists to reap. + +--- + +## Cleanup + +After the run, regardless of pass/fail: + +1. Force-stop any leftover Monitor tasks: `TaskList` and `TaskStop` any task whose subject contains the inbox-grow loop body. +2. Remove any leftover sidecars: `rm -f ~/.claude/teams/{TEAM}/inbox-wake-state-*.json`. +3.
Record run results inline in this file (date, pass/fail per step, anomalies) or in a session journal entry. + +--- + +## References + +- Skill body: `pact-plugin/skills/inbox-wake/SKILL.md` +- Architect doc: `docs/architecture/591-inbox-wake-skill.md` — §10 (charter scope), §11 (test invariants), §12 (failure modes), §15 (symmetric scope) +- Charter: `pact-plugin/protocols/pact-communication-charter.md` §Wake Mechanism +- PREPARE: `docs/preparation/591-inbox-wake-skill.md` — §C kill-mechanism, §D alternatives, PASSIVE WAKE TEST series +- Issue #591 (this feature); #594 (skill-body line-count ceiling); #444 (compaction durability + hook-emitted-directives) From 59b9fe8a6b66bee40f8813ee7355f9c65daabca9 Mon Sep 17 00:00:00 2001 From: michael-wojcik <5386199+michael-wojcik@users.noreply.github.com> Date: Wed, 29 Apr 2026 21:11:29 -0400 Subject: [PATCH 4/6] Bump plugin version to 3.21.0 Minor bump for the substantial new capability (inbox-wake skill, symmetric lead+teammate scope, session_init/peer_inject wake-arm directives, session_end cleanup helper, charter contract, dogfood runbook). 
Lockstep across the four version-tracking files: - pact-plugin/.claude-plugin/plugin.json (authoritative) - .claude-plugin/marketplace.json - README.md - pact-plugin/README.md --- .claude-plugin/marketplace.json | 2 +- README.md | 2 +- pact-plugin/.claude-plugin/plugin.json | 2 +- pact-plugin/README.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 04862c1c..52de8ec5 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -12,7 +12,7 @@ "name": "PACT", "source": "./pact-plugin", "description": "Orchestration harness that turns Claude Code into a coordinated team of specialist AI agents", - "version": "3.20.4", + "version": "3.21.0", "author": { "name": "Synaptic-Labs-AI" }, diff --git a/README.md b/README.md index dfa1700b..87d619aa 100644 --- a/README.md +++ b/README.md @@ -471,7 +471,7 @@ When installed as a plugin, PACT lives in your plugin cache: │ └── cache/ │ └── pact-plugin/ │ └── PACT/ -│ └── 3.20.4/ # Plugin version +│ └── 3.21.0/ # Plugin version │ ├── agents/ │ ├── commands/ │ ├── skills/ diff --git a/pact-plugin/.claude-plugin/plugin.json b/pact-plugin/.claude-plugin/plugin.json index ae9256ab..8127798e 100644 --- a/pact-plugin/.claude-plugin/plugin.json +++ b/pact-plugin/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "PACT", - "version": "3.20.4", + "version": "3.21.0", "description": "Orchestration harness that turns Claude Code into a coordinated team of specialist AI agents", "author": { "name": "Synaptic-Labs-AI", diff --git a/pact-plugin/README.md b/pact-plugin/README.md index 83b86716..f64b658a 100644 --- a/pact-plugin/README.md +++ b/pact-plugin/README.md @@ -1,6 +1,6 @@ # PACT — Orchestration Harness for Claude Code -> **Version**: 3.20.4 +> **Version**: 3.21.0 Turn a single Claude Code session into a managed team of specialist AI agents that prepare, design, build, and test your code systematically. 
From ebaa67d4ec6935378bafd69a0592f6b1893e68a6 Mon Sep 17 00:00:00 2001 From: michael-wojcik <5386199+michael-wojcik@users.noreply.github.com> Date: Thu, 30 Apr 2026 01:14:36 -0400 Subject: [PATCH 5/6] Add CI tests for inbox-wake skill MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Five new test files covering architect §11 invariants for #591: - test_inbox_wake_skill_structure.py (321L): structural-pattern tests asserting skill body shape — section presence, alarm-clock paragraphs verbatim, §12.a + §12.b empirical timing tokens, F1/F6/F7 invariants, audit annotations, compaction-budget assertion (<=292L), and NEGATIVE invariants fencing re-introduction of Cron Block / Wake-State-Check Algorithm / Recovery / Per-Branch sections - test_session_init_wake_directive.py (116L): hook-test invariants for the lead-side wake-arm directive append at line 709 region - test_peer_inject_wake_directive.py (97L): hook-test invariants for the teammate-side wake-arm directive in _WAKE_ARM_TEMPLATE, gap-filling against Wave-2 chain-position tests - test_inbox_wake_teardown_callsites.py (131L): structural-pattern tests asserting Skill("PACT:inbox-wake") + Teardown invocation presence in pact-agent-teams §Shutdown, /wrap-up, /pause; NEGATIVE invariant for /imPACT (deliberately omitted) - test_session_end_wake_glob.py (95L): asserts cleanup_wake_registry's glob pattern catches inbox-wake-state-*.json correctly 63 test cases total. Smoke run of all five new files plus the three sibling Wave-2 files (test_session_init / test_peer_inject / test_session_end) — 517 tests, fully green. Tests apply phantom-green mitigation: assertions against semantic anchors (operation names, file names, atomic-rename token, threshold references) rather than full-sentence matches. 
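These tests fence a specific sidecar lifecycle. The STATE_FILE has exactly three fields and is written with an atomic rename. On force-termination, a single non-recursive glob pass over `inbox-wake-state-*.json` sweeps every sidecar. The Python sketch below illustrates that lifecycle. The helper names `write_state_file` and `sweep_wake_registry` are hypothetical; this is not the real `cleanup_wake_registry` in `session_end.py`. The sketch also assumes the sidecars sit directly in the team directory, as the runbook's paths suggest.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def write_state_file(team_dir: Path, agent_name: str, monitor_task_id: str) -> Path:
    """Write the 3-field sidecar via a temp file plus os.replace (atomic rename)."""
    target = team_dir / f"inbox-wake-state-{agent_name}.json"
    tmp = target.with_name(target.name + ".tmp")
    state = {
        "v": 1,
        "monitor_task_id": monitor_task_id,
        "armed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    tmp.write_text(json.dumps(state), encoding="utf-8")
    # On POSIX, a same-filesystem rename is atomic: readers see old or new, never a partial file.
    os.replace(tmp, target)
    return target


def sweep_wake_registry(team_dir: Path) -> list[str]:
    """Unlink every per-agent sidecar in one pass; return the removed names."""
    removed = []
    for sidecar in sorted(team_dir.glob("inbox-wake-state-*.json")):
        try:
            sidecar.unlink()
            removed.append(sidecar.name)
        except FileNotFoundError:
            pass  # an agent's own Teardown got there first
    return removed
```

Because the glob is non-recursive, the sweep never touches `inboxes/team-lead.json`. It also never matches a leftover `.json.tmp` file.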
--- .../tests/test_inbox_wake_skill_structure.py | 321 ++++++++++++++++++ .../test_inbox_wake_teardown_callsites.py | 131 +++++++ .../tests/test_peer_inject_wake_directive.py | 97 ++++++ .../tests/test_session_end_wake_glob.py | 95 ++++++ .../tests/test_session_init_wake_directive.py | 116 +++++++ 5 files changed, 760 insertions(+) create mode 100644 pact-plugin/tests/test_inbox_wake_skill_structure.py create mode 100644 pact-plugin/tests/test_inbox_wake_teardown_callsites.py create mode 100644 pact-plugin/tests/test_peer_inject_wake_directive.py create mode 100644 pact-plugin/tests/test_session_end_wake_glob.py create mode 100644 pact-plugin/tests/test_session_init_wake_directive.py diff --git a/pact-plugin/tests/test_inbox_wake_skill_structure.py b/pact-plugin/tests/test_inbox_wake_skill_structure.py new file mode 100644 index 00000000..96ea0c5f --- /dev/null +++ b/pact-plugin/tests/test_inbox_wake_skill_structure.py @@ -0,0 +1,321 @@ +""" +Structural-pattern tests for pact-plugin/skills/inbox-wake/SKILL.md. + +The skill body is the agent-execution surface for the wake mechanism. These +tests are semantic-anchor checks: they parse the skill body and assert +load-bearing tokens, sections, phrases, and explicit absences (negative +invariants). They do NOT execute the skill or simulate agent behavior — +the dogfood runbook covers behavioral verification. + +Phantom-green mitigation: assertions match short semantic anchors +(operation names, file names, threshold tokens, atomic-rename keywords), +not full sentences that an editing LLM could inadvertently rewrite without +breaking meaning. + +Negative invariants are load-bearing: they fence against architectural +drift (re-introduction of cron/watchdog/Recovery branches that PREPARE §C +falsified). 
+""" +import re +from pathlib import Path + +import pytest + + +SKILL_BODY_PATH = ( + Path(__file__).parent.parent / "skills" / "inbox-wake" / "SKILL.md" +) + + +@pytest.fixture(scope="module") +def skill_body() -> str: + return SKILL_BODY_PATH.read_text(encoding="utf-8") + + +@pytest.fixture(scope="module") +def skill_body_lines(skill_body: str) -> list[str]: + return skill_body.splitlines() + + +class TestSkillBodyFile: + """Skill body file existence + frontmatter.""" + + def test_skill_body_file_exists(self): + assert SKILL_BODY_PATH.is_file(), ( + f"Skill body must exist at {SKILL_BODY_PATH.relative_to(SKILL_BODY_PATH.parents[3])}" + ) + + def test_frontmatter_has_name_inbox_wake(self, skill_body: str): + # Frontmatter is the first --- delimited block. + match = re.match(r"^---\n(.*?)\n---\n", skill_body, re.DOTALL) + assert match is not None, "Skill body must open with YAML frontmatter" + frontmatter = match.group(1) + assert re.search(r"^name:\s*inbox-wake\s*$", frontmatter, re.MULTILINE), ( + "Frontmatter must declare name: inbox-wake" + ) + + def test_frontmatter_has_non_empty_description(self, skill_body: str): + match = re.match(r"^---\n(.*?)\n---\n", skill_body, re.DOTALL) + assert match is not None + frontmatter = match.group(1) + # description: may be a single-line value or a YAML block scalar (|). + # Accept either; require non-whitespace content after "description:". + assert re.search(r"^description:\s*(\S|\|)", frontmatter, re.MULTILINE), ( + "Frontmatter must declare a non-empty description" + ) + + +class TestCompactionLineBudget: + """#594 compaction-restoration ceiling for skill bodies.""" + + def test_skill_body_within_compaction_ceiling(self, skill_body_lines: list[str]): + # Per #444 four-tier durability model: Tier 1 inline-skill restoration + # caps at ~292 lines. Going over silently sheds content on compaction. 
+ assert len(skill_body_lines) <= 292, ( + f"Skill body has {len(skill_body_lines)} lines; ceiling is 292" + ) + + +class TestRequiredSectionsPresent: + """All canonical D1 sections present per architect §5.""" + + REQUIRED_HEADERS = [ + "## Overview", + "## When to Invoke", + "## Operations", + "## Monitor Block", + "## WriteStateFile Block", + "## Teardown Block", + "## Failure Modes", + "## Verification", + "## References", + ] + + @pytest.mark.parametrize("header", REQUIRED_HEADERS) + def test_required_section_present(self, header: str, skill_body_lines: list[str]): + assert header in skill_body_lines, ( + f"Required section header missing: {header!r}" + ) + + +class TestNegativeInvariants: + """Sections that MUST NOT appear — D1 architectural fence. + + PREPARE §C falsified the cron+Monitor self-defeating loop. Re-introducing + any of these sections silently restores the killed-Monitor failure mode. + """ + + FORBIDDEN_HEADERS = [ + "## Cron Block", + "## Wake-State-Check Algorithm", + "## Per-Branch Action Sequences", + "## Recovery", + "### Recovery", + ] + + @pytest.mark.parametrize("header", FORBIDDEN_HEADERS) + def test_forbidden_section_absent(self, header: str, skill_body_lines: list[str]): + assert header not in skill_body_lines, ( + f"Forbidden section reintroduced: {header!r} — D1 deliberately drops " + "the watchdog layer per PREPARE §C kill-mechanism finding" + ) + + def test_no_cron_job_id_as_schema_field(self, skill_body: str): + # STATE_FILE schema is intentionally minimal (3 fields). cron_job_id + # was a rev-3 concept dropped with cron. The token may appear in + # anti-rule prose ("do not re-add cron_job_id"); the negative invariant + # is the SCHEMA shape — check for JSON-style field declarations. 
+ for shape in ('"cron_job_id":', "'cron_job_id':", "cron_job_id ="): + assert shape not in skill_body, ( + f"cron_job_id reintroduced as schema field via {shape!r} — D1 has no watchdog" + ) + + def test_no_heartbeat_field_in_schema(self, skill_body: str): + # HB_FILE / heartbeat field were rev-3 concepts dropped with cron. + # The literal token "heartbeat" may appear ONLY in the schema-minimality + # rationale or anti-rule prose, not as a written/read field. Assert no + # schema-field reintroduction by checking common field-token shapes. + for token in ('"heartbeat":', "'heartbeat':", "heartbeat ="): + assert token not in skill_body, ( + f"Heartbeat reintroduced as schema field via {token!r} — D1 has no heartbeat" + ) + + +class TestOperationsExactlyArmAndTeardown: + """`## Operations` enumerates exactly Arm and Teardown — no Recovery.""" + + def test_arm_subsection_present(self, skill_body: str): + assert re.search(r"^###\s+Arm\b", skill_body, re.MULTILINE), ( + "## Operations must contain ### Arm subsection" + ) + + def test_teardown_subsection_present(self, skill_body: str): + assert re.search(r"^###\s+Teardown\b", skill_body, re.MULTILINE), ( + "## Operations must contain ### Teardown subsection" + ) + + def test_no_recovery_subsection(self, skill_body: str): + assert not re.search(r"^###\s+Recovery\b", skill_body, re.MULTILINE), ( + "Recovery operation reintroduced — D1 has only Arm and Teardown" + ) + + +class TestAlarmClockFraming: + """Both alarm-clock paragraphs are non-negotiable in `## Overview`. + + First paragraph: signal-not-content scope (prevents "lead parses wake stdout"). + Second paragraph: between-tool-call scope (prevents "expects mid-tool interrupt"). 
+ """ + + def test_alarm_clock_paragraph_present(self, skill_body: str): + assert "Monitor is an alarm clock, not a mailbox" in skill_body, ( + "First alarm-clock paragraph (signal-not-content) missing from Overview" + ) + + def test_between_tool_calls_paragraph_present(self, skill_body: str): + # The paragraph carries the scope claim. Anchor on the load-bearing + # phrase pair: "between tool calls" + "not mid-tool". + assert "between tool calls within a turn" in skill_body, ( + "Second alarm-clock paragraph (between-tool-calls scope) missing" + ) + assert "not mid-tool" in skill_body, ( + "Second alarm-clock paragraph must explicitly state 'not mid-tool'" + ) + + +class TestTeardownF6Tolerance: + """`## Teardown Block` contains the F6 tolerance phrasing. + + Removing 'ignoring not-found errors' silently restores crash-on-stale-ID + when TaskStop runs against a Monitor that died silently mid-session. + """ + + def test_ignoring_not_found_errors_phrase(self, skill_body: str): + assert "ignoring not-found errors" in skill_body, ( + "Teardown Block must contain literal phrase 'ignoring not-found errors' " + "(F6 tolerance invariant per PREPARE)" + ) + + +class TestMonitorBlockStdoutDiscipline: + """`## Monitor Block` distinguishes turn-firing stdout from non-firing channels.""" + + def test_inbox_grew_token_present(self, skill_body: str): + assert "INBOX_GREW" in skill_body, ( + "Monitor Block must reference the INBOX_GREW stdout token" + ) + + def test_stderr_non_firing_anchor(self, skill_body: str): + # Anchor on any of the canonical stderr-non-firing phrasings — the + # invariant is "stdout fires turns, stderr does not." Phantom-green + # mitigation: accept multiple canonical phrasings. 
+ anchors = [">&2", "stderr does not turn-fire", "diagnostic and lifecycle output goes to"] + assert any(a in skill_body.lower() for a in [a.lower() for a in anchors]), ( + "Monitor Block must distinguish turn-firing stdout from non-firing stderr" + ) + + +class TestMonitorBlockSingleFileInbox: + """F1 invariant: inbox is single JSON file, byte-grow via wc -c. + + Path is parametric (`{agent_name}`), NOT hardcoded `team-lead.json` — + the same Monitor body is used for both lead and teammate sessions. + """ + + def test_wc_byte_grow_detection(self, skill_body: str): + assert "wc -c" in skill_body, ( + "Monitor Block must use `wc -c` for byte-grow detection (F1 invariant)" + ) + + def test_inbox_path_is_parametric_with_agent_name(self, skill_body: str): + # The Monitor block must carry the parametric path token, not the + # rev-3 hardcoded lead path. Symmetric scope (§15) requires this. + assert "inboxes/{agent_name}.json" in skill_body, ( + "Inbox path must interpolate {agent_name} (covers both lead and teammate)" + ) + + +class TestStateFileSchemaThreeFields: + """`## WriteStateFile Block` schema: exactly v, monitor_task_id, armed_at.""" + + def test_v_field_present(self, skill_body: str): + assert '"v":' in skill_body or "'v':" in skill_body, ( + "WriteStateFile Block schema must declare v field" + ) + + def test_monitor_task_id_field_present(self, skill_body: str): + assert "monitor_task_id" in skill_body, ( + "WriteStateFile Block schema must declare monitor_task_id field" + ) + + def test_armed_at_field_present(self, skill_body: str): + assert "armed_at" in skill_body, ( + "WriteStateFile Block schema must declare armed_at field" + ) + + def test_per_agent_state_filename_token(self, skill_body: str): + # Per-agent suffix lives in the FILENAME, not the schema. 
+ assert "inbox-wake-state-{agent_name}.json" in skill_body, ( + "STATE_FILE filename must interpolate {agent_name} (per-agent suffix)" + ) + + +class TestLongToolEmpiricalAnchor: + """`## Failure Modes` contains §12.b empirical-timing tokens. + + The empirical anchor (00:01:34Z send / INBOX_GREW fired during sleep / + 00:02:23Z tool return) makes the scope claim observable, not just asserted. + Without these tokens, an editing LLM could remove the scope claim and + silently overpromise mid-tool interrupt. + """ + + def test_long_tool_failure_mode_header_present(self, skill_body: str): + # Anchor on the section's distinguishing phrase, not the full sentence. + assert "Long single-tool calls block wake delivery" in skill_body, ( + "Failure Modes must contain the long-single-tool-blocks-wake entry" + ) + + def test_empirical_timing_tokens_present(self, skill_body: str): + # Both peer-send and tool-return timestamps anchor the empirical claim. + assert "00:01:34Z" in skill_body, ( + "Empirical timing token 00:01:34Z (peer send) must appear" + ) + assert "00:02:23Z" in skill_body, ( + "Empirical timing token 00:02:23Z (tool return) must appear" + ) + + +class TestAtomicRenameWritePattern: + """Atomic-rename token presence in WriteStateFile Block prose/pseudocode.""" + + def test_atomic_rename_token(self, skill_body: str): + # `os.replace` or "atomic rename" or `.tmp` + rename — anchor on any. + anchors = ["os.replace", "atomic rename", "atomic-rename"] + assert any(a in skill_body for a in anchors), ( + "WriteStateFile Block must use atomic-rename pattern (os.replace / atomic rename)" + ) + + +class TestSymmetricScopeArmCoversLeadAndTeammate: + """Arm prose explicitly states symmetric scope: BOTH lead and teammate. + + Per architect §15: one skill, two invocation sites. The `## Operations` + Arm subsection must mention both roles so an editing LLM cannot accidentally + fork the skill into lead-only or teammate-only. 
+ """ + + def test_arm_subsection_mentions_lead_and_teammate(self, skill_body: str): + # Slice the Arm subsection. Anchor on the role tokens. + match = re.search( + r"^###\s+Arm\b(.*?)(?=^###\s+|^##\s+)", + skill_body, + re.MULTILINE | re.DOTALL, + ) + assert match is not None, "### Arm subsection must be present" + arm_section = match.group(1).lower() + assert "team-lead" in arm_section or "lead" in arm_section, ( + "Arm subsection must reference the lead role (symmetric scope)" + ) + assert "teammate" in arm_section, ( + "Arm subsection must reference the teammate role (symmetric scope)" + ) diff --git a/pact-plugin/tests/test_inbox_wake_teardown_callsites.py b/pact-plugin/tests/test_inbox_wake_teardown_callsites.py new file mode 100644 index 00000000..49ab3e20 --- /dev/null +++ b/pact-plugin/tests/test_inbox_wake_teardown_callsites.py @@ -0,0 +1,131 @@ +""" +Structural-pattern tests for Teardown invocation callsites. + +Per architect §3 + §11: the Teardown operation is invoked from three +callsites (2 lead-side commands + 1 teammate-side skill section). One +command (/imPACT) is the explicit NEGATIVE invariant — none of imPACT's +six outcomes warrant lead-side teardown (continue work, or escalate to +user), so the absence of an invocation is by design and must be fenced +against accidental re-introduction. + +Phantom-green mitigation: assertions match short semantic anchors +(skill slug + operation token), not full sentences.
+""" +from pathlib import Path + +import pytest + + +PLUGIN_ROOT = Path(__file__).parent.parent +COMMANDS_DIR = PLUGIN_ROOT / "commands" +PACT_AGENT_TEAMS_SKILL = ( + PLUGIN_ROOT / "skills" / "pact-agent-teams" / "SKILL.md" +) + +WRAP_UP_PATH = COMMANDS_DIR / "wrap-up.md" +PAUSE_PATH = COMMANDS_DIR / "pause.md" +IMPACT_PATH = COMMANDS_DIR / "imPACT.md" + +WAKE_SKILL_SLUG = 'Skill("PACT:inbox-wake")' +TEARDOWN_TOKEN = "Teardown" + + +def _read(path: Path) -> str: + assert path.is_file(), f"Required callsite file missing: {path}" + return path.read_text(encoding="utf-8") + + +class TestLeadSideCallsitesPresent: + """/wrap-up and /pause must invoke the wake-skill Teardown. + + These commands tear down the lead session cleanly, so the lead's Monitor + is stopped and the registry sidecar removed before the session exits. + """ + + @pytest.mark.parametrize( + "command_path", + [WRAP_UP_PATH, PAUSE_PATH], + ids=lambda p: p.name, + ) + def test_lead_command_invokes_wake_teardown(self, command_path: Path): + body = _read(command_path) + assert WAKE_SKILL_SLUG in body, ( + f"{command_path.name} must invoke {WAKE_SKILL_SLUG}" + ) + assert TEARDOWN_TOKEN in body, ( + f"{command_path.name} must reference the Teardown operation" + ) + + +class TestImpactNegativeInvariant: + """/imPACT must NOT invoke the wake-skill Teardown. + + Architectural fence: imPACT outcomes are continue-work or escalate-to-user + (no session shutdown). Re-introducing a Teardown call would prematurely + stop the Monitor while the lead is still active. This negative invariant + catches any accidental copy-paste from /wrap-up or /pause. 
+ """ + + def test_impact_does_not_invoke_wake_skill(self): + body = _read(IMPACT_PATH) + assert WAKE_SKILL_SLUG not in body, ( + f"{IMPACT_PATH.name} must NOT invoke {WAKE_SKILL_SLUG} — " + "imPACT does not shut down the session, so Teardown would prematurely " + "stop the Monitor while the lead is still active" + ) + + +class TestTeammateSideShutdownInvocation: + """pact-agent-teams §Shutdown must instruct teammates to Teardown before approving. + + Per architect §15.3: the agent-side Teardown is the ONLY mechanism that + can call TaskStop on the teammate's Monitor (hooks cannot reach + agent-runtime tools). The shutdown_request flow is the natural insertion + point — there is no equivalent of /wrap-up for teammates. + """ + + def test_pact_agent_teams_skill_exists(self): + assert PACT_AGENT_TEAMS_SKILL.is_file(), ( + f"pact-agent-teams skill body missing at {PACT_AGENT_TEAMS_SKILL}" + ) + + def test_shutdown_section_invokes_wake_teardown(self): + body = _read(PACT_AGENT_TEAMS_SKILL) + # Slice out the ## Shutdown section. Anchor on the header. + assert "## Shutdown" in body, ( + "pact-agent-teams must contain a ## Shutdown section" + ) + shutdown_idx = body.index("## Shutdown") + shutdown_section = body[shutdown_idx:] + # Bound the section to the next top-level header if present. + for next_header in ("\n## ", "\n# "): + next_idx = shutdown_section.find(next_header, len("## Shutdown")) + if next_idx > 0: + shutdown_section = shutdown_section[:next_idx] + break + assert WAKE_SKILL_SLUG in shutdown_section, ( + "## Shutdown must invoke Skill(\"PACT:inbox-wake\")" + ) + assert TEARDOWN_TOKEN in shutdown_section, ( + "## Shutdown must reference the Teardown operation" + ) + + def test_shutdown_section_uses_before_approving_timing(self): + """Timing prose anchors the agent-vs-hook capability asymmetry. + + The Teardown must run BEFORE the teammate approves shutdown_request — + once approved, the agent's process terminates and TaskStop is no + longer reachable. 
Per architect §15.3 audit annotation. + """ + body = _read(PACT_AGENT_TEAMS_SKILL) + shutdown_idx = body.index("## Shutdown") + shutdown_section = body[shutdown_idx:] + for next_header in ("\n## ", "\n# "): + next_idx = shutdown_section.find(next_header, len("## Shutdown")) + if next_idx > 0: + shutdown_section = shutdown_section[:next_idx] + break + assert "before approving" in shutdown_section.lower(), ( + "## Shutdown must specify the Teardown runs BEFORE approving " + "shutdown_request — TaskStop becomes unreachable after process termination" + ) diff --git a/pact-plugin/tests/test_peer_inject_wake_directive.py b/pact-plugin/tests/test_peer_inject_wake_directive.py new file mode 100644 index 00000000..afc4a53e --- /dev/null +++ b/pact-plugin/tests/test_peer_inject_wake_directive.py @@ -0,0 +1,97 @@ +""" +Hook-side tests for the wake-arm directive emitted by peer_inject.py. + +The teammate-side wake-arm directive is appended to additionalContext on +SubagentStart per architect §15.2 — Tier-0 hook delivery so the directive +is durable across compaction and bypasses the Read-tracker budget. + +These tests focus on semantic-anchor invariants (skill slug, operation +name, agent_name interpolation, timing-gap-closure phrase distinct from +the lead-side directive). Existing tests/test_peer_inject.py already +covers chain-end positioning of _WAKE_ARM_TEMPLATE relative to the +completion-authority note across all spawnable pact-* roles; this file +fills the semantic-content gap without duplicating those positional +assertions. 
+""" +import sys +from pathlib import Path + +import pytest + + +sys.path.insert(0, str(Path(__file__).parent.parent / "hooks")) + + +@pytest.fixture(scope="module") +def wake_arm_template() -> str: + from peer_inject import _WAKE_ARM_TEMPLATE + return _WAKE_ARM_TEMPLATE + + +class TestWakeArmTemplateSemanticAnchors: + """Verbatim tokens that must appear in the rendered template.""" + + def test_template_references_inbox_wake_skill_slug(self, wake_arm_template: str): + assert 'Skill("PACT:inbox-wake")' in wake_arm_template, ( + "Teammate-side directive must reference exact slug Skill(\"PACT:inbox-wake\")" + ) + + def test_template_references_arm_operation(self, wake_arm_template: str): + assert "Arm operation" in wake_arm_template, ( + "Teammate-side directive must reference the Arm operation by name" + ) + + def test_template_carries_teammate_side_timing_phrase(self, wake_arm_template: str): + # Teammate-side timing is "before any tool call" — distinct from + # lead-side "before any teammate dispatch". Teammates don't dispatch + # other teammates, but they DO issue tool calls; the wake must be + # armed before the first one. This anchor prevents copy-paste of + # the lead template into the teammate site. + assert "before any tool call" in wake_arm_template, ( + "Teammate-side directive must use 'before any tool call' timing phrase" + ) + + def test_template_does_not_use_lead_side_timing_phrase(self, wake_arm_template: str): + # Negative anchor: ensures lead/teammate timing phrases stay distinct. 
+ assert "before any teammate dispatch" not in wake_arm_template, ( + "Teammate-side directive must NOT carry the lead-side " + "'before any teammate dispatch' phrase — teammates don't dispatch teammates" + ) + + def test_template_carries_idempotency_phrase(self, wake_arm_template: str): + assert "idempotent" in wake_arm_template.lower(), ( + "Teammate-side directive must carry an idempotency clause — " + "guards against LLM-self-diagnosis re-introduction" + ) + + +class TestWakeArmAgentNameInterpolation: + """The directive must parametrize on agent_name so each teammate watches its own inbox.""" + + def test_template_contains_agent_name_placeholder(self, wake_arm_template: str): + # The unrendered template must carry the {agent_name} placeholder so + # the call site can interpolate the spawning teammate's name. + assert "{agent_name}" in wake_arm_template, ( + "Template must contain {agent_name} placeholder for per-teammate interpolation" + ) + + def test_rendered_template_substitutes_agent_name(self, wake_arm_template: str): + rendered = wake_arm_template.format(agent_name="architect") + assert "{agent_name}" not in rendered, ( + "Rendered template must not retain unsubstituted {agent_name} placeholder" + ) + assert "architect" in rendered, ( + "Rendered template must contain the substituted agent_name value" + ) + + def test_rendered_template_carries_distinct_agent_name(self, wake_arm_template: str): + # Two different agent names yield two different rendered strings — + # confirms the interpolation site is load-bearing, not decorative. 
+ a = wake_arm_template.format(agent_name="architect") + b = wake_arm_template.format(agent_name="preparer") + assert a != b, ( + "Rendered templates for different agent names must differ — " + "agent_name interpolation must be in a content-bearing position" + ) + assert "architect" in a and "architect" not in b + assert "preparer" in b and "preparer" not in a diff --git a/pact-plugin/tests/test_session_end_wake_glob.py b/pact-plugin/tests/test_session_end_wake_glob.py new file mode 100644 index 00000000..169370ec --- /dev/null +++ b/pact-plugin/tests/test_session_end_wake_glob.py @@ -0,0 +1,95 @@ +""" +Hook-side test for session_end.cleanup_wake_registry()'s glob behavior. + +Per architect §15.4: the helper globs `inbox-wake-state-*.json` (not a +hardcoded single filename) so a single force-termination cleanup pass +removes the lead's STATE_FILE plus every teammate's STATE_FILE that +wasn't reached by their respective ## Shutdown Teardown invocation. + +Path-traversal discipline (§9): is_safe_path_component(team_name) + +team_dir.relative_to(teams_root) gate the glob; the glob result inherits +the validation transitively. 
+""" +import sys +from pathlib import Path + +import pytest + + +sys.path.insert(0, str(Path(__file__).parent.parent / "hooks")) + + +@pytest.fixture +def teams_root(tmp_path: Path, monkeypatch) -> Path: + """Point Path.home() at tmp_path and return the teams root directory.""" + monkeypatch.setattr(Path, "home", lambda: tmp_path) + root = tmp_path / ".claude" / "teams" + root.mkdir(parents=True) + return root + + +class TestCleanupWakeRegistryGlob: + """The helper unlinks every per-agent inbox-wake-state sidecar in one pass.""" + + def test_unlinks_lead_and_multiple_teammate_sidecars(self, teams_root: Path): + """Lead + several teammate STATE_FILEs all match the glob and are unlinked.""" + from session_end import cleanup_wake_registry + + team_name = "pact-test01" + team_dir = teams_root / team_name + team_dir.mkdir() + + sidecars = [ + team_dir / "inbox-wake-state-team-lead.json", + team_dir / "inbox-wake-state-architect.json", + team_dir / "inbox-wake-state-preparer.json", + team_dir / "inbox-wake-state-test-engineer.json", + ] + for s in sidecars: + s.write_text('{"v": 1, "monitor_task_id": "abc", "armed_at": "2026-04-30T00:00:00Z"}') + + cleanup_wake_registry(team_name) + + for s in sidecars: + assert not s.exists(), ( + f"cleanup_wake_registry must unlink {s.name} via glob pattern" + ) + + def test_does_not_unlink_unrelated_files_in_team_dir(self, teams_root: Path): + """Glob is `inbox-wake-state-*.json` only — siblings remain untouched.""" + from session_end import cleanup_wake_registry + + team_name = "pact-test02" + team_dir = teams_root / team_name + team_dir.mkdir() + + # Wake-state sidecar (should be unlinked) + unrelated team artifacts. 
+ wake_state = team_dir / "inbox-wake-state-team-lead.json" + wake_state.write_text("{}") + (team_dir / "config.json").write_text("{}") + (team_dir / "inboxes").mkdir() + (team_dir / "inboxes" / "team-lead.json").write_text("[]") + (team_dir / "tasks.json").write_text("[]") + + cleanup_wake_registry(team_name) + + assert not wake_state.exists(), "Wake-state sidecar must be unlinked" + assert (team_dir / "config.json").exists(), "Glob must not match config.json" + assert (team_dir / "inboxes" / "team-lead.json").exists(), ( + "Glob must not descend into inboxes/ directory" + ) + assert (team_dir / "tasks.json").exists(), "Glob must not match tasks.json" + + def test_no_state_files_is_a_no_op(self, teams_root: Path): + """Empty team dir: glob yields no matches, helper returns cleanly.""" + from session_end import cleanup_wake_registry + + team_name = "pact-test03" + team_dir = teams_root / team_name + team_dir.mkdir() + + # Should not raise even when there's nothing to unlink. + cleanup_wake_registry(team_name) + + # Team dir intact. + assert team_dir.is_dir() diff --git a/pact-plugin/tests/test_session_init_wake_directive.py b/pact-plugin/tests/test_session_init_wake_directive.py new file mode 100644 index 00000000..b26d92fe --- /dev/null +++ b/pact-plugin/tests/test_session_init_wake_directive.py @@ -0,0 +1,116 @@ +""" +Hook-side tests for the wake-arm directive emitted by session_init.py. + +The lead-side wake-arm directive is appended to additionalContext on every +SessionStart fire (startup / resume / clear / compact) per #444's +"hook-emitted directives: unconditional > conditional" discipline. These +tests assert the directive's verbatim presence and idempotency across all +four sources. + +Phantom-green mitigation: assertions use semantic anchors (skill slug, +operation name, timing-gap-closure phrase, idempotency phrase), not the +full sentence — an editing LLM reformatting whitespace must still pass. 
+""" +import io +import json +import sys +from pathlib import Path +from unittest.mock import patch + +import pytest + + +sys.path.insert(0, str(Path(__file__).parent.parent / "hooks")) + + +def _run_session_init_main(monkeypatch, tmp_path: Path, source: str) -> str: + """Invoke session_init.main() with the given SessionStart source value. + + Returns the additionalContext string from the hook output. Mocks all + side-effecting helpers so the hook runs deterministically against + tmp_path as $HOME. + """ + from session_init import main + + monkeypatch.setenv("CLAUDE_PROJECT_DIR", "/Users/example/Sites/test-project") + monkeypatch.setattr(Path, "home", lambda: tmp_path) + + stdin_data = json.dumps({ + "session_id": "aabb1122-0000-0000-0000-000000000000", + "source": source, + }) + + with patch("session_init.setup_plugin_symlinks", return_value=None), \ + patch("session_init.remove_stale_kernel_block", return_value=None), \ + patch("session_init.update_pact_routing", return_value=None), \ + patch("session_init.ensure_project_memory_md", return_value=None), \ + patch("session_init.check_pinned_staleness", return_value=None), \ + patch("session_init.update_session_info", return_value=None), \ + patch("session_init.get_task_list", return_value=None), \ + patch("session_init.restore_last_session", return_value=None), \ + patch("session_init.check_paused_state", return_value=None), \ + patch("sys.stdin", io.StringIO(stdin_data)), \ + patch("sys.stdout", new_callable=io.StringIO) as mock_stdout: + with pytest.raises(SystemExit) as exc_info: + main() + + assert exc_info.value.code == 0 + output = json.loads(mock_stdout.getvalue()) + return output["hookSpecificOutput"]["additionalContext"] + + +class TestWakeArmDirectiveUnconditional: + """Directive emits on EVERY SessionStart source — no LLM-self-diagnosis gate. + + Per #444 working-memory entry: hook-emitted directives use unconditional + wording. 
Conditional emission ("if not loaded") requires LLM self-diagnosis, + which is the failure mode the discipline closes. + """ + + @pytest.mark.parametrize("source", ["startup", "resume", "clear", "compact"]) + def test_directive_emitted_for_source(self, source: str, monkeypatch, tmp_path): + additional = _run_session_init_main(monkeypatch, tmp_path, source) + assert 'Arm wake mechanism: invoke Skill("PACT:inbox-wake")' in additional, ( + f"Wake-arm directive missing for source={source!r} — emission must be unconditional" + ) + + +class TestWakeArmDirectiveSemanticAnchors: + """Directive carries the load-bearing tokens. + + Each anchor protects against a specific drift: + - skill slug: prevents rename-without-callsite-update + - 'Arm' operation: prevents drift to a different operation name + - 'before any teammate dispatch': lead-side timing-gap-closure (distinct + from teammate-side 'before any tool call') + - 'idempotent': prevents an editing LLM from adding a self-diagnosis guard + """ + + def test_directive_references_inbox_wake_skill_slug(self, monkeypatch, tmp_path): + additional = _run_session_init_main(monkeypatch, tmp_path, "startup") + assert 'Skill("PACT:inbox-wake")' in additional, ( + "Directive must reference exact skill slug Skill(\"PACT:inbox-wake\")" + ) + + def test_directive_references_arm_operation(self, monkeypatch, tmp_path): + additional = _run_session_init_main(monkeypatch, tmp_path, "startup") + assert "Arm operation" in additional, ( + "Directive must reference the Arm operation by name" + ) + + def test_directive_carries_lead_side_timing_phrase(self, monkeypatch, tmp_path): + # Lead-side timing is "before any teammate dispatch" — distinct from + # teammate-side "before any tool call". This anchor prevents copy-paste + # of the teammate template into the lead site. 
+ additional = _run_session_init_main(monkeypatch, tmp_path, "startup") + assert "before any teammate dispatch" in additional, ( + "Lead-side directive must use 'before any teammate dispatch' timing phrase" + ) + + def test_directive_carries_idempotency_phrase(self, monkeypatch, tmp_path): + additional = _run_session_init_main(monkeypatch, tmp_path, "startup") + # Anchor on the load-bearing word; the surrounding sentence may vary. + assert "idempotent" in additional.lower(), ( + "Directive must carry an idempotency clause — guards against " + "LLM-self-diagnosis re-introduction" + ) From 0a03f6dd20b29cbdfd44f6c0551d9d6e75732892 Mon Sep 17 00:00:00 2001 From: michael-wojcik <5386199+michael-wojcik@users.noreply.github.com> Date: Thu, 30 Apr 2026 01:19:45 -0400 Subject: [PATCH 6/6] Tighten alarm-clock paragraph: no acknowledgment text on wake MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make the no-narration discipline explicit in the §Overview alarm-clock paragraph and audit annotation. Empirically observed in dogfood: even an agent with the wait-in-silence feedback memory loaded drifts into emitting "(Alarm.)" / "(Idle ping.)" turn-fillers on INBOX_GREW. Implicit "end the turn and return to idle" doesn't catch it; explicit "without emitting acknowledgment text or narrating the wake event" does. Audit annotation cites the failure mode and its empirical surfacing. 
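
A structural test could fence the new clause the same way the series' other semantic-anchor tests do. The sketch below is hypothetical and not part of this patch. The helper names `overview_section`, `_normalize`, and `has_no_narration_clause` and the two anchor phrases are assumptions. It also assumes that `## Overview` runs until the next `## ` header.

```python
import re

# Hypothetical anchor phrases for the no-narration clause. Short anchors
# survive whitespace reflow better than matching the full sentence.
NO_NARRATION_ANCHORS = (
    "without emitting acknowledgment text",
    "narrating the wake event",
)


def overview_section(skill_body: str) -> str:
    """Return the `## Overview` body, up to the next `## ` header or end of file."""
    match = re.search(
        r"^##\s+Overview\b(.*?)(?=^##\s|\Z)",
        skill_body,
        re.MULTILINE | re.DOTALL,
    )
    return match.group(1) if match else ""


def _normalize(text: str) -> str:
    # Drop blockquote markers, then collapse all whitespace and lowercase,
    # so a clause wrapped across `> ` lines still matches as one phrase.
    lines = (line.lstrip("> ") for line in text.splitlines())
    return " ".join(" ".join(lines).split()).lower()


def has_no_narration_clause(skill_body: str) -> bool:
    """Return True only if every anchor appears inside the Overview section."""
    section = _normalize(overview_section(skill_body))
    return all(anchor in section for anchor in NO_NARRATION_ANCHORS)
```

The check is scoped to `## Overview` rather than the whole body, in the same way the Arm-subsection test slices its section. The clause has to live in the principle anchor itself, so a copy that appears elsewhere in the file would not satisfy it.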
--- pact-plugin/skills/inbox-wake/SKILL.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/pact-plugin/skills/inbox-wake/SKILL.md b/pact-plugin/skills/inbox-wake/SKILL.md index b01f0800..e00e3a5c 100644 --- a/pact-plugin/skills/inbox-wake/SKILL.md +++ b/pact-plugin/skills/inbox-wake/SKILL.md @@ -14,7 +14,7 @@ Per-agent wake mechanism for PACT teams: a single Monitor task per agent watches ## Overview -> **Monitor is an alarm clock, not a mailbox.** On `INBOX_GREW`, end the turn and return to idle — the platform's idle-delivery is the channel-of-record for content. Never read the inbox file or parse the wake's stdout payload yourself. +> **Monitor is an alarm clock, not a mailbox.** On `INBOX_GREW`, end the turn and return to idle without emitting acknowledgment text or narrating the wake event — the platform's idle-delivery is the channel-of-record for content. Never read the inbox file or parse the wake's stdout payload yourself. > > **Wake surfaces between tool calls within a turn, not mid-tool.** Monitor's `INBOX_GREW` emit cannot interrupt a single in-flight tool call. The platform queues `INBOX_GREW` events that fire during a long-running tool and delivers them when the tool returns, bundled with the tool's result. The wake mechanism's promise is "messages surface between tool calls within a turn," NOT "instant interrupt anywhere." For multi-tool turns the wake reliably opens the poller-gate between tools; for single long tools (e.g., a 90-second blocking sleep) the agent is effectively unwakeable until the tool returns. @@ -22,7 +22,7 @@ Problem this solves: during long-running operations, the platform's `useInboxPol Single-Monitor model, no in-session watchdog. Lifetime is session-scoped per agent. Inbox path is a single JSON file (`inboxes/{agent-name}.json`), not a directory. -**Audit**: both alarm-clock paragraphs are non-negotiable. 
The first prevents an editing LLM from writing "parse the wake stdout to extract content" — wake is signal, not content. The second prevents an editing LLM from inferring mid-tool interrupt from "wake on inbox grow" — the substrate's actual capability is between-tool, not anywhere. Removing either paragraph silently overpromises the mechanism. +**Audit**: both alarm-clock paragraphs are non-negotiable. The first prevents two failure modes: (a) an editing LLM writing "parse the wake stdout to extract content" — wake is signal, not content; and (b) the woken agent emitting acknowledgment text like "(Alarm.)" or "(Idle ping.)" instead of returning to silent idle — empirically observed even with the wait-in-silence feedback memory loaded, so the no-narration clause must be explicit in the principle anchor itself. The second prevents an editing LLM from inferring mid-tool interrupt from "wake on inbox grow" — the substrate's actual capability is between-tool, not anywhere. Removing either paragraph silently overpromises the mechanism. ## When to Invoke