Add inbox-wake skill (lead-only) + 20/20/60 quiet-window + PostToolUse lifecycle by michael-wojcik · Pull Request #603 · Synaptic-Labs-AI/PACT-Plugin

michael-wojcik · 2026-04-30T15:48:14Z

Summary

Adds pact-plugin/skills/inbox-wake/SKILL.md as the lead-only canonical SSOT for the wake mechanism, with a 20/20/60 quiet-window state machine in the Monitor Block to coalesce wake-fire inflation under bursty traffic
Adds PostToolUse-driven lifecycle automation (wake_lifecycle_emitter.py + shared/wake_lifecycle.py) that arms Monitor on first active task and tears it down on last active task; session_init.py closes the resume-with-active-tasks gap; session_end.py adds cleanup_wake_registry with path-traversal defense
Parallel safety net via commands/{wrap-up,pause,imPACT}.md Teardown invocations + charter §Wake Mechanism cross-ref + operational runbook + plugin version bump 3.20.4 → 3.21.0
138 file-parsing structural tests across 7 files, all green; counter-test-by-revert verified on 4 load-bearing assertions

Replaces #602 (symmetric scope) with a lead-only redesign authored fresh on a new branch from main. Empirical Phase 0 routing probe (3 rounds) established that PostToolUse with matcher TaskCreate|TaskUpdate|Task|Agent is the only viable channel for hook-emitted Arm/Teardown directives — TaskCreated/TaskCompleted/SubagentStop do not support hookSpecificOutput.additionalContext per platform schema. Plan: docs/plans/591-inbox-wake-skill-lead-only-plan.md (local, gitignored).

Test plan

pytest pact-plugin/tests/test_inbox_wake_*.py — 138/138 pass
Counter-test-by-revert cardinality verified (F6 phrase, POLL=20, glob-in-cleanup, emitter slug)
Live in-session validation: Monitor bu4hxc2bh armed with the canonical 20/20/60 block produced clean FIRST_GROW + LAST_GROW per burst; MAX_DELAY ceiling fired once on sustained traffic
Multi-reviewer peer review (architect + test-engineer + backend-coder + security-engineer)

Authors pact-plugin/skills/inbox-wake/SKILL.md as the lead-only canonical reference for the Monitor-arm/teardown wake mechanism. 9 sections per architect rev5 §5; alarm-clock framing with no-narration tightening; 20/20/60 quiet-window state machine (PENDING/GROWING with FIRST_GROW + LAST_GROW + MAX_DELAY edge emits) coalesces wake-fire inflation under bursty traffic; F1/F6/F7 invariants; Failure Modes including the new wake-fire-inflation anti-pattern anchored to the 20/20/60 design.

Adds the PostToolUse-driven lifecycle automation that arms and tears down the inbox-wake Monitor based on active-task transitions. New pact-plugin/hooks/shared/wake_lifecycle.py provides count_active_tasks() and the _lifecycle_relevant predicate that reuses SELF_COMPLETE_EXEMPT_AGENTS from shared/intentional_wait.py to filter out signal-tasks (auditor signal/blocker/algedonic) and self-complete exempt agents. New pact-plugin/hooks/wake_lifecycle_emitter.py is a PostToolUse hook with matcher TaskCreate|TaskUpdate|Task|Agent that detects 0->1 transitions (emits Arm directive) and 1->0 transitions (emits Teardown directive). JSON output includes the required hookEventName field per platform schema. session_init.py adds Option-C resume-with-active-tasks gap closure with unconditional directive prose. session_end.py adds cleanup_wake_registry helper with path-traversal defense (is_safe_path_component + resolve+relative_to + Path.unlink missing_ok=True wrapped in try/except OSError) and single-file unlink for the lead-only inbox-wake-state.json STATE_FILE.

Adds the parallel safety-net layer for the inbox-wake mechanism: charter §Wake Mechanism subsection in pact-communication-charter.md cross-refs the canonical skill via slug-link; commands/wrap-up.md, commands/pause.md, and the imPACT.md force-termination branch invoke the skill's Teardown operation so Monitor stops cleanly on session-end paths even if the PostToolUse 1->0 transition is missed. New pact-plugin/tests/runbooks/ 591-inbox-wake.md documents Arm/Teardown lifecycle, 20/20/60 quiet- window verification (FIRST_GROW + LAST_GROW + MAX_DELAY edges), failure modes, and STATE_FILE inspection. Plugin version bumps minor: 3.20.4 -> 3.21.0 (new capability category) across plugin.json, marketplace.json, README.md, and pact-plugin/README.md.

Adds file-parsing structural tests covering the inbox-wake lead-only implementation across all three Wave commits. 138 invariants split across 7 files: skill structure (47), lifecycle emitter (16), lifecycle helper (30), session_init (8), session_end cleanup (20), callsites (9), and version bump (8). Counter-test-by-revert verified cardinality on 4 load-bearing assertions (F6 phrase, POLL=20 timing, glob-in-cleanup, emitter slug). No skill execution or live agent loop; all assertions parse files directly. Documents the count_active_tasks pure-never-raises property structurally so the redundant try/except guard at session_init.py:728-730 can be removed in a future cleanup.

Charter §Wake Mechanism collapses to 5 slug-link bullets pointing at the canonical SKILL.md anchors instead of restating the alarm-clock framing, between-tool-call scope claim, and Arm/Teardown trigger sites. SKILL.md remains the single drift anchor for those load-bearing prose blocks. imPACT force-termination Teardown drops the "Skip if other active teammate tasks remain" guard. The conditional required LLM-self-diagnosis of active-task state and was stale under the lead-only design (only one Monitor exists per session). Symmetric with /wrap-up and /pause: the skill's Teardown is idempotent, so unconditional invocation is safe.

A1: drops the hypothetical_pre reverse-derivation in wake_lifecycle_emitter.py. The previous logic only reverted status when deciding pre-state, so a TaskUpdate that bundled status=completed with an owner or metadata flip in the same call could suppress a legitimate Teardown emit. The simplified _decide_directive uses post-only detection: any TaskUpdate(status=completed) at post=0 emits Teardown. Skill's Teardown is idempotent, so over-eager invocations no-op when no STATE_FILE is on disk. _read_task_from_disk and the _lifecycle_relevant import are removed; emitter shrinks by ~46 lines. A2: adds is_safe_path_component(team_name) gate at count_active_tasks entry in shared/wake_lifecycle.py, mirroring the cleanup_wake_registry defense in session_end.py. team_name is producer-side canonicalized at session_init, so this is belt-and-suspenders defense-in-depth covering the failure mode where a poisoned session context reaches the helper. B2: pure-never-raises property pinned for _lifecycle_relevant on adversarial task shapes (~30 cells, sweep status/owner/metadata combos). This closes the gate documented at fb6fa435 — once both count_active_tasks and _lifecycle_relevant are structurally pinned never-raise, the redundant try/except guard wrapping count_active_tasks in session_init.py can be safely removed in a follow-up. B3: literal Arm/Teardown precondition phrases pinned in emitter tests ("First active teammate task created", "Last active teammate task completed"), symmetric with the directive-prose pins already in session_init coverage. B4: behavioral pins for cleanup_wake_registry callsite in session_end.py and the session_init Arm-emit gate (subprocess + synthesized stdin). Closes the false-RED-prone gap where structural-only pins would pass through a benign refactor that broke runtime behavior. Existing test test_no_teardown_on_completion_of_signal_task replaced with test_teardown_emits_on_signal_task_completion_at_post_zero documenting the new contract. 173 inbox-wake tests pass; full suite 7165 pass.

The unconditional Teardown invocation in commands/pause.md and the force-termination branch of commands/imPACT.md tore down the lead's Monitor when other teammates were still active. Force-terminating teammate A while B and C remained in flight dropped the active count from 3 to 2; the PostToolUse hook correctly skipped Teardown (post>0), but the LLM-invoked callsite ran it anyway, unlinking STATE_FILE and stopping the Monitor that B and C still depended on. The lead then fell back to baseline InboxPoller — the exact failure mode #591 was designed to prevent. The PostToolUse hook handles the 1->0 transition authoritatively for both /pause and /imPACT, so the callsites were redundant in the common case and actively harmful in the active-siblings case. The /wrap-up callsite stays: it fires after every task is already completed (count is 0), so the over-eager Teardown is harmless and provides a hook-silent-fail safety net. SKILL.md frontmatter description and §When to Invoke table narrow to the /wrap-up callsite. Charter §Wake Mechanism bullet 4 reframes around the same. test_inbox_wake_callsites.py replaces the parametrized assertion with a /wrap-up-only positive assertion plus two negative assertions that imPACT.md and pause.md must not invoke the skill — guards against future re-introduction without re-deriving the per-callsite all-paths-safe invariant.

…cing Empirical dogfood data from the previous session showed 20/20/60 fired roughly one wake per minute under teammate-heavy phases (5+ MAX_DELAY ceiling fires across ~100 minutes), producing visible noise on every reviewer-message burst. The 20/20/60 ratios used QUIET_REQUIRED = POLL, so a single quiet poll cycle was enough to declare a burst over — multi-message reviewer batches with internal 30-50s sub-pauses fragmented into multiple FIRST_GROW + LAST_GROW pairs. The new 30/60/120 ratios are POLL=30, QUIET_REQUIRED=60 (= 2*POLL), and MAX_DELAY=120 (= 2*QUIET_REQUIRED, equivalently 4*POLL). Two consecutive quiet poll cycles must pass before LAST_GROW fires, so a typical reviewer batch coalesces into one logical burst. The same batch that 20/20/60 fragmented into ~3 pairs (6 fires) now coalesces to ~1 pair plus an optional MAX_DELAY ceiling (2-3 fires). Single-message responsiveness is preserved at FIRST_GROW within ~30 s. Sustained-traffic ceiling at 120 s replaces the previous 60 s — in exchange for stronger coalescing during ordinary bursts. The audit annotation explicitly anchors both design ratios so future edits cannot collapse QUIET back to POLL ("ping-pong" regression) or shrink MAX_DELAY below 2*QUIET ("refragmenting coalesced bursts"). The constraints QUIET_REQUIRED >= POLL and MAX_DELAY >= 2*POLL are unchanged. Runbook also picks up cycle-2 drift cleanup: dropped stale references to /pause and /imPACT inbox-wake teardown callsites (removed in 8f17f1e), with an explicit audit anchor explaining why those commands deliberately do not invoke Teardown — they may run with active teammate tasks and the lead's Monitor must stay armed.

…ommands Splits the unified inbox-wake skill into two slash-typeable commands so the user can manually arm or unwatch the lead's inbox monitor for debug/recovery — particularly /PACT:unwatch-inbox to silence Monitor notifications mid-session if the wake fires become noisy. The PostToolUse hook continues to emit Arm and Teardown directives automatically on first/last active-task transitions; the directives now reference the command pair (/PACT:watch-inbox and /PACT:unwatch-inbox) instead of the unified skill. Audit anchor allocation: - /PACT:watch-inbox owns alarm-clock framing, 30/60/120 dual-ratio audit (QUIET = 2*POLL + MAX_DELAY = 2*QUIET), F1 single-file inbox, F7 stdout discipline, Monitor block, WriteStateFile block, and the monitor-side failure modes (silent death, long-tool-blocks-wake, wake-fire inflation, malformed STATE_FILE, concurrent re-arm). - /PACT:unwatch-inbox owns F6 (TaskStop tolerates not-found), the best-effort/opportunistic framing, Teardown ordering, and the torn-down-session-may-have-lost-Monitor failure mode. Each command cross-links the other under ## References, so an LLM loading either half can reach the partner if needed. Test files split symmetrically: structural assertions for each command live in test_watch_inbox_command_structure.py and test_unwatch_inbox_command_structure.py, with the original test_inbox_wake_skill_structure.py removed. Mechanism tests (test_inbox_wake_lifecycle_emitter.py, test_inbox_wake_session_init.py, test_inbox_wake_callsites.py) keep the inbox_wake_ prefix as a mechanism-level grouping. Negative assertions in test_inbox_wake_callsites.py forbid all three slugs (legacy inbox-wake plus watch-inbox and unwatch-inbox) in commands/imPACT.md and commands/pause.md as defense-in-depth against regression of the F3-followup learning. 186 inbox-wake tests pass.

Removes dated session telemetry, migration labels, and review-cycle finding refs from committed plugin agent-consumed docs per the pinned "current-state instructions only" axiom: watch-inbox.md drops the Option-C resume-gap-closure label, the silent-Monitor-death blockquote's verbatim 2026-04-29 timestamp + monitor_task_id anchor, the audit anchor citing those same dated literals, and the long-single-tool blockquote's 2026-04-30 session-test telemetry. wake_lifecycle.py docstring drops the "per the review-security-engineer SE-1 finding" ref, citing the sibling cleanup_wake_registry pattern instead as current-state evidence. Audit anchors stay intact; their dated evidence sentences become present-tense rules. Surrounding instructions read coherently without the archaeology. PostToolUse hook matcher in hooks.json prunes from TaskCreate|TaskUpdate|Task|Agent to TaskCreate|TaskUpdate. The hook explicitly skipped Task and Agent at line 184 (the spawn tools don't change the active-task count), so the wider matcher forked the hook process for a no-op return on every teammate spawn. The line-184 skip-check stays as defensive belt-and-suspenders against future matcher widening. emitter module docstring + lifecycle-emitter test docstring + assertion + runbook prose all updated to the new pruned matcher. test_long_single_tool_failure_mode_with_empirical_anchor renames to test_long_single_tool_failure_mode_documented and asserts on the present-tense scope phrases ("between tool calls", "not mid-tool") instead of dated session timestamps. Counter-test-by-revert: each load-bearing assertion produces RED on revert.

Closes nine blind-round findings. watch-inbox / unwatch-inbox each gain a Lead-Session Guard (Step 0) that reads the team config's leadSessionId and refuses to execute when invoked from a teammate session. The Monitor watches a fixed inboxes/team-lead.json path; arming it from a non-lead session would fire the wake event in the wrong session and silently lose the signal. The audit anchor explains why leadSessionId is the canonical signal source (the session-context schema does not carry agent_type). unwatch-inbox's Teardown Block adds a task-id allowlist gate (^[a-z0-9]{6,}$) before TaskStop. A tampered STATE_FILE could otherwise specify an arbitrary task_id; the gate refuses TaskStop on malformed input but still proceeds to STATE_FILE unlink (corrupt registry cleanup is path-validated separately and independently safe). A one-line audit comment notes the documented TOCTOU window between resolve() and unlink(): same-user trust assumption bounds the residual risk, and a same-user attacker has equivalent capability via direct os.unlink anyway. Path-traversal defense is now centralized in task_utils.iter_team_task_jsons(team_name), which owns is_safe_path_component, resolve()+relative_to(tasks_root), and the dotfile-exclusion filter (Path.glob does not exclude dotfiles on POSIX, so the helper applies an explicit startswith filter). count_active_tasks becomes a one-line sum delegating to the helper, removing the inline iteration that had drifted from the cleanup-side defense. The PostToolUse emitter generalizes from a "completed-only" terminal check to a TERMINAL_STATUSES set that covers both "completed" and "deleted". A task transitioning to either terminal state at post==0 now emits the Teardown directive on the 1->0 transition; previously only status=completed triggered it. Tests pin five new structural invariants (Lead-Session Guard wording, task-id allowlist regex, TOCTOU audit comment, deleted-status terminal, shell-injection-vector absence in the Monitor block) plus tightened assertions on the silent-Monitor-death failure-mode wording and the dotfile-exclusion filter mechanism. Counter-test-by-revert verified on each load-bearing assertion. 198 inbox-wake tests pass. A blind-finding flagged the metadata-isinstance carve-out as unreachable; verification confirmed the gate is intentional (the existing isinstance check returns True on parse-failed metadata as a conservative count-the-task fallback rather than silently exempting it via downstream carve-outs). No code change.

Closes the blind-round-2 elevated findings. STATE_FILE schema extends from three fields to four. The new armed_by_session_id field records the lead's session_id at Arm time; the unwatch-inbox Teardown reads it back and validates the same-session invariant before invoking TaskStop. A concurrent same-user session (another worktree, another /PACT:orchestrate run) that planted a malicious STATE_FILE would have its session_id mismatched, so Teardown refuses TaskStop on the planted file. Unlink still proceeds because the path is allowlist-validated separately and the Monitor in the original lead session keeps running until that session ends. The PostToolUse emitter now gates on the same lead-session check before deciding any directive. Previously every teammate session's TaskCreate or TaskUpdate emitted a watch-inbox directive into the teammate's context (and a non-compliant teammate LLM running the Monitor body would silently take over the wake mechanism in the wrong session). Silent fail-open on non-lead sessions matches the livelock-class hook contract, so the false-positive emit is closed without surfacing a hook-error UI on every teammate Task-tool fire. The terminal-status check now runs before count_active_tasks, so metadata-only TaskUpdates (teachback_submit, intentional_wait, handoff, progress, memory_saved) no longer glob the team's task directory. On a thirty-task team that eliminates roughly nine thousand wasted reads per session. Documentation drift cleanup: the unwatch-inbox When-to-Invoke row, the runbook lifecycle table, and the related audit anchor now read "TaskUpdate with terminal status (completed or deleted)" in line with the cycle-6 emitter behaviour. The runbook §3.4 reference to test_inbox_wake_lifecycle.py is replaced with the actual split test files. The pre-prune Task/Agent fall-through comment block in wake_lifecycle_emitter.py is rewritten to describe current state. Tests pin the new four-field schema, the armed_by_session_id validation step, the lead-session emitter guard, the perf reorder (via mock.patch on the imported binding to assert call count), and the terminal-status doc wording. Counter-test-by-revert confirmed load-bearing on each. 208 inbox-wake tests pass.

…in-depth Closes the remaining blind-round-2 minors and futures. _lifecycle_relevant hoists the SELF_COMPLETE_EXEMPT_AGENTS owner check above the metadata-shape gate. A secretary-owned task with corrupted metadata previously short-circuited to the conservative-count branch and bypassed the owner carve-out, asymmetric with the sibling is_self_complete_exempt predicate; the new ordering restores symmetry. iter_team_task_jsons now skips per-file symlink-yielded entries inside the glob loop, closing the per-file escape vector that the directory- boundary defense did not cover. session_init.py drops the redundant try/except around count_active_tasks now that cycle 6 added the structural pure-never-raises pin (the cleanup-opportunity tracked through six earlier cycles is closed). The Tier-0 unconditional Arm directive picks up an explicit audit anchor naming the LLM-self-diagnosis anti-pattern so future edits cannot quietly reintroduce conditional emit logic. The wake-arm step label moves from 4d to 5_pre to match its physical position in the surrounding session_init flow. wake_lifecycle_emitter now caps stdin reads at 1MB before json.loads, so a malformed-or-malicious payload exits cleanly via the existing fail-open path instead of risking OOM on the hook process. unwatch-inbox tightens the monitor_task_id allowlist regex from ^[a-z0-9]{6,}$ to ^[a-z0-9]{6,}\Z so a newline-suffixed id can no longer slip through Python re's $ semantics. The audit anchor cites session_end.py::_UUID_PATTERN as the in-plugin precedent and retains one $-form literal as an explicit anti-pattern callout for future editors. The Lead-Session Guard sections in both watch-inbox and unwatch-inbox extend their audit anchors to document the guard as foot-gun protection (typo / wrong-window / cross-session-LLM speculation), not a security boundary against same-user adversaries - the user-local-trust assumption bounds the residual exposure. The concurrent-rearm Failure Mode tightens to "STATE_FILE points only at the winner; loser's Monitor has no registry pointer for any subsequent /PACT:unwatch-inbox to find" instead of the prior ambiguous "TaskStop only stops the winner" framing. The runbook drops the incorrect "(or TaskUpdate with owner assignment)" parenthetical so the documented Arm trigger matches the emitter's actual TaskCreate-only behaviour. Tests pin all the new structural and behavioural invariants. The Lead-Session Guard test tightens from substring presence to section-body extraction so an inline cross-ref reference no longer phantom-greens the assertion. The shell-injection negative-allowlist gains $(rm, $(python, <(, and >( as forbidden patterns. New counter-tests cover the owner-check hoist, per-file symlink skip, \Z regex anchor, and 1MB payload cap, with revert cardinality verified for each. 214 inbox-wake tests pass.

Remediation for three reviewer follow-ups accepted on PR #609. M1 (3-way reviewer convergence): rename test_inbox_wake_version_bump.py to test_plugin_version_bump.py. The file guards a generic 4-file plugin-version-consistency invariant; the inbox_wake_ provenance prefix dated to #603 and no longer reflects the test's scope. F3 (3-way convergence): add test_every_skill_subdir_has_skill_md to test_plugin_manifest_parity.py. Skills register via a directory pointer ("./skills/") in plugin.json, so a subdir without a SKILL.md ships silently non-discoverable rather than being caught by the array-based parity invariants. Counter-test-by-revert: renaming skills/orchestration/SKILL.md fails only the new test (cardinality 1). F4 (test-engineer): add test_agent_frontmatter_name_matches_filename to the same file plus a small regex-based _frontmatter_name helper. Claude Code resolves agents by frontmatter name; the filename-stem parity test passes when frontmatter and filename diverge. The new invariant pins them in lockstep. Counter-test-by-revert: mutating pact-architect.md frontmatter name fails only the new test (cardinality 1). No PyYAML dependency added. 14 manifest-parity + plugin-version-bump tests pass. Full suite 7212 pass + 1 pre-existing unrelated failure (skills/orchestration/SKILL.md exceeds 600 lines, covered by #594).

* Register watch-inbox + unwatch-inbox + prune-memory in plugin.json (#608) PR #603 shipped commands/watch-inbox.md and commands/unwatch-inbox.md without adding them to .claude-plugin/plugin.json's commands array; commands/prune-memory.md (from #493) had the same gap. Result: all three commands were non-discoverable as slash commands and non-invokable via the Skill tool, leaving the entire inbox-wake mechanism shipped in #591/#603 silently non-functional in the installed plugin. Append the three entries to plugin.json's commands array (15 total). Bump version 3.21.0 → 3.21.1 across the four version-tracked files (plugin.json, marketplace.json, root README.md, pact-plugin/README.md) and update test_inbox_wake_version_bump.py constants accordingly. Add pact-plugin/tests/test_plugin_manifest_parity.py with four set-membership invariants pinning manifest-vs-filesystem parity for both commands and agents in both directions. Counter-test-by-revert verified RED-on-revert for each. The invariant prevents this bug class from recurring: any future commands/*.md or agents/*.md added without manifest registration will fail CI before merge. * Address peer review M1+F3+F4 (#608/#609) Remediation for three reviewer follow-ups accepted on PR #609. M1 (3-way reviewer convergence): rename test_inbox_wake_version_bump.py to test_plugin_version_bump.py. The file guards a generic 4-file plugin-version-consistency invariant; the inbox_wake_ provenance prefix dated to #603 and no longer reflects the test's scope. F3 (3-way convergence): add test_every_skill_subdir_has_skill_md to test_plugin_manifest_parity.py. Skills register via a directory pointer ("./skills/") in plugin.json, so a subdir without a SKILL.md ships silently non-discoverable rather than being caught by the array-based parity invariants. Counter-test-by-revert: renaming skills/orchestration/SKILL.md fails only the new test (cardinality 1). F4 (test-engineer): add test_agent_frontmatter_name_matches_filename to the same file plus a small regex-based _frontmatter_name helper. Claude Code resolves agents by frontmatter name; the filename-stem parity test passes when frontmatter and filename diverge. The new invariant pins them in lockstep. Counter-test-by-revert: mutating pact-architect.md frontmatter name fails only the new test (cardinality 1). No PyYAML dependency added. 14 manifest-parity + plugin-version-bump tests pass. Full suite 7212 pass + 1 pre-existing unrelated failure (skills/orchestration/SKILL.md exceeds 600 lines, covered by #594).

@pytest

) (#638) * feat: add fixture-parity capture-provenance convention for hook stdin captures Establishes pact-plugin/tests/fixtures/wake_lifecycle/ as the canonical location for hook PostToolUse stdin captures, with a `_meta` sibling-key provenance convention (capture_session_id, capture_date, capture_method, issue_ref). Structural defense against the failure class that produced #620: hand- constructed fixtures diverged from production payload shape, tests stayed green while production was broken. The README mandates the convention for future hook-stdin fixtures. Initial fixtures cite #612 (logging-shim capture from session pact-56ce3a2a on 2026-05-02): TaskCreate production-nested shape, TaskUpdate flat shape (fossilized for future regression backstop), and the broken pre-#620 legacy shape preserved as a regression backstop. Refs: #612, #620 * fix: extend _extract_task_id to probe nested tool_response.task.id (#620) Production TaskCreate `tool_response` is nested (`tool_response.task.id`) per #612 logging-shim evidence. The prior flat-only probe returned None on every TaskCreate, killing the auto-Arm path on the first 0->1 active- task transition since #603 merged. Manual `/PACT:watch-inbox` continued to work, masking the regression. Extends `_extract_task_id` with three-block precedence: tool_input --> tool_response.task (nested, NEW) --> tool_response (flat, fallback). Public contract unchanged; signature widened to `dict[str, Any] | None` to make the pre-existing None-input defense honest. Docstring enumerates all 8 probe paths in precedence order and documents the WHY (cites #612 + capture session pact-56ce3a2a 2026-05-02). Tests: - New TestExtractTaskIdShapeResilience class with 7 named methods pinning behavior across input-side, nested-response, and flat- response paths plus failure modes (unknown shape, empty dicts, non-string ids, None input). - End-to-end regression test piping the captured production fixture through the full hook entry-point asserts an Arm directive is emitted. - 12 existing TaskCreate fixture stubs rewritten from flat to nested shape to match production going forward. Counter-test-by-revert verified: cardinality {3} on revert (the named #620 regression test, the end-to-end fixture test, and the None-input-defense test that fossilizes the isinstance guard). Refs: #620, #612, PR #603 * chore: bump version 4.0.0 -> 4.0.1 Patch release for #620 watch-inbox Arm regression fix. * test: generalize plugin.json version guardrail (4.0.0 -> EXPECTED_VERSION constant) The v4.0.0 release added test_plugin_json_version_is_4_0_0 as a one-shot guardrail to verify the BREAKING bump landed. With v4.0.1 it would have needed bumping again; instead, generalize to a module-top EXPECTED_VERSION constant so future bumps update one location. Also scrubs planning-artifact identifiers (C1/C9/C11) from the module docstring per the agent-consumed-docs current-state-only convention; the docstring now describes what the tests assert today rather than which planning commits brought them in. Refs: #620 * fix: harden _extract_task_id (drop unreachable None-defense + reject whitespace-only ids) Two coupled review-driven hardenings on the extractor: D2-a: drop the dict[str, Any] | None signature widening + isinstance top-level guard. main() at L327 pre-filters non-dicts; the guard tested an unreachable production branch (#538-adjacent capability-without- consumer). Per-sub-dict isinstance guards on tool_input/tool_response are kept — main() does NOT pre-filter sub-fields, so they catch malformed-but-dict shapes that are reachable. The matching test_none_input_data_returns_none case is dropped from the regression class; counter-test-by-revert cardinality on the #620 contract goes {3} -> {2} (named regression test + end-to-end fixture test still hold). F5: reject whitespace-only ids at all 3 probe sites by adding `.strip()` to the existing isinstance(tid, str) and tid check. Previously `{tool_response: {id: ' '}}` returned `' '` (truthy whitespace), which would fail downstream TaskStop with semantically-empty id. Strengthening is symmetric across input-side, nested-response, and flat-response probes; cascading-fallthrough invariant verified (whitespace at higher-precedence falls through to next valid probe rather than short-circuiting to None). New TestExtractTaskIdShape Resilience::test_whitespace_only_id_returns_none parametrized over 5 whitespace shapes pins the contract. Also adds test_all_wake_lifecycle_fixtures_carry_meta_provenance: a top-level enforcement test that globs tests/fixtures/wake_lifecycle/ and asserts each fixture carries the README-mandated _meta provenance (capture_session_id, capture_date, capture_method, issue_ref). Closes the README-discipline-only gap surfaced by the test-engineer's adversarial-coverage review of PR #638. Refs: #620, #638 * test: align version pinning with sibling pattern + tighten README assertions Two version-pinning test refinements driven by PR #638 architect + test-engineer reviews: D4-a: replace the EXPECTED_VERSION = "4.0.1" literal in test_plugin_json_orchestrator.py with json.loads(plugin.json)["version"] dynamic-source pattern matching sibling test_plugin_version_bump.py:23. Also drops the now-misleading "Update EXPECTED_VERSION when bumping" comment — with dynamic-source, plugin.json itself is the source of truth and there is nothing to update at this site. The structural- existence-check role is documented in the test docstring; cross-file drift detection is owned by sibling test_plugin_version_bump.py. option-b: tighten test_root_readme_version + test_pact_plugin_readme_version in test_plugin_version_bump.py with a word-bounded regex instead of plain substring search. _TARGET_VERSION_PATTERN compiles `(?<![\d.])` + re.escape(TARGET_VERSION) + `(?![\d.])` once at module load. Closes the substring-match edge case where '4.0.1' would falsely match '4.0.10' in a README. test_marketplace_json_version is left untouched (set-membership comparison is already exact). Refs: #638 * test: clean up pre-existing Pyright unused-var warnings (5 sites) Latent unused-import + 4 unused-vars in test_inbox_wake_lifecycle_emitter.py and test_plugin_version_bump.py, surfaced by Pyright re-analysis after PR #638's edits to those files. Pre-existing in main; not introduced by review remediation. Bundled here for cleanliness. test_plugin_version_bump.py: L18 drop unused `import pytest` (no @pytest.fixture / @pytest.mark / pytest.raises in this file) test_inbox_wake_lifecycle_emitter.py: L46 drop unused `team_name` param from `_pact_session_env`; 3 callers updated (L117/L123/L137). Internal helper, all callers pass the literal "t", body never references it. L436 drop unused `tmp_path` fixture from `test_is_terminal_status_update_matches_completed_and_deleted` signature; the test is in-memory predicate-only. L582 drop `result =` LHS in `test_count_active_tasks_called_on_taskupdate_terminal_status`; assertion is on `mock_count.call_count`, the call is for side effects. L603 same pattern in `test_count_active_tasks_called_on_taskcreate`. L543 sibling occurrence is deferred to #640 (suite-wide unused-var/ import scrub). Refs: #638 * test: drop tautological plugin.json version test (post-D4-a) After D4-a (Commit 9504594) made EXPECTED_VERSION dynamic-source (`json.loads(plugin.json)["version"]`), the `test_plugin_json_version_is_pinned_to_current_release` test in test_plugin_json_orchestrator.py asserts plugin.json's version equals plugin.json's version — structurally tautological, catches nothing. Cross-file drift detection is owned by test_plugin_version_bump.py (which checks plugin.json against marketplace.json + both READMEs). Drop the no-op test rather than carry it. Drops: - test_plugin_json_version_is_pinned_to_current_release - EXPECTED_VERSION module-level computation (orphan after the test goes) Keeps: - plugin_json fixture (5 sibling tests consume it) - import json + PLUGIN_JSON_PATH (fixture needs both) - all 5 surviving tests (13-entry agents array invariants + bootstrap-commands-absent invariant) Module docstring rewritten to lead with the surviving 13-entry agents array claim + bootstrap-commands-absent invariant + cross-ref to test_plugin_version_bump.py for version-pinning ownership. Refs: #638

michael-wojcik added 11 commits April 30, 2026 11:08

michael-wojcik mentioned this pull request May 1, 2026

inbox-wake: silent Monitor death documented but unmonitored #604

Open

michael-wojcik mentioned this pull request May 1, 2026

inbox-wake: owner-flip-to-exempt without status change leaves Monitor armed orphaned #605

Open

This was referenced May 1, 2026

Add end-to-end integration test for inbox-wake mechanism #606

Open

Refactor session_init.py — exceeds 600-line threshold (1,142 lines) #607

Open

michael-wojcik merged commit 28189e5 into main May 1, 2026

michael-wojcik deleted the feat/591-inbox-wake-lead-only branch May 1, 2026 21:31

This was referenced May 2, 2026

plugin.json missing commands/watch-inbox.md + commands/unwatch-inbox.md — PR #603 inbox-wake mechanism non-functional in installed plugin #608

Closed

Register inbox-wake + prune-memory commands in plugin.json (#608) #609

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add inbox-wake skill (lead-only) + 20/20/60 quiet-window + PostToolUse lifecycle#603

Add inbox-wake skill (lead-only) + 20/20/60 quiet-window + PostToolUse lifecycle#603
michael-wojcik merged 13 commits into
mainfrom
feat/591-inbox-wake-lead-only

michael-wojcik commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

michael-wojcik commented Apr 30, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant