Add inbox-wake skill (lead-only) + 20/20/60 quiet-window + PostToolUse lifecycle#603
Merged
Conversation
Authors pact-plugin/skills/inbox-wake/SKILL.md as the lead-only canonical reference for the Monitor-arm/teardown wake mechanism. 9 sections per architect rev5 §5; alarm-clock framing with no-narration tightening; 20/20/60 quiet-window state machine (PENDING/GROWING with FIRST_GROW + LAST_GROW + MAX_DELAY edge emits) coalesces wake-fire inflation under bursty traffic; F1/F6/F7 invariants; Failure Modes including the new wake-fire-inflation anti-pattern anchored to the 20/20/60 design.
Adds the PostToolUse-driven lifecycle automation that arms and tears down the inbox-wake Monitor based on active-task transitions. New pact-plugin/hooks/shared/wake_lifecycle.py provides count_active_tasks() and the _lifecycle_relevant predicate that reuses SELF_COMPLETE_EXEMPT_AGENTS from shared/intentional_wait.py to filter out signal-tasks (auditor signal/blocker/algedonic) and self-complete exempt agents. New pact-plugin/hooks/wake_lifecycle_emitter.py is a PostToolUse hook with matcher TaskCreate|TaskUpdate|Task|Agent that detects 0->1 transitions (emits Arm directive) and 1->0 transitions (emits Teardown directive). JSON output includes the required hookEventName field per platform schema. session_init.py adds Option-C resume-with-active-tasks gap closure with unconditional directive prose. session_end.py adds cleanup_wake_registry helper with path-traversal defense (is_safe_path_component + resolve+relative_to + Path.unlink missing_ok=True wrapped in try/except OSError) and single-file unlink for the lead-only inbox-wake-state.json STATE_FILE.
Adds the parallel safety-net layer for the inbox-wake mechanism: charter §Wake Mechanism subsection in pact-communication-charter.md cross-refs the canonical skill via slug-link; commands/wrap-up.md, commands/pause.md, and the imPACT.md force-termination branch invoke the skill's Teardown operation so Monitor stops cleanly on session-end paths even if the PostToolUse 1->0 transition is missed. New pact-plugin/tests/runbooks/ 591-inbox-wake.md documents Arm/Teardown lifecycle, 20/20/60 quiet- window verification (FIRST_GROW + LAST_GROW + MAX_DELAY edges), failure modes, and STATE_FILE inspection. Plugin version bumps minor: 3.20.4 -> 3.21.0 (new capability category) across plugin.json, marketplace.json, README.md, and pact-plugin/README.md.
Adds file-parsing structural tests covering the inbox-wake lead-only implementation across all three Wave commits. 138 invariants split across 7 files: skill structure (47), lifecycle emitter (16), lifecycle helper (30), session_init (8), session_end cleanup (20), callsites (9), and version bump (8). Counter-test-by-revert verified cardinality on 4 load-bearing assertions (F6 phrase, POLL=20 timing, glob-in-cleanup, emitter slug). No skill execution or live agent loop; all assertions parse files directly. Documents the count_active_tasks pure-never-raises property structurally so the redundant try/except guard at session_init.py:728-730 can be removed in a future cleanup.
Charter §Wake Mechanism collapses to 5 slug-link bullets pointing at the canonical SKILL.md anchors instead of restating the alarm-clock framing, between-tool-call scope claim, and Arm/Teardown trigger sites. SKILL.md remains the single drift anchor for those load-bearing prose blocks. imPACT force-termination Teardown drops the "Skip if other active teammate tasks remain" guard. The conditional required LLM-self-diagnosis of active-task state and was stale under the lead-only design (only one Monitor exists per session). Symmetric with /wrap-up and /pause: the skill's Teardown is idempotent, so unconditional invocation is safe.
A1: drops the hypothetical_pre reverse-derivation in
wake_lifecycle_emitter.py. The previous logic only reverted status when
deciding pre-state, so a TaskUpdate that bundled status=completed with
an owner or metadata flip in the same call could suppress a legitimate
Teardown emit. The simplified _decide_directive uses post-only
detection: any TaskUpdate(status=completed) at post=0 emits Teardown.
Skill's Teardown is idempotent, so over-eager invocations no-op when
no STATE_FILE is on disk. _read_task_from_disk and the _lifecycle_relevant
import are removed; emitter shrinks by ~46 lines.
A2: adds is_safe_path_component(team_name) gate at count_active_tasks
entry in shared/wake_lifecycle.py, mirroring the cleanup_wake_registry
defense in session_end.py. team_name is producer-side canonicalized at
session_init, so this is belt-and-suspenders defense-in-depth covering
the failure mode where a poisoned session context reaches the helper.
B2: pure-never-raises property pinned for _lifecycle_relevant on
adversarial task shapes (~30 cells, sweep status/owner/metadata combos).
This closes the gate documented at fb6fa435 — once both
count_active_tasks and _lifecycle_relevant are structurally pinned
never-raise, the redundant try/except guard wrapping count_active_tasks
in session_init.py can be safely removed in a follow-up.
B3: literal Arm/Teardown precondition phrases pinned in emitter tests
("First active teammate task created", "Last active teammate task
completed"), symmetric with the directive-prose pins already in
session_init coverage.
B4: behavioral pins for cleanup_wake_registry callsite in session_end.py
and the session_init Arm-emit gate (subprocess + synthesized stdin).
Closes the false-RED-prone gap where structural-only pins would pass
through a benign refactor that broke runtime behavior.
Existing test test_no_teardown_on_completion_of_signal_task replaced with
test_teardown_emits_on_signal_task_completion_at_post_zero documenting
the new contract.
173 inbox-wake tests pass; full suite 7165 pass.
The unconditional Teardown invocation in commands/pause.md and the force-termination branch of commands/imPACT.md tore down the lead's Monitor when other teammates were still active. Force-terminating teammate A while B and C remained in flight dropped the active count from 3 to 2; the PostToolUse hook correctly skipped Teardown (post>0), but the LLM-invoked callsite ran it anyway, unlinking STATE_FILE and stopping the Monitor that B and C still depended on. The lead then fell back to baseline InboxPoller — the exact failure mode #591 was designed to prevent. The PostToolUse hook handles the 1->0 transition authoritatively for both /pause and /imPACT, so the callsites were redundant in the common case and actively harmful in the active-siblings case. The /wrap-up callsite stays: it fires after every task is already completed (count is 0), so the over-eager Teardown is harmless and provides a hook-silent-fail safety net. SKILL.md frontmatter description and §When to Invoke table narrow to the /wrap-up callsite. Charter §Wake Mechanism bullet 4 reframes around the same. test_inbox_wake_callsites.py replaces the parametrized assertion with a /wrap-up-only positive assertion plus two negative assertions that imPACT.md and pause.md must not invoke the skill — guards against future re-introduction without re-deriving the per-callsite all-paths-safe invariant.
…cing
Empirical dogfood data from the previous session showed 20/20/60
fired roughly one wake per minute under teammate-heavy phases (5+
MAX_DELAY ceiling fires across ~100 minutes), producing visible noise
on every reviewer-message burst. The 20/20/60 ratios used QUIET_REQUIRED
= POLL, so a single quiet poll cycle was enough to declare a burst
over — multi-message reviewer batches with internal 30-50s sub-pauses
fragmented into multiple FIRST_GROW + LAST_GROW pairs.
The new 30/60/120 ratios are POLL=30, QUIET_REQUIRED=60 (= 2*POLL),
and MAX_DELAY=120 (= 2*QUIET_REQUIRED, equivalently 4*POLL). Two
consecutive quiet poll cycles must pass before LAST_GROW fires, so a
typical reviewer batch coalesces into one logical burst. The same
batch that 20/20/60 fragmented into ~3 pairs (6 fires) now coalesces
to ~1 pair plus an optional MAX_DELAY ceiling (2-3 fires).
Single-message responsiveness is preserved at FIRST_GROW within ~30 s.
Sustained-traffic ceiling at 120 s replaces the previous 60 s — in
exchange for stronger coalescing during ordinary bursts.
The audit annotation explicitly anchors both design ratios so future
edits cannot collapse QUIET back to POLL ("ping-pong" regression) or
shrink MAX_DELAY below 2*QUIET ("refragmenting coalesced bursts").
The constraints QUIET_REQUIRED >= POLL and MAX_DELAY >= 2*POLL are
unchanged.
Runbook also picks up cycle-2 drift cleanup: dropped stale references
to /pause and /imPACT inbox-wake teardown callsites (removed in
8f17f1e), with an explicit audit anchor explaining why those
commands deliberately do not invoke Teardown — they may run with
active teammate tasks and the lead's Monitor must stay armed.
…ommands Splits the unified inbox-wake skill into two slash-typeable commands so the user can manually arm or unwatch the lead's inbox monitor for debug/recovery — particularly /PACT:unwatch-inbox to silence Monitor notifications mid-session if the wake fires become noisy. The PostToolUse hook continues to emit Arm and Teardown directives automatically on first/last active-task transitions; the directives now reference the command pair (/PACT:watch-inbox and /PACT:unwatch-inbox) instead of the unified skill. Audit anchor allocation: - /PACT:watch-inbox owns alarm-clock framing, 30/60/120 dual-ratio audit (QUIET = 2*POLL + MAX_DELAY = 2*QUIET), F1 single-file inbox, F7 stdout discipline, Monitor block, WriteStateFile block, and the monitor-side failure modes (silent death, long-tool-blocks-wake, wake-fire inflation, malformed STATE_FILE, concurrent re-arm). - /PACT:unwatch-inbox owns F6 (TaskStop tolerates not-found), the best-effort/opportunistic framing, Teardown ordering, and the torn-down-session-may-have-lost-Monitor failure mode. Each command cross-links the other under ## References, so an LLM loading either half can reach the partner if needed. Test files split symmetrically: structural assertions for each command live in test_watch_inbox_command_structure.py and test_unwatch_inbox_command_structure.py, with the original test_inbox_wake_skill_structure.py removed. Mechanism tests (test_inbox_wake_lifecycle_emitter.py, test_inbox_wake_session_init.py, test_inbox_wake_callsites.py) keep the inbox_wake_ prefix as a mechanism-level grouping. Negative assertions in test_inbox_wake_callsites.py forbid all three slugs (legacy inbox-wake plus watch-inbox and unwatch-inbox) in commands/imPACT.md and commands/pause.md as defense-in-depth against regression of the F3-followup learning. 186 inbox-wake tests pass.
Removes dated session telemetry, migration labels, and review-cycle
finding refs from committed plugin agent-consumed docs per the pinned
"current-state instructions only" axiom: watch-inbox.md drops the
Option-C resume-gap-closure label, the silent-Monitor-death blockquote's
verbatim 2026-04-29 timestamp + monitor_task_id anchor, the audit
anchor citing those same dated literals, and the long-single-tool
blockquote's 2026-04-30 session-test telemetry. wake_lifecycle.py
docstring drops the "per the review-security-engineer SE-1 finding"
ref, citing the sibling cleanup_wake_registry pattern instead as
current-state evidence.
Audit anchors stay intact; their dated evidence sentences become
present-tense rules. Surrounding instructions read coherently without
the archaeology.
PostToolUse hook matcher in hooks.json prunes from
TaskCreate|TaskUpdate|Task|Agent to TaskCreate|TaskUpdate. The hook
explicitly skipped Task and Agent at line 184 (the spawn tools don't
change the active-task count), so the wider matcher forked the hook
process for a no-op return on every teammate spawn. The line-184
skip-check stays as defensive belt-and-suspenders against future
matcher widening. emitter module docstring + lifecycle-emitter test
docstring + assertion + runbook prose all updated to the new pruned
matcher.
test_long_single_tool_failure_mode_with_empirical_anchor renames to
test_long_single_tool_failure_mode_documented and asserts on the
present-tense scope phrases ("between tool calls", "not mid-tool")
instead of dated session timestamps. Counter-test-by-revert: each
load-bearing assertion produces RED on revert.
Closes nine blind-round findings.
watch-inbox / unwatch-inbox each gain a Lead-Session Guard (Step 0)
that reads the team config's leadSessionId and refuses to execute when
invoked from a teammate session. The Monitor watches a fixed
inboxes/team-lead.json path; arming it from a non-lead session would
fire the wake event in the wrong session and silently lose the signal.
The audit anchor explains why leadSessionId is the canonical signal
source (the session-context schema does not carry agent_type).
unwatch-inbox's Teardown Block adds a task-id allowlist gate
(^[a-z0-9]{6,}$) before TaskStop. A tampered STATE_FILE could
otherwise specify an arbitrary task_id; the gate refuses TaskStop on
malformed input but still proceeds to STATE_FILE unlink (corrupt
registry cleanup is path-validated separately and independently safe).
A one-line audit comment notes the documented TOCTOU window between
resolve() and unlink(): same-user trust assumption bounds the residual
risk, and a same-user attacker has equivalent capability via direct
os.unlink anyway.
Path-traversal defense is now centralized in
task_utils.iter_team_task_jsons(team_name), which owns
is_safe_path_component, resolve()+relative_to(tasks_root), and the
dotfile-exclusion filter (Path.glob does not exclude dotfiles on POSIX,
so the helper applies an explicit startswith filter). count_active_tasks
becomes a one-line sum delegating to the helper, removing the inline
iteration that had drifted from the cleanup-side defense.
The PostToolUse emitter generalizes from a "completed-only" terminal
check to a TERMINAL_STATUSES set that covers both "completed" and
"deleted". A task transitioning to either terminal state at post==0 now
emits the Teardown directive on the 1->0 transition; previously only
status=completed triggered it.
Tests pin five new structural invariants (Lead-Session Guard wording,
task-id allowlist regex, TOCTOU audit comment, deleted-status terminal,
shell-injection-vector absence in the Monitor block) plus tightened
assertions on the silent-Monitor-death failure-mode wording and the
dotfile-exclusion filter mechanism. Counter-test-by-revert verified
on each load-bearing assertion. 198 inbox-wake tests pass.
A blind-finding flagged the metadata-isinstance carve-out as
unreachable; verification confirmed the gate is intentional (the
existing isinstance check returns True on parse-failed metadata as a
conservative count-the-task fallback rather than silently exempting it
via downstream carve-outs). No code change.
Closes the blind-round-2 elevated findings. STATE_FILE schema extends from three fields to four. The new armed_by_session_id field records the lead's session_id at Arm time; the unwatch-inbox Teardown reads it back and validates the same-session invariant before invoking TaskStop. A concurrent same-user session (another worktree, another /PACT:orchestrate run) that planted a malicious STATE_FILE would have its session_id mismatched, so Teardown refuses TaskStop on the planted file. Unlink still proceeds because the path is allowlist-validated separately and the Monitor in the original lead session keeps running until that session ends. The PostToolUse emitter now gates on the same lead-session check before deciding any directive. Previously every teammate session's TaskCreate or TaskUpdate emitted a watch-inbox directive into the teammate's context (and a non-compliant teammate LLM running the Monitor body would silently take over the wake mechanism in the wrong session). Silent fail-open on non-lead sessions matches the livelock-class hook contract, so the false-positive emit is closed without surfacing a hook-error UI on every teammate Task-tool fire. The terminal-status check now runs before count_active_tasks, so metadata-only TaskUpdates (teachback_submit, intentional_wait, handoff, progress, memory_saved) no longer glob the team's task directory. On a thirty-task team that eliminates roughly nine thousand wasted reads per session. Documentation drift cleanup: the unwatch-inbox When-to-Invoke row, the runbook lifecycle table, and the related audit anchor now read "TaskUpdate with terminal status (completed or deleted)" in line with the cycle-6 emitter behaviour. The runbook §3.4 reference to test_inbox_wake_lifecycle.py is replaced with the actual split test files. The pre-prune Task/Agent fall-through comment block in wake_lifecycle_emitter.py is rewritten to describe current state. Tests pin the new four-field schema, the armed_by_session_id validation step, the lead-session emitter guard, the perf reorder (via mock.patch on the imported binding to assert call count), and the terminal-status doc wording. Counter-test-by-revert confirmed load-bearing on each. 208 inbox-wake tests pass.
…in-depth
Closes the remaining blind-round-2 minors and futures.
_lifecycle_relevant hoists the SELF_COMPLETE_EXEMPT_AGENTS owner check
above the metadata-shape gate. A secretary-owned task with corrupted
metadata previously short-circuited to the conservative-count branch
and bypassed the owner carve-out, asymmetric with the sibling
is_self_complete_exempt predicate; the new ordering restores symmetry.
iter_team_task_jsons now skips per-file symlink-yielded entries inside
the glob loop, closing the per-file escape vector that the directory-
boundary defense did not cover. session_init.py drops the redundant
try/except around count_active_tasks now that cycle 6 added the
structural pure-never-raises pin (the cleanup-opportunity tracked
through six earlier cycles is closed). The Tier-0 unconditional Arm
directive picks up an explicit audit anchor naming the
LLM-self-diagnosis anti-pattern so future edits cannot quietly
reintroduce conditional emit logic. The wake-arm step label moves
from 4d to 5_pre to match its physical position in the surrounding
session_init flow.
wake_lifecycle_emitter now caps stdin reads at 1MB before json.loads,
so a malformed-or-malicious payload exits cleanly via the existing
fail-open path instead of risking OOM on the hook process.
unwatch-inbox tightens the monitor_task_id allowlist regex from
^[a-z0-9]{6,}$ to ^[a-z0-9]{6,}\Z so a newline-suffixed id can no
longer slip through Python re's $ semantics. The audit anchor cites
session_end.py::_UUID_PATTERN as the in-plugin precedent and retains
one $-form literal as an explicit anti-pattern callout for future
editors. The Lead-Session Guard sections in both watch-inbox and
unwatch-inbox extend their audit anchors to document the guard as
foot-gun protection (typo / wrong-window / cross-session-LLM
speculation), not a security boundary against same-user adversaries
- the user-local-trust assumption bounds the residual exposure.
The concurrent-rearm Failure Mode tightens to "STATE_FILE points
only at the winner; loser's Monitor has no registry pointer for any
subsequent /PACT:unwatch-inbox to find" instead of the prior
ambiguous "TaskStop only stops the winner" framing.
The runbook drops the incorrect "(or TaskUpdate with owner
assignment)" parenthetical so the documented Arm trigger matches
the emitter's actual TaskCreate-only behaviour.
Tests pin all the new structural and behavioural invariants. The
Lead-Session Guard test tightens from substring presence to
section-body extraction so an inline cross-ref reference no longer
phantom-greens the assertion. The shell-injection negative-allowlist
gains $(rm, $(python, <(, and >( as forbidden patterns. New
counter-tests cover the owner-check hoist, per-file symlink skip,
\Z regex anchor, and 1MB payload cap, with revert cardinality
verified for each. 214 inbox-wake tests pass.
This was referenced May 1, 2026
michael-wojcik
added a commit
that referenced
this pull request
May 2, 2026
Remediation for three reviewer follow-ups accepted on PR #609. M1 (3-way reviewer convergence): rename test_inbox_wake_version_bump.py to test_plugin_version_bump.py. The file guards a generic 4-file plugin-version-consistency invariant; the inbox_wake_ provenance prefix dated to #603 and no longer reflects the test's scope. F3 (3-way convergence): add test_every_skill_subdir_has_skill_md to test_plugin_manifest_parity.py. Skills register via a directory pointer ("./skills/") in plugin.json, so a subdir without a SKILL.md ships silently non-discoverable rather than being caught by the array-based parity invariants. Counter-test-by-revert: renaming skills/orchestration/SKILL.md fails only the new test (cardinality 1). F4 (test-engineer): add test_agent_frontmatter_name_matches_filename to the same file plus a small regex-based _frontmatter_name helper. Claude Code resolves agents by frontmatter name; the filename-stem parity test passes when frontmatter and filename diverge. The new invariant pins them in lockstep. Counter-test-by-revert: mutating pact-architect.md frontmatter name fails only the new test (cardinality 1). No PyYAML dependency added. 14 manifest-parity + plugin-version-bump tests pass. Full suite 7212 pass + 1 pre-existing unrelated failure (skills/orchestration/SKILL.md exceeds 600 lines, covered by #594).
michael-wojcik
added a commit
that referenced
this pull request
May 2, 2026
* Register watch-inbox + unwatch-inbox + prune-memory in plugin.json (#608) PR #603 shipped commands/watch-inbox.md and commands/unwatch-inbox.md without adding them to .claude-plugin/plugin.json's commands array; commands/prune-memory.md (from #493) had the same gap. Result: all three commands were non-discoverable as slash commands and non-invokable via the Skill tool, leaving the entire inbox-wake mechanism shipped in #591/#603 silently non-functional in the installed plugin. Append the three entries to plugin.json's commands array (15 total). Bump version 3.21.0 → 3.21.1 across the four version-tracked files (plugin.json, marketplace.json, root README.md, pact-plugin/README.md) and update test_inbox_wake_version_bump.py constants accordingly. Add pact-plugin/tests/test_plugin_manifest_parity.py with four set-membership invariants pinning manifest-vs-filesystem parity for both commands and agents in both directions. Counter-test-by-revert verified RED-on-revert for each. The invariant prevents this bug class from recurring: any future commands/*.md or agents/*.md added without manifest registration will fail CI before merge. * Address peer review M1+F3+F4 (#608/#609) Remediation for three reviewer follow-ups accepted on PR #609. M1 (3-way reviewer convergence): rename test_inbox_wake_version_bump.py to test_plugin_version_bump.py. The file guards a generic 4-file plugin-version-consistency invariant; the inbox_wake_ provenance prefix dated to #603 and no longer reflects the test's scope. F3 (3-way convergence): add test_every_skill_subdir_has_skill_md to test_plugin_manifest_parity.py. Skills register via a directory pointer ("./skills/") in plugin.json, so a subdir without a SKILL.md ships silently non-discoverable rather than being caught by the array-based parity invariants. Counter-test-by-revert: renaming skills/orchestration/SKILL.md fails only the new test (cardinality 1). F4 (test-engineer): add test_agent_frontmatter_name_matches_filename to the same file plus a small regex-based _frontmatter_name helper. Claude Code resolves agents by frontmatter name; the filename-stem parity test passes when frontmatter and filename diverge. The new invariant pins them in lockstep. Counter-test-by-revert: mutating pact-architect.md frontmatter name fails only the new test (cardinality 1). No PyYAML dependency added. 14 manifest-parity + plugin-version-bump tests pass. Full suite 7212 pass + 1 pre-existing unrelated failure (skills/orchestration/SKILL.md exceeds 600 lines, covered by #594).
This was referenced May 2, 2026
michael-wojcik
added a commit
that referenced
this pull request
May 5, 2026
) (#638) * feat: add fixture-parity capture-provenance convention for hook stdin captures Establishes pact-plugin/tests/fixtures/wake_lifecycle/ as the canonical location for hook PostToolUse stdin captures, with a `_meta` sibling-key provenance convention (capture_session_id, capture_date, capture_method, issue_ref). Structural defense against the failure class that produced #620: hand- constructed fixtures diverged from production payload shape, tests stayed green while production was broken. The README mandates the convention for future hook-stdin fixtures. Initial fixtures cite #612 (logging-shim capture from session pact-56ce3a2a on 2026-05-02): TaskCreate production-nested shape, TaskUpdate flat shape (fossilized for future regression backstop), and the broken pre-#620 legacy shape preserved as a regression backstop. Refs: #612, #620 * fix: extend _extract_task_id to probe nested tool_response.task.id (#620) Production TaskCreate `tool_response` is nested (`tool_response.task.id`) per #612 logging-shim evidence. The prior flat-only probe returned None on every TaskCreate, killing the auto-Arm path on the first 0->1 active- task transition since #603 merged. Manual `/PACT:watch-inbox` continued to work, masking the regression. Extends `_extract_task_id` with three-block precedence: tool_input --> tool_response.task (nested, NEW) --> tool_response (flat, fallback). Public contract unchanged; signature widened to `dict[str, Any] | None` to make the pre-existing None-input defense honest. Docstring enumerates all 8 probe paths in precedence order and documents the WHY (cites #612 + capture session pact-56ce3a2a 2026-05-02). Tests: - New TestExtractTaskIdShapeResilience class with 7 named methods pinning behavior across input-side, nested-response, and flat- response paths plus failure modes (unknown shape, empty dicts, non-string ids, None input). - End-to-end regression test piping the captured production fixture through the full hook entry-point asserts an Arm directive is emitted. - 12 existing TaskCreate fixture stubs rewritten from flat to nested shape to match production going forward. Counter-test-by-revert verified: cardinality {3} on revert (the named #620 regression test, the end-to-end fixture test, and the None-input-defense test that fossilizes the isinstance guard). Refs: #620, #612, PR #603 * chore: bump version 4.0.0 -> 4.0.1 Patch release for #620 watch-inbox Arm regression fix. * test: generalize plugin.json version guardrail (4.0.0 -> EXPECTED_VERSION constant) The v4.0.0 release added test_plugin_json_version_is_4_0_0 as a one-shot guardrail to verify the BREAKING bump landed. With v4.0.1 it would have needed bumping again; instead, generalize to a module-top EXPECTED_VERSION constant so future bumps update one location. Also scrubs planning-artifact identifiers (C1/C9/C11) from the module docstring per the agent-consumed-docs current-state-only convention; the docstring now describes what the tests assert today rather than which planning commits brought them in. Refs: #620 * fix: harden _extract_task_id (drop unreachable None-defense + reject whitespace-only ids) Two coupled review-driven hardenings on the extractor: D2-a: drop the dict[str, Any] | None signature widening + isinstance top-level guard. main() at L327 pre-filters non-dicts; the guard tested an unreachable production branch (#538-adjacent capability-without- consumer). Per-sub-dict isinstance guards on tool_input/tool_response are kept — main() does NOT pre-filter sub-fields, so they catch malformed-but-dict shapes that are reachable. The matching test_none_input_data_returns_none case is dropped from the regression class; counter-test-by-revert cardinality on the #620 contract goes {3} -> {2} (named regression test + end-to-end fixture test still hold). F5: reject whitespace-only ids at all 3 probe sites by adding `.strip()` to the existing isinstance(tid, str) and tid check. Previously `{tool_response: {id: ' '}}` returned `' '` (truthy whitespace), which would fail downstream TaskStop with semantically-empty id. Strengthening is symmetric across input-side, nested-response, and flat-response probes; cascading-fallthrough invariant verified (whitespace at higher-precedence falls through to next valid probe rather than short-circuiting to None). New TestExtractTaskIdShape Resilience::test_whitespace_only_id_returns_none parametrized over 5 whitespace shapes pins the contract. Also adds test_all_wake_lifecycle_fixtures_carry_meta_provenance: a top-level enforcement test that globs tests/fixtures/wake_lifecycle/ and asserts each fixture carries the README-mandated _meta provenance (capture_session_id, capture_date, capture_method, issue_ref). Closes the README-discipline-only gap surfaced by the test-engineer's adversarial-coverage review of PR #638. Refs: #620, #638 * test: align version pinning with sibling pattern + tighten README assertions Two version-pinning test refinements driven by PR #638 architect + test-engineer reviews: D4-a: replace the EXPECTED_VERSION = "4.0.1" literal in test_plugin_json_orchestrator.py with json.loads(plugin.json)["version"] dynamic-source pattern matching sibling test_plugin_version_bump.py:23. Also drops the now-misleading "Update EXPECTED_VERSION when bumping" comment — with dynamic-source, plugin.json itself is the source of truth and there is nothing to update at this site. The structural- existence-check role is documented in the test docstring; cross-file drift detection is owned by sibling test_plugin_version_bump.py. option-b: tighten test_root_readme_version + test_pact_plugin_readme_version in test_plugin_version_bump.py with a word-bounded regex instead of plain substring search. _TARGET_VERSION_PATTERN compiles `(?<![\d.])` + re.escape(TARGET_VERSION) + `(?![\d.])` once at module load. Closes the substring-match edge case where '4.0.1' would falsely match '4.0.10' in a README. test_marketplace_json_version is left untouched (set-membership comparison is already exact). Refs: #638 * test: clean up pre-existing Pyright unused-var warnings (5 sites) Latent unused-import + 4 unused-vars in test_inbox_wake_lifecycle_emitter.py and test_plugin_version_bump.py, surfaced by Pyright re-analysis after PR #638's edits to those files. Pre-existing in main; not introduced by review remediation. Bundled here for cleanliness. test_plugin_version_bump.py: L18 drop unused `import pytest` (no @pytest.fixture / @pytest.mark / pytest.raises in this file) test_inbox_wake_lifecycle_emitter.py: L46 drop unused `team_name` param from `_pact_session_env`; 3 callers updated (L117/L123/L137). Internal helper, all callers pass the literal "t", body never references it. L436 drop unused `tmp_path` fixture from `test_is_terminal_status_update_matches_completed_and_deleted` signature; the test is in-memory predicate-only. L582 drop `result =` LHS in `test_count_active_tasks_called_on_taskupdate_terminal_status`; assertion is on `mock_count.call_count`, the call is for side effects. L603 same pattern in `test_count_active_tasks_called_on_taskcreate`. L543 sibling occurrence is deferred to #640 (suite-wide unused-var/ import scrub). Refs: #638 * test: drop tautological plugin.json version test (post-D4-a) After D4-a (Commit 9504594) made EXPECTED_VERSION dynamic-source (`json.loads(plugin.json)["version"]`), the `test_plugin_json_version_is_pinned_to_current_release` test in test_plugin_json_orchestrator.py asserts plugin.json's version equals plugin.json's version — structurally tautological, catches nothing. Cross-file drift detection is owned by test_plugin_version_bump.py (which checks plugin.json against marketplace.json + both READMEs). Drop the no-op test rather than carry it. Drops: - test_plugin_json_version_is_pinned_to_current_release - EXPECTED_VERSION module-level computation (orphan after the test goes) Keeps: - plugin_json fixture (5 sibling tests consume it) - import json + PLUGIN_JSON_PATH (fixture needs both) - all 5 surviving tests (13-entry agents array invariants + bootstrap-commands-absent invariant) Module docstring rewritten to lead with the surviving 13-entry agents array claim + bootstrap-commands-absent invariant + cross-ref to test_plugin_version_bump.py for version-pinning ownership. Refs: #638
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
pact-plugin/skills/inbox-wake/SKILL.mdas the lead-only canonical SSOT for the wake mechanism, with a 20/20/60 quiet-window state machine in the Monitor Block to coalesce wake-fire inflation under bursty trafficwake_lifecycle_emitter.py+shared/wake_lifecycle.py) that arms Monitor on first active task and tears it down on last active task;session_init.pycloses the resume-with-active-tasks gap;session_end.pyaddscleanup_wake_registrywith path-traversal defensecommands/{wrap-up,pause,imPACT}.mdTeardown invocations + charter §Wake Mechanism cross-ref + operational runbook + plugin version bump 3.20.4 → 3.21.0Replaces #602 (symmetric scope) with a lead-only redesign authored fresh on a new branch from main. Empirical Phase 0 routing probe (3 rounds) established that
PostToolUsewith matcherTaskCreate|TaskUpdate|Task|Agentis the only viable channel for hook-emitted Arm/Teardown directives —TaskCreated/TaskCompleted/SubagentStopdo not supporthookSpecificOutput.additionalContextper platform schema. Plan:docs/plans/591-inbox-wake-skill-lead-only-plan.md(local, gitignored).Test plan
pytest pact-plugin/tests/test_inbox_wake_*.py— 138/138 passbu4hxc2bharmed with the canonical 20/20/60 block produced clean FIRST_GROW + LAST_GROW per burst; MAX_DELAY ceiling fired once on sustained traffic