Problem
PR #603 shipped the lead-only inbox-wake redesign with strong per-component test coverage:
- Command-structure tests (test_watch_inbox_command_structure.py, test_unwatch_inbox_command_structure.py)
- Lifecycle-emitter behavior tests (test_inbox_wake_lifecycle_emitter.py)
- Helper tests (test_inbox_wake_lifecycle_helper.py)
- Session-init Tier-0 directive tests (test_inbox_wake_session_init.py)
- Session-end cleanup tests (test_inbox_wake_session_end_cleanup.py)
- Callsite-presence tests (test_inbox_wake_callsites.py)
What we don't have: a single test exercising the full runtime flow as one integration:
TaskCreate (0→1) → PostToolUse fires → Arm directive emitted →
Skill invocation → Monitor spawned → STATE_FILE written →
Teammate SendMessage → inbox bytes grow → INBOX_GREW edge fires →
Lead receives → final TaskUpdate (1→0) → Teardown directive →
TaskStop → STATE_FILE removed
Why It Matters
The 30/60/120 quiet-window state machine, the Monitor-stdout edge-emit timing, and the Lead-Session Guard discipline were all verified empirically by dogfooding during development of the PR, but that confidence does not ship as test coverage. Future refactors of wake_lifecycle_emitter.py, count_active_tasks, or the canonical Monitor block in watch-inbox.md could regress integration behavior without breaking any per-component test.
Constraints
- TaskCreate / TaskUpdate / Monitor / SendMessage are platform primitives and cannot be invoked directly from pytest. The test would need one of:
  - A Claude Code subprocess harness (heavy; precedent: pact-plugin/tests/runbooks/ is manual today)
  - A simulated tool-event ingestion path (mocks PostToolUse stdin, asserts directive emit + STATE_FILE)
  - The existing dogfood pattern: a runbook + scripted assertion runner
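For reference, the simulated tool-event ingestion option could look roughly like the sketch below. Everything in it is an assumption made for illustration: the payload fields (tool_name, active_tasks_after), the directive names ("arm", "teardown"), and the functions directive_for_transition, handle_post_tool_use, and simulate are hypothetical stand-ins, not the real interface of wake_lifecycle_emitter.py. It only shows the shape of the technique: feed PostToolUse-style JSON through a stdin-like stream and assert which directives come out on the 0→1 and 1→0 transitions.

```python
import io
import json
from typing import Optional


def directive_for_transition(before: int, after: int) -> Optional[str]:
    """Hypothetical stand-in for the emitter's decision logic.

    Returns "arm" when the active-task count leaves zero, "teardown" when it
    returns to zero, and None for every other transition.
    """
    if before == 0 and after > 0:
        return "arm"
    if before > 0 and after == 0:
        return "teardown"
    return None


def handle_post_tool_use(stdin, active_before: int) -> Optional[str]:
    """Read one simulated PostToolUse payload from a stdin-like stream.

    Assumed payload schema: {"tool_name": str, "active_tasks_after": int}.
    """
    event = json.load(stdin)
    if event.get("tool_name") not in {"TaskCreate", "TaskUpdate"}:
        return None
    return directive_for_transition(active_before, event["active_tasks_after"])


def simulate(events):
    """Feed payloads in order and collect the directives that were emitted."""
    active = 0
    emitted = []
    for event in events:
        stream = io.StringIO(json.dumps(event))  # mock stdin for this event
        directive = handle_post_tool_use(stream, active)
        if directive is not None:
            emitted.append(directive)
        # Events that do not touch tasks leave the active count unchanged.
        active = event.get("active_tasks_after", active)
    return emitted
```

A real version would import the actual emitter instead of the stand-in and would also assert on STATE_FILE, which this sketch leaves out.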
Suggested Approach
Start with the third option: extend pact-plugin/tests/runbooks/591-inbox-wake.md with explicit GIVEN/WHEN/THEN steps and a companion scripts/verify-591-runbook.sh that asserts STATE_FILE shape, Monitor task existence, and inbox-grow edge timing. Defer the full subprocess harness until we have a second feature with the same testing-gap shape.
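As a rough illustration of two of the assertions such a runner would make, here is a minimal Python sketch (the proposal above names a shell script; Python is used here only to show the logic). It assumes STATE_FILE is a JSON object, and the key names in REQUIRED_KEYS are invented placeholders; the real schema is whatever the watch-inbox flow writes. The helper names state_file_problems and inbox_grew are likewise hypothetical. Monitor task existence and edge timing are not covered.

```python
import json
import os

# Placeholder key names; the real STATE_FILE schema is not specified here.
REQUIRED_KEYS = {"monitor_task_id", "armed_at"}


def state_file_problems(path):
    """Return a list of problems with the state file; empty means it passed."""
    if not os.path.exists(path):
        return ["state file missing"]
    try:
        with open(path, encoding="utf-8") as fh:
            state = json.load(fh)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(state, dict):
        return ["top-level value is not an object"]
    missing = sorted(REQUIRED_KEYS - set(state))
    return [f"missing key: {key}" for key in missing]


def inbox_grew(path, previous_size):
    """Compare the inbox's byte size with the last observed size.

    Returns (grew, current_size) so the caller can carry the size forward.
    """
    size = os.path.getsize(path)
    return size > previous_size, size
```

Returning a list of problems rather than raising on the first one lets the runner report every failed check for a runbook step at once.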
Acceptance
Origin
Identified during PR #603 merge-readiness review (cycle 8 closeout, 2026-04-30). Companion to #604 (silent Monitor death tracking) and #605 (owner-flip-to-exempt edge case).