Add end-to-end integration test for inbox-wake mechanism #606

@michael-wojcik

Problem

PR #603 shipped the lead-only inbox-wake redesign with strong per-component test coverage:

  • Command-structure tests (test_watch_inbox_command_structure.py, test_unwatch_inbox_command_structure.py)
  • Lifecycle-emitter behavior tests (test_inbox_wake_lifecycle_emitter.py)
  • Helper tests (test_inbox_wake_lifecycle_helper.py)
  • Session-init Tier-0 directive tests (test_inbox_wake_session_init.py)
  • Session-end cleanup tests (test_inbox_wake_session_end_cleanup.py)
  • Callsite-presence tests (test_inbox_wake_callsites.py)

What we don't have is a single test that exercises the full runtime flow as one integration:

TaskCreate (0→1) → PostToolUse fires → Arm directive emitted →
Skill invocation → Monitor spawned → STATE_FILE written →
Teammate SendMessage → inbox bytes grow → INBOX_GREW edge fires →
Lead receives → final TaskUpdate (1→0) → Teardown directive →
TaskStop → STATE_FILE removed
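The INBOX_GREW step in the flow above can be pictured as a size-comparison edge. The following is only a minimal sketch, assuming the Monitor detects growth by comparing the inbox's byte size between polls; `inbox_grew`, `demo`, and the `inbox.json` file name are hypothetical and are not taken from watch-inbox.md.

```python
import os
import tempfile


def inbox_grew(path, last_size):
    """Return (grew, current_size); grew is True only when the file got larger.

    Hypothetical helper: the real Monitor block may detect growth differently.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        size = 0  # a missing inbox is treated as empty
    return size > last_size, size


def demo():
    """Poll a temporary inbox four times and record whether each poll saw an edge."""
    edges = []
    with tempfile.TemporaryDirectory() as tmp:
        inbox = os.path.join(tmp, "inbox.json")

        grew, last = inbox_grew(inbox, 0)      # no file yet: no edge
        edges.append(grew)

        with open(inbox, "w") as fh:           # simulated teammate message
            fh.write("[]")
        grew, last = inbox_grew(inbox, last)   # size increased: edge
        edges.append(grew)

        grew, last = inbox_grew(inbox, last)   # unchanged: no edge
        edges.append(grew)

        with open(inbox, "a") as fh:           # a second message appended
            fh.write("x")
        grew, last = inbox_grew(inbox, last)   # size increased again: edge
        edges.append(grew)
    return edges
```

An integration test would care about exactly this property: one edge per growth, and none while the inbox is idle.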

Why It Matters

The 30/60/120 quiet-window state machine, the Monitor-stdout edge-emit timing, and the Lead-Session Guard discipline were all verified empirically by dogfooding during development of the PR itself, but that confidence does not ship as test coverage. Future refactors of wake_lifecycle_emitter.py, count_active_tasks, or the canonical Monitor block in watch-inbox.md could regress integration behavior without breaking any per-component test.

Constraints

  • TaskCreate / TaskUpdate / Monitor / SendMessage are platform primitives and cannot be invoked directly from pytest. The test would need one of:
    • A Claude Code subprocess harness (heavy; precedent: pact-plugin/tests/runbooks/ is manual today)
    • A simulated tool-event ingestion path (mocks PostToolUse stdin, asserts directive emit + STATE_FILE)
    • The existing dogfood pattern: a runbook plus a scripted assertion runner
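The second option (simulated tool-event ingestion) could look roughly like the sketch below. It is a stand-in, not the real wake_lifecycle_emitter.py: the event payload field `active_tasks`, the directive strings, and the functions `directive_for` and `handle_post_tool_use` are all assumptions made for illustration. The real emitter derives the count through count_active_tasks, whose interface is not shown in this issue.

```python
import io
import json
from typing import Optional


def directive_for(prev_active: int, curr_active: int) -> Optional[str]:
    """Map an active-task-count transition to a directive (hypothetical names)."""
    if prev_active == 0 and curr_active > 0:
        return "ARM"        # 0 -> 1: first active task appeared
    if prev_active > 0 and curr_active == 0:
        return "TEARDOWN"   # 1 -> 0: last active task finished
    return None             # no boundary crossed, emit nothing


def handle_post_tool_use(stdin, stdout, prev_active: int) -> int:
    """Read one simulated PostToolUse event from stdin and emit a directive if due."""
    event = json.load(stdin)
    curr_active = int(event["active_tasks"])  # assumed payload field
    directive = directive_for(prev_active, curr_active)
    if directive is not None:
        stdout.write(json.dumps({"directive": directive}) + "\n")
    return curr_active


def test_arm_then_teardown():
    """pytest-style check: TaskCreate arms, the final TaskUpdate tears down."""
    out = io.StringIO()
    create = io.StringIO(json.dumps({"tool_name": "TaskCreate", "active_tasks": 1}))
    update = io.StringIO(json.dumps({"tool_name": "TaskUpdate", "active_tasks": 0}))

    active = handle_post_tool_use(create, out, prev_active=0)
    active = handle_post_tool_use(update, out, prev_active=active)

    emitted = [json.loads(line) for line in out.getvalue().splitlines()]
    assert emitted == [{"directive": "ARM"}, {"directive": "TEARDOWN"}]
```

Passing file-like objects instead of touching `sys.stdin` directly keeps the test hermetic. It covers only directive emission, not Monitor spawning or the inbox edge.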

Suggested Approach

Start with the third option: extend pact-plugin/tests/runbooks/591-inbox-wake.md with explicit GIVEN/WHEN/THEN steps, and add a companion scripts/verify-591-runbook.sh that asserts STATE_FILE shape, Monitor task existence, and inbox-grow edge timing. Defer the full subprocess harness until we have a second feature with the same testing-gap shape.
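The STATE_FILE shape assertion might be sketched as follows. This assumes STATE_FILE is a JSON object, which this issue does not confirm, and the required keys are passed in by the caller rather than hard-coded, because the real schema is not given here. `shape_errors` and the example keys `monitor_task_id` and `armed_at` are placeholders.

```python
import json
from pathlib import Path


def shape_errors(path, required):
    """Return a list of problems with the state file; an empty list means OK.

    `required` maps key name -> expected Python type. The keys used in any
    call are the caller's assumption, not the documented STATE_FILE schema.
    """
    path = Path(path)
    if not path.exists():
        return [f"{path} does not exist"]
    try:
        data = json.loads(path.read_text())
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    errors = []
    for key, expected in required.items():
        if key not in data:
            errors.append(f"missing key: {key}")
        elif not isinstance(data[key], expected):
            errors.append(f"wrong type for {key}: {type(data[key]).__name__}")
    return errors
```

A runbook step could then call `shape_errors(state_path, {"monitor_task_id": str, "armed_at": str})` after the arm phase and fail on any non-empty result, and assert that the file no longer exists after teardown.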

Acceptance

  • Runbook executes end-to-end with documented expected outputs at each phase
  • Companion verification script asserts the load-bearing invariants
  • CI flag (or doc note) for when to run it manually pre-merge on changes touching wake_lifecycle, watch-inbox, or unwatch-inbox
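The last acceptance item could be backed by a small path check. This is a sketch only: the substring matching and the name `needs_manual_runbook` are assumptions, and a real CI rule might use glob patterns instead.

```python
# Substrings named in the acceptance criteria above.
TRIGGERS = ("wake_lifecycle", "watch-inbox", "unwatch-inbox")


def needs_manual_runbook(changed_paths):
    """Return True if any changed path touches the inbox-wake surface."""
    return any(trigger in path for path in changed_paths for trigger in TRIGGERS)
```

For example, a change set containing `commands/watch-inbox.md` would flag the runbook, while a docs-only change would not.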

Origin

Identified during PR #603 merge-readiness review (cycle 8 closeout, 2026-04-30). Companion to #604 (silent Monitor death tracking) and #605 (owner-flip-to-exempt edge case).
