Summary
pact-plugin/tests/test_inbox_wake_lifecycle_emitter.py uses synthesized stdin fixtures that do not match the platform's actual PostToolUse payload shape, and any future PostToolUse-emitter hook test written the same way carries the same risk. This let a silent production failure ship in #603 (see companion issue #612, "wake_lifecycle_emitter Arm never emits"): the tests pass while the hook no-ops in production.
The class of bug isn't hook-specific — it's a test-fixture-vs-production shape drift. We need a captured-payload corpus and a parity invariant so this can't recur.
Empirical Evidence
Captured live PostToolUse stdin from session pact-56ce3a2a on 2026-05-02:
TaskCreate:
    {"tool_name":"TaskCreate",
     "tool_input":{"subject":"...","description":"..."},
     "tool_response":{"task":{"id":"5","subject":"..."}}}
TaskUpdate:
    {"tool_name":"TaskUpdate",
     "tool_input":{"taskId":"5","status":"deleted"},
     "tool_response":{"success":true,"taskId":"5","updatedFields":["deleted"],"statusChange":{"from":"pending","to":"deleted"}}}
Test fixture in test_inbox_wake_lifecycle_emitter.py:200-203 for the Arm-emitted-on-first-task-create case:
    "tool_input": {"taskId": "task-1"},
    "tool_response": {"id": "task-1"},
The fixture's tool_response, {"id": "task-1"}, is the wrong shape for TaskCreate: production sends tool_response: {"task": {"id": "5"}}. The fixture also includes tool_input.taskId for TaskCreate, which production omits because the id is platform-assigned after creation.
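To make the failure mode concrete, here is a minimal sketch of how an id lookup written against the synthesized fixture behaves on the captured payload. The functions `read_id_fixture_style` and `read_id_production_style` are hypothetical and are not the emitter's actual code; they only illustrate the shape mismatch described above.

```python
import json

# Captured TaskCreate payload from the evidence above (values abbreviated).
CAPTURED_TASK_CREATE = json.loads(
    '{"tool_name":"TaskCreate",'
    '"tool_input":{"subject":"...","description":"..."},'
    '"tool_response":{"task":{"id":"5","subject":"..."}}}'
)

# The two keys shown in the synthesized fixture excerpt.
SYNTHESIZED_TASK_CREATE = {
    "tool_input": {"taskId": "task-1"},
    "tool_response": {"id": "task-1"},
}


def read_id_fixture_style(payload):
    """Hypothetical lookup that assumes the synthesized shape."""
    return payload.get("tool_response", {}).get("id")


def read_id_production_style(payload):
    """Hypothetical lookup that assumes the captured TaskCreate shape."""
    return payload.get("tool_response", {}).get("task", {}).get("id")


if __name__ == "__main__":
    # The fixture-style lookup finds an id only in the synthesized payload.
    print(read_id_fixture_style(SYNTHESIZED_TASK_CREATE))
    print(read_id_fixture_style(CAPTURED_TASK_CREATE))
    print(read_id_production_style(CAPTURED_TASK_CREATE))
```

A lookup of the first kind returns a value in tests and `None` in production, which is consistent with "tests pass, production no-ops".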
Proposal
Two complementary mitigations, plus an optional third:
1. Captured-payload corpus
Add pact-plugin/tests/fixtures/post_tool_use/ with one JSON file per (tool_name, scenario), captured from live sessions. Examples:
- task_create.json: production shape with tool_response.task.id
- task_update_terminal.json: TaskUpdate(status=completed) and TaskUpdate(status=deleted)
- task_update_metadata_only.json: non-terminal TaskUpdate
- task_update_owner.json: owner change
- (extend per-tool as needed)
Document the capture procedure in pact-plugin/tests/runbooks/post-tool-use-payload-capture.md: install logging shim, trigger real tool call, capture stdin, store sanitized fixture.
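The capture steps could be served by a small logging shim along the following lines. This is a sketch under assumptions: the log path, the `REDACT_KEYS` list, and the function names are all hypothetical, and returning `{"suppressOutput": true}` is assumed to be a harmless no-op response because it mirrors the no-emit case in the regression sketch.

```python
import json
import sys
from pathlib import Path

# Hypothetical capture location; adjust to wherever raw captures should land.
CAPTURE_LOG = Path("/tmp/post_tool_use_capture.jsonl")

# Keys whose string values are redacted before storage (an assumed list;
# extend it to cover whatever is sensitive in real payloads).
REDACT_KEYS = {"subject", "description"}


def sanitize(value):
    """Recursively replace string values under REDACT_KEYS with "..."."""
    if isinstance(value, dict):
        return {
            k: "..." if k in REDACT_KEYS and isinstance(v, str) else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def capture(raw_stdin, log_path=CAPTURE_LOG):
    """Append one sanitized payload per line; return the parsed payload."""
    payload = json.loads(raw_stdin)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(sanitize(payload), sort_keys=True) + "\n")
    return payload


def main(stdin=None, stdout=None, log_path=CAPTURE_LOG):
    """Hook entry point: log the payload, then emit a no-op response."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    capture(stdin.read(), log_path=log_path)
    stdout.write(json.dumps({"suppressOutput": True}) + "\n")
```

To use it as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Each line of the resulting log can then be split out into one fixture file per (tool_name, scenario).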
2. Production-shape regression test
For every PostToolUse-emitter hook (currently wake_lifecycle_emitter.py), add an integration test that pipes each captured fixture through the hook and asserts the expected emit/no-emit outcome. This is the parity invariant — synthesized fixtures alone cannot satisfy it.
Pattern (sketch):
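    from pathlib import Path

    import pytest

    # Assumed location of the captured-payload corpus from mitigation 1.
    FIXTURES = Path(__file__).parent / "fixtures" / "post_tool_use"

    @pytest.mark.parametrize("fixture_name,expected_directive", [
        ("task_create_first.json", "watch-inbox"),
        ("task_create_second.json", None),
        ("task_update_terminal_last.json", "unwatch-inbox"),
        # ... one row per captured fixture
    ])
    def test_emitter_handles_production_payload(fixture_name, expected_directive):
        # Add whatever pytest fixtures the hook needs to the signature above.
        payload = (FIXTURES / fixture_name).read_text()
        # _run_emitter is assumed to be the test module's existing helper;
        # env_extra is left elided as in the original sketch.
        out = _run_emitter(payload, env_extra=...)
        if expected_directive:
            assert expected_directive in out["hookSpecificOutput"]["additionalContext"]
        else:
            assert out == {"suppressOutput": True}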
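The sketch relies on a `_run_emitter` helper whose implementation is not shown here. One plausible form, assuming the hook is a script that reads the payload on stdin and prints a single JSON object on stdout, is below. The name `run_hook`, its parameters, and the timeout are illustrative assumptions, not the existing helper's signature.

```python
import json
import os
import subprocess


def run_hook(hook_argv, payload, env_extra=None, timeout=10):
    """Pipe `payload` (a JSON string) to the hook process and parse its stdout."""
    env = dict(os.environ)
    env.update(env_extra or {})
    proc = subprocess.run(
        hook_argv,
        input=payload,
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout,
        check=True,  # a crashing hook should fail the test, not look like "no emit"
    )
    return json.loads(proc.stdout)
```

Running the hook as a subprocess, instead of importing and calling it, keeps the test on the same stdin/stdout path the platform uses, which is the point of the parity invariant.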
3. (Optional) Schema lint
A test that asserts every tool_response shape used in test fixtures is also represented in the captured-payload corpus. Catches drift from a different angle: a synthesized fixture that doesn't match any captured payload is suspect.
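A minimal sketch of such a lint, assuming both the synthesized fixtures and the captured corpus can be loaded as lists of payload dicts. The helpers `shape_of` and `unmatched_fixture_shapes` are hypothetical names, and "shape" is taken here to mean key names plus JSON value types, ignoring the values themselves.

```python
def shape_of(value):
    """Reduce a JSON value to a hashable signature of keys and types only."""
    if isinstance(value, dict):
        return ("object", tuple((k, shape_of(value[k])) for k in sorted(value)))
    if isinstance(value, list):
        # A list is summarized by the distinct element shapes it contains.
        return ("array", tuple(sorted({shape_of(v) for v in value}, key=repr)))
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return ("boolean",)
    if isinstance(value, (int, float)):
        return ("number",)
    if isinstance(value, str):
        return ("string",)
    return ("null",)


def unmatched_fixture_shapes(synthesized, captured):
    """Return synthesized payloads whose tool_response shape is absent from the corpus."""
    known = {shape_of(p.get("tool_response")) for p in captured}
    return [p for p in synthesized if shape_of(p.get("tool_response")) not in known]
```

The lint test would then assert that `unmatched_fixture_shapes(...)` returns an empty list. This comparison is strict: an extra or missing key counts as a different shape, so optional fields need one captured payload per variant.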
Out of Scope
Source
Surfaced 2026-05-02 in dogfood orchestration session pact-56ce3a2a after end-to-end probe of wake_lifecycle_emitter revealed test-fixture/production shape divergence.