Hook integration tests: pin actual platform PostToolUse payload shapes (test-fixture-vs-production parity) #613

## Summary

`pact-plugin/tests/test_inbox_wake_lifecycle_emitter.py` (and any future PostToolUse-emitter hook tests) use synthesized stdin fixtures that do not match the platform's actual `PostToolUse` payload shape. That gap let a silent production failure ship in #603 (see companion issue #612: wake_lifecycle_emitter Arm never emits). The tests pass while the hook no-ops in production.

This class of bug is not hook-specific: it is drift between the payload shapes tests synthesize and the shapes production actually sends. We need a captured-payload corpus and a parity invariant so it cannot recur.

## Empirical Evidence

Captured live `PostToolUse` stdin from session `pact-56ce3a2a` on 2026-05-02:

**TaskCreate**:
```json
{"tool_name":"TaskCreate",
 "tool_input":{"subject":"...","description":"..."},
 "tool_response":{"task":{"id":"5","subject":"..."}}}
```

**TaskUpdate**:
```json
{"tool_name":"TaskUpdate",
 "tool_input":{"taskId":"5","status":"deleted"},
 "tool_response":{"success":true,"taskId":"5","updatedFields":["deleted"],"statusChange":{"from":"pending","to":"deleted"}}}
```

**Test fixture in `test_inbox_wake_lifecycle_emitter.py:200-203`** for the Arm-emitted-on-first-task-create case:
```python
"tool_input": {"taskId": "task-1"},
"tool_response": {"id": "task-1"},
```

The fixture's `tool_response: {"id": "task-1"}` is the wrong shape for `TaskCreate`; production sends `tool_response: {"task": {"id": "5"}}`. The fixture also includes `tool_input.taskId` for `TaskCreate`, which production omits because the id is assigned by the platform after creation.
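To make the divergence concrete, here is a minimal sketch of shape-aware accessors, assuming the two captured payloads above are representative of production. The helper names `task_id_from_payload` and `is_terminal_update` are hypothetical and are not taken from `wake_lifecycle_emitter.py`; treating `completed` and `deleted` as terminal follows the `task_update_terminal` scenario in the proposal.

```python
from typing import Any, Optional

# Statuses treated as terminal (assumption, based on the
# TaskUpdate(status=completed) / TaskUpdate(status=deleted) scenarios).
TERMINAL_STATUSES = frozenset({"completed", "deleted"})


def task_id_from_payload(payload: dict[str, Any]) -> Optional[str]:
    """Return the task id from a PostToolUse payload, or None if absent.

    Hypothetical helper mirroring the captured shapes: TaskCreate reports the
    platform-assigned id under tool_response.task.id, while TaskUpdate carries
    it in tool_input.taskId (and echoes it in tool_response.taskId).
    """
    tool_name = payload.get("tool_name")
    tool_input = payload.get("tool_input") or {}
    tool_response = payload.get("tool_response") or {}
    if tool_name == "TaskCreate":
        task = tool_response.get("task") or {}
        return task.get("id")
    if tool_name == "TaskUpdate":
        return tool_input.get("taskId") or tool_response.get("taskId")
    return None


def is_terminal_update(payload: dict[str, Any]) -> bool:
    """True if a TaskUpdate moved the task into a terminal status."""
    if payload.get("tool_name") != "TaskUpdate":
        return False
    change = (payload.get("tool_response") or {}).get("statusChange") or {}
    return change.get("to") in TERMINAL_STATUSES
```

Fed the synthesized fixture, `task_id_from_payload` returns `None` for `TaskCreate` because its `tool_response` has no `task` key. Code written against the fixture shape would do the opposite: pass the test and miss every real payload.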

## Proposal

Two complementary mitigations, plus an optional third check:

### 1. Captured-payload corpus

Add `pact-plugin/tests/fixtures/post_tool_use/` with one JSON file per `(tool_name, scenario)`, captured from live sessions. Examples:
- `task_create.json` — production shape with `tool_response.task.id`
- `task_update_terminal.json` — TaskUpdate(status=completed) and TaskUpdate(status=deleted)
- `task_update_metadata_only.json` — non-terminal TaskUpdate
- `task_update_owner.json` — owner change
- (extend per-tool as needed)

Document the capture procedure in `pact-plugin/tests/runbooks/post-tool-use-payload-capture.md`: install a logging shim, trigger a real tool call, capture the hook's stdin, and store a sanitized fixture.
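A minimal sketch of the capture-and-sanitize step, assuming the shim runs as an extra PostToolUse hook whose only job is to record its stdin. The function names, the redacted-field list, and the `<tool>_<scenario>.json` naming rule are hypothetical. Sanitization blanks free-text values (`subject`, `description`) but keeps every key, so the stored fixture retains the production shape.

```python
import json
import re
from pathlib import Path
from typing import Any

# Free-text fields blanked before a fixture is committed (assumption: these
# are the fields most likely to carry session-specific content).
REDACT_KEYS = frozenset({"subject", "description"})


def sanitize(value: Any) -> Any:
    """Recursively blank redacted string values, keeping every key and the nesting."""
    if isinstance(value, dict):
        return {
            key: "<redacted>" if key in REDACT_KEYS and isinstance(child, str) else sanitize(child)
            for key, child in value.items()
        }
    if isinstance(value, list):
        return [sanitize(child) for child in value]
    return value


def fixture_name(tool_name: str, scenario: str) -> str:
    """Map e.g. ("TaskCreate", "first") to "task_create_first.json"."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", tool_name).lower()
    return f"{snake}_{scenario}.json"


def capture(stdin_text: str, out_dir: Path, scenario: str) -> Path:
    """Write a sanitized copy of one raw PostToolUse payload into out_dir."""
    payload = json.loads(stdin_text)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / fixture_name(str(payload.get("tool_name", "unknown")), scenario)
    path.write_text(json.dumps(sanitize(payload), indent=2, sort_keys=True) + "\n")
    return path
```

The shim itself would call `capture(sys.stdin.read(), ...)` and exit 0 without printing, so it should not alter the session it observes. Sorted keys and a trailing newline keep fixture diffs stable across recaptures.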

### 2. Production-shape regression test

For every PostToolUse-emitter hook (currently `wake_lifecycle_emitter.py`), add an integration test that pipes each captured fixture through the hook and asserts the expected emit/no-emit outcome. This is the parity invariant — synthesized fixtures alone cannot satisfy it.

Pattern (sketch):
```python
from pathlib import Path

import pytest

# Corpus directory proposed above: pact-plugin/tests/fixtures/post_tool_use/
FIXTURES = Path(__file__).parent / "fixtures" / "post_tool_use"


@pytest.mark.parametrize("fixture_name,expected_directive", [
    ("task_create_first.json", "watch-inbox"),
    ("task_create_second.json", None),
    ("task_update_terminal_last.json", "unwatch-inbox"),
    ...
])
def test_emitter_handles_production_payload(fixture_name, expected_directive, ...):
    # Pipe the captured payload through unchanged; never re-serialize it.
    payload = (FIXTURES / fixture_name).read_text()
    out = _run_emitter(payload, env_extra=...)
    if expected_directive:
        assert expected_directive in out["hookSpecificOutput"]["additionalContext"]
    else:
        assert out == {"suppressOutput": True}
```

### 3. (Optional) Schema lint

A test that asserts every `tool_response` shape used in test fixtures is also represented in the captured-payload corpus. Catches drift from a different angle: a synthesized fixture that doesn't match any captured payload is suspect.
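One way to define "shape" for this lint, sketched under the assumption that key paths (ignoring values) are enough to tell payload variants apart. `shape_of` and `unmatched_shapes` are hypothetical names. List elements collapse into a single `[]` path segment so arrays of different lengths compare equal, and matching is exact; a looser subset rule is a possible alternative.

```python
from typing import Any, Iterable


def shape_of(value: Any, prefix: str = "") -> frozenset[str]:
    """Return the set of key paths in a JSON value, ignoring leaf values."""
    paths: set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= shape_of(child, path)
    elif isinstance(value, list):
        for child in value:
            paths |= shape_of(child, f"{prefix}[]")
    return frozenset(paths)


def unmatched_shapes(
    synthesized: Iterable[dict[str, Any]],
    captured: Iterable[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return synthesized payloads whose (tool_name, tool_response shape) is absent from the corpus."""
    known = {(p.get("tool_name"), shape_of(p.get("tool_response"))) for p in captured}
    return [
        p for p in synthesized
        if (p.get("tool_name"), shape_of(p.get("tool_response"))) not in known
    ]
```

Against the corpus, the `TaskCreate` fixture from `test_inbox_wake_lifecycle_emitter.py` would be flagged: its `tool_response` shape is `{"id"}`, while the captured one is `{"task", "task.id", "task.subject"}`.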

## Out of Scope

- The wake-emitter Arm bug itself — tracked in companion issue #612. This issue is about preventing the *class* of failure.
- Capturing payloads for every Claude Code tool — start with the tools that PACT hooks actually consume (`TaskCreate`, `TaskUpdate`, `Task`, `Bash`, `Edit`, `Write`).

## Source

Surfaced 2026-05-02 in dogfood orchestration session pact-56ce3a2a, after an end-to-end probe of `wake_lifecycle_emitter` revealed the test-fixture/production shape divergence.
