Hook integration tests: pin actual platform PostToolUse payload shapes (test-fixture-vs-production parity) #613

@michael-wojcik

Summary

pact-plugin/tests/test_inbox_wake_lifecycle_emitter.py (and any future PostToolUse-emitter hook tests) use synthesized stdin fixtures that do not match the platform's actual PostToolUse payload shape. This let a silent production failure ship in #603 (see companion issue #612 — wake_lifecycle_emitter Arm never emits): tests pass, production no-ops.

The class of bug isn't hook-specific — it's a test-fixture-vs-production shape drift. We need a captured-payload corpus and a parity invariant so this can't recur.

Empirical Evidence

Captured live PostToolUse stdin from session pact-56ce3a2a on 2026-05-02:

TaskCreate:

{"tool_name":"TaskCreate",
 "tool_input":{"subject":"...","description":"..."},
 "tool_response":{"task":{"id":"5","subject":"..."}}}

TaskUpdate:

{"tool_name":"TaskUpdate",
 "tool_input":{"taskId":"5","status":"deleted"},
 "tool_response":{"success":true,"taskId":"5","updatedFields":["deleted"],"statusChange":{"from":"pending","to":"deleted"}}}

Test fixture in test_inbox_wake_lifecycle_emitter.py:200-203 for the Arm-emitted-on-first-task-create case:

"tool_input": {"taskId": "task-1"},
"tool_response": {"id": "task-1"},

The fixture's tool_response: {"id": "task-1"} is the wrong shape for TaskCreate: production sends tool_response: {"task": {"id": "5"}}, with the id nested under task. The fixture also includes tool_input.taskId for TaskCreate, which production omits because the id is platform-assigned after creation.
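
To make the failure mode concrete, here is a minimal sketch of how a lookup written against the synthesized fixture silently misses the production payload. The two extractor functions are hypothetical and are not taken from wake_lifecycle_emitter.py; only the payload shapes come from the captures above, and the fixture's tool_name is assumed from the test case name.

```python
import json

# Production TaskCreate payload as captured above (string values elided in the original).
PRODUCTION = json.loads(
    '{"tool_name":"TaskCreate",'
    '"tool_input":{"subject":"...","description":"..."},'
    '"tool_response":{"task":{"id":"5","subject":"..."}}}'
)

# Synthesized fixture shape from the test file (tool_name assumed to be TaskCreate).
FIXTURE = {
    "tool_name": "TaskCreate",
    "tool_input": {"taskId": "task-1"},
    "tool_response": {"id": "task-1"},
}


def extract_task_id_fixture_shape(payload):
    """Hypothetical lookup written against the fixture: tool_response.id."""
    return payload.get("tool_response", {}).get("id")


def extract_task_id_production_shape(payload):
    """Hypothetical lookup matching the captured shape: tool_response.task.id."""
    return payload.get("tool_response", {}).get("task", {}).get("id")


fixture_hit = extract_task_id_fixture_shape(FIXTURE)          # finds the id
production_miss = extract_task_id_fixture_shape(PRODUCTION)   # None, no error raised
production_hit = extract_task_id_production_shape(PRODUCTION)
```

The fixture-shaped lookup returns a value in tests and None in production without raising, which is consistent with the "tests pass, production no-ops" symptom.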

Proposal

Two complementary mitigations, plus an optional third:

1. Captured-payload corpus

Add pact-plugin/tests/fixtures/post_tool_use/ with one JSON file per (tool_name, scenario), captured from live sessions. Examples:

  • task_create.json — production shape with tool_response.task.id
  • task_update_terminal.json — TaskUpdate(status=completed) and TaskUpdate(status=deleted)
  • task_update_metadata_only.json — non-terminal TaskUpdate
  • task_update_owner.json — owner change
  • (extend per-tool as needed)

Document the capture procedure in pact-plugin/tests/runbooks/post-tool-use-payload-capture.md: install logging shim, trigger real tool call, capture stdin, store sanitized fixture.
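
The logging shim in that procedure could look roughly like the sketch below. Everything here is an assumption for illustration: the function names, the captured_payloads output directory, and the choice of subject and description as the fields to mask. It parses one payload from stdin, masks free-text values while preserving the key structure, and writes one file per call.

```python
import json
import sys
import time
from pathlib import Path

# Assumption: keys whose string values may hold user content and should be masked.
SENSITIVE_KEYS = {"subject", "description"}


def sanitize(value):
    """Recursively mask sensitive string values while preserving the payload shape."""
    if isinstance(value, dict):
        return {
            k: "..." if k in SENSITIVE_KEYS and isinstance(v, str) else sanitize(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [sanitize(v) for v in value]
    return value


def capture_payload(raw, out_dir):
    """Write one sanitized PostToolUse payload to out_dir and return its path."""
    payload = json.loads(raw)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    tool = payload.get("tool_name", "unknown")
    path = out_dir / f"{tool}_{time.time_ns()}.json"
    path.write_text(json.dumps(sanitize(payload), indent=2, sort_keys=True))
    return path


def main():
    # When run as a temporary PostToolUse hook, the payload arrives on stdin.
    capture_payload(sys.stdin.read(), Path("captured_payloads"))
```

Captured files would still need a manual review and a rename to the scenario-based names listed above before being committed to the corpus.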

2. Production-shape regression test

For every PostToolUse-emitter hook (currently wake_lifecycle_emitter.py), add an integration test that pipes each captured fixture through the hook and asserts the expected emit/no-emit outcome. This is the parity invariant — synthesized fixtures alone cannot satisfy it.

Pattern (sketch):

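from pathlib import Path

import pytest

# Assumed location: this test module lives in pact-plugin/tests/.
FIXTURES = Path(__file__).parent / "fixtures" / "post_tool_use"


@pytest.mark.parametrize("fixture_name,expected_directive", [
    ("task_create_first.json", "watch-inbox"),
    ("task_create_second.json", None),
    ("task_update_terminal_last.json", "unwatch-inbox"),
    # ... one entry per captured scenario
])
def test_emitter_handles_production_payload(fixture_name, expected_directive):
    # Add any pytest fixtures the hook needs to the signature (elided in this sketch).
    payload = (FIXTURES / fixture_name).read_text()
    # _run_emitter is the test module's existing helper; env_extra is left elided.
    out = _run_emitter(payload, env_extra=...)
    if expected_directive:
        assert expected_directive in out["hookSpecificOutput"]["additionalContext"]
    else:
        assert out == {"suppressOutput": True}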

3. (Optional) Schema lint

A test that asserts every tool_response shape used in test fixtures is also represented in the captured-payload corpus. Catches drift from a different angle: a synthesized fixture that doesn't match any captured payload is suspect.
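
One way to compare shapes rather than values is to reduce each tool_response to a structural signature and check membership against the corpus. The sketch below is a possible starting point, not a settled design: shape_of and unrepresented_fixtures are hypothetical names, and treating "same keys and same JSON types" as "same shape" is an assumption that may be too strict for optional fields.

```python
def shape_of(value):
    """Reduce a JSON value to a hashable signature of keys and JSON types only."""
    if isinstance(value, dict):
        return ("object", tuple(sorted((k, shape_of(v)) for k, v in value.items())))
    if isinstance(value, list):
        # Order and length are ignored; only the distinct element shapes count.
        return ("array", tuple(sorted({shape_of(v) for v in value}, key=repr)))
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"not a JSON value: {type(value).__name__}")


def unrepresented_fixtures(synthesized, corpus):
    """Names of synthesized payloads whose (tool_name, tool_response shape) is not in corpus."""
    known = {(p.get("tool_name"), shape_of(p.get("tool_response"))) for p in corpus}
    return [
        name
        for name, p in synthesized.items()
        if (p.get("tool_name"), shape_of(p.get("tool_response"))) not in known
    ]


# Captured production shape from this issue, used as a one-entry corpus.
corpus = [
    {"tool_name": "TaskCreate", "tool_response": {"task": {"id": "5", "subject": "..."}}},
]
synthesized = {
    "arm_on_first_create": {"tool_name": "TaskCreate", "tool_response": {"id": "task-1"}},
    "create_matching_corpus": {
        "tool_name": "TaskCreate",
        "tool_response": {"task": {"id": "9", "subject": "x"}},
    },
}
suspects = unrepresented_fixtures(synthesized, corpus)
```

Here the existing fixture shape is flagged while a fixture that differs only in values passes, which is the distinction the lint is meant to draw.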

Out of Scope

Source

Surfaced 2026-05-02 in dogfood orchestration session pact-56ce3a2a after end-to-end probe of wake_lifecycle_emitter revealed test-fixture/production shape divergence.
