fix: extend _extract_task_id to probe nested tool_response.task.id (#620) by michael-wojcik · Pull Request #638 · Synaptic-Labs-AI/PACT-Plugin

michael-wojcik · 2026-05-05T21:15:23Z

Summary

Fixes #620 watch-inbox auto-Arm regression introduced by #603. Production TaskCreate tool_response is nested (tool_response.task.id) per #612 logging-shim evidence; the prior flat-only probe in _extract_task_id returned None on every TaskCreate, killing the auto-Arm path on the first 0→1 active-task transition. Manual /PACT:watch-inbox continued to work, masking the regression — including throughout this very session (STATE_FILE was never created during the dispatch of the fix itself).

Changes

feat: add fixture-parity capture-provenance convention for hook stdin captures — establishes pact-plugin/tests/fixtures/wake_lifecycle/ with mandatory _meta provenance (capture_session_id, capture_date, capture_method, issue_ref). Structural defense against the failure class: hand-constructed fixtures diverged from production and tests stayed green.
fix: extend _extract_task_id to probe nested tool_response.task.id (#620) — three-block precedence: tool_input → tool_response.task (nested, NEW) → tool_response (flat, fallback). Public contract unchanged; signature widened to dict[str, Any] | None to honestly type the pre-existing None-input defense. New TestExtractTaskIdShapeResilience class with 7 named methods + 1 end-to-end regression piping the captured production fixture. 12 existing TaskCreate fixture stubs rewritten flat → nested.
chore: bump version 4.0.0 → 4.0.1 — 4-file dance per plugin-version-bump pin.
test: generalize plugin.json version guardrail — replaces the v4.0.0 one-shot release guardrail with an EXPECTED_VERSION module constant; scrubs stale planning-artifact identifiers (C1/C9/C11) from the docstring per the agent-consumed-docs current-state-only convention.

Verification

Counter-test-by-revert verified empirically (cp HEAD source → pytest → restore): cardinality {3} on revert (named watch-inbox Arm fails on session-resume + first-active-task transition (recurring; second observed instance) #620 regression test, end-to-end fixture test, None-input-defense test).
Targeted test file: 42/42 PASS.
Broader pact-plugin test suite: 6889 passed, 7 skipped, 0 failed.

Out of scope

wake_lifecycle_emitter: Arm directive failed to surface on TaskCreate (live regression observed in session pact-2877fe69) #637 sibling regression (parallel TaskCreate burst) — separate concurrency surface, not shape. Failure-shape observation captured in backend-coder HANDOFF for follow-up.
Restore session_init inline directives + peer_inject hook (5 functions) deleted in C4/C7 of #621 #628 sibling restoration plan (session_init inline directives) — independent file surface; either PR can land first.
Cross-file version-consistency test — recurring gap (no test asserts the 4 version-bearing files agree); to be filed as Phase 2 follow-up.
test_inbox_wake_lifecycle_emitter.py 800+ line file — file-size hook recommends refactor; separate Phase 2 issue.

Plan reference

docs/plans/620-watch-inbox-arm-fix-plan.md (status: IMPLEMENTED).

Test plan

CI green (full test suite)
Smoke: after merge, verify a fresh PACT session armed via auto-Arm on first TaskCreate (i.e. STATE_FILE appears at ~/.claude/teams/{team}/inbox-wake-state.json)
After release v4.0.1: edit GitHub release v4.0.0 description with a "Patches" entry linking v4.0.1

Refs: #620, #612, PR #603, #628, #637

Coverage notes

Three existing TaskCreate Arm-positive tests (test_arm_emitted_on_first_task_create, test_arm_includes_idempotency_clause, test_arm_directive_contains_precondition_phrase) pass on counter-test-by-revert because they include tool_input.taskId AND tool_response.task.id. The function's input-side probe (precedence rule 1) extracts via tool_input regardless of the tool_response shape, so reverting the nested-task probe doesn't make these tests fail. They retain their original load-bearing role (idempotency / precondition prose pinning) but are NOT counter-tests for #620 specifically.

The #620-specific counter-tests are:

TestExtractTaskIdShapeResilience::test_taskcreate_production_nested_task_shape — pipes {tool_response: {task: {id: "5"}}} only (no tool_input fallback); the named watch-inbox Arm fails on session-resume + first-active-task transition (recurring; second observed instance) #620 regression test.
test_arm_emitted_on_captured_production_taskcreate_payload — pipes the captured production fixture through the full subprocess hook; the end-to-end regression.

Reverting the nested-task probe block in _extract_task_id causes both of these tests to fail (cardinality {2}, was {3} before remediation Cycle 1 dropped the unreachable-branch None-input test).

… captures Establishes pact-plugin/tests/fixtures/wake_lifecycle/ as the canonical location for hook PostToolUse stdin captures, with a `_meta` sibling-key provenance convention (capture_session_id, capture_date, capture_method, issue_ref). Structural defense against the failure class that produced #620: hand- constructed fixtures diverged from production payload shape, tests stayed green while production was broken. The README mandates the convention for future hook-stdin fixtures. Initial fixtures cite #612 (logging-shim capture from session pact-56ce3a2a on 2026-05-02): TaskCreate production-nested shape, TaskUpdate flat shape (fossilized for future regression backstop), and the broken pre-#620 legacy shape preserved as a regression backstop. Refs: #612, #620

) Production TaskCreate `tool_response` is nested (`tool_response.task.id`) per #612 logging-shim evidence. The prior flat-only probe returned None on every TaskCreate, killing the auto-Arm path on the first 0->1 active- task transition since #603 merged. Manual `/PACT:watch-inbox` continued to work, masking the regression. Extends `_extract_task_id` with three-block precedence: tool_input --> tool_response.task (nested, NEW) --> tool_response (flat, fallback). Public contract unchanged; signature widened to `dict[str, Any] | None` to make the pre-existing None-input defense honest. Docstring enumerates all 8 probe paths in precedence order and documents the WHY (cites #612 + capture session pact-56ce3a2a 2026-05-02). Tests: - New TestExtractTaskIdShapeResilience class with 7 named methods pinning behavior across input-side, nested-response, and flat- response paths plus failure modes (unknown shape, empty dicts, non-string ids, None input). - End-to-end regression test piping the captured production fixture through the full hook entry-point asserts an Arm directive is emitted. - 12 existing TaskCreate fixture stubs rewritten from flat to nested shape to match production going forward. Counter-test-by-revert verified: cardinality {3} on revert (the named #620 regression test, the end-to-end fixture test, and the None-input-defense test that fossilizes the isinstance guard). Refs: #620, #612, PR #603

Patch release for #620 watch-inbox Arm regression fix.

…SION constant) The v4.0.0 release added test_plugin_json_version_is_4_0_0 as a one-shot guardrail to verify the BREAKING bump landed. With v4.0.1 it would have needed bumping again; instead, generalize to a module-top EXPECTED_VERSION constant so future bumps update one location. Also scrubs planning-artifact identifiers (C1/C9/C11) from the module docstring per the agent-consumed-docs current-state-only convention; the docstring now describes what the tests assert today rather than which planning commits brought them in. Refs: #620

…whitespace-only ids) Two coupled review-driven hardenings on the extractor: D2-a: drop the dict[str, Any] | None signature widening + isinstance top-level guard. main() at L327 pre-filters non-dicts; the guard tested an unreachable production branch (#538-adjacent capability-without- consumer). Per-sub-dict isinstance guards on tool_input/tool_response are kept — main() does NOT pre-filter sub-fields, so they catch malformed-but-dict shapes that are reachable. The matching test_none_input_data_returns_none case is dropped from the regression class; counter-test-by-revert cardinality on the #620 contract goes {3} -> {2} (named regression test + end-to-end fixture test still hold). F5: reject whitespace-only ids at all 3 probe sites by adding `.strip()` to the existing isinstance(tid, str) and tid check. Previously `{tool_response: {id: ' '}}` returned `' '` (truthy whitespace), which would fail downstream TaskStop with semantically-empty id. Strengthening is symmetric across input-side, nested-response, and flat-response probes; cascading-fallthrough invariant verified (whitespace at higher-precedence falls through to next valid probe rather than short-circuiting to None). New TestExtractTaskIdShape Resilience::test_whitespace_only_id_returns_none parametrized over 5 whitespace shapes pins the contract. Also adds test_all_wake_lifecycle_fixtures_carry_meta_provenance: a top-level enforcement test that globs tests/fixtures/wake_lifecycle/ and asserts each fixture carries the README-mandated _meta provenance (capture_session_id, capture_date, capture_method, issue_ref). Closes the README-discipline-only gap surfaced by the test-engineer's adversarial-coverage review of PR #638. Refs: #620, #638

…ertions Two version-pinning test refinements driven by PR #638 architect + test-engineer reviews: D4-a: replace the EXPECTED_VERSION = "4.0.1" literal in test_plugin_json_orchestrator.py with json.loads(plugin.json)["version"] dynamic-source pattern matching sibling test_plugin_version_bump.py:23. Also drops the now-misleading "Update EXPECTED_VERSION when bumping" comment — with dynamic-source, plugin.json itself is the source of truth and there is nothing to update at this site. The structural- existence-check role is documented in the test docstring; cross-file drift detection is owned by sibling test_plugin_version_bump.py. option-b: tighten test_root_readme_version + test_pact_plugin_readme_version in test_plugin_version_bump.py with a word-bounded regex instead of plain substring search. _TARGET_VERSION_PATTERN compiles `(?<![\d.])` + re.escape(TARGET_VERSION) + `(?![\d.])` once at module load. Closes the substring-match edge case where '4.0.1' would falsely match '4.0.10' in a README. test_marketplace_json_version is left untouched (set-membership comparison is already exact). Refs: #638

@pytest

Latent unused-import + 4 unused-vars in test_inbox_wake_lifecycle_emitter.py and test_plugin_version_bump.py, surfaced by Pyright re-analysis after PR #638's edits to those files. Pre-existing in main; not introduced by review remediation. Bundled here for cleanliness. test_plugin_version_bump.py: L18 drop unused `import pytest` (no @pytest.fixture / @pytest.mark / pytest.raises in this file) test_inbox_wake_lifecycle_emitter.py: L46 drop unused `team_name` param from `_pact_session_env`; 3 callers updated (L117/L123/L137). Internal helper, all callers pass the literal "t", body never references it. L436 drop unused `tmp_path` fixture from `test_is_terminal_status_update_matches_completed_and_deleted` signature; the test is in-memory predicate-only. L582 drop `result =` LHS in `test_count_active_tasks_called_on_taskupdate_terminal_status`; assertion is on `mock_count.call_count`, the call is for side effects. L603 same pattern in `test_count_active_tasks_called_on_taskcreate`. L543 sibling occurrence is deferred to #640 (suite-wide unused-var/ import scrub). Refs: #638

After D4-a (Commit 9504594) made EXPECTED_VERSION dynamic-source (`json.loads(plugin.json)["version"]`), the `test_plugin_json_version_is_pinned_to_current_release` test in test_plugin_json_orchestrator.py asserts plugin.json's version equals plugin.json's version — structurally tautological, catches nothing. Cross-file drift detection is owned by test_plugin_version_bump.py (which checks plugin.json against marketplace.json + both READMEs). Drop the no-op test rather than carry it. Drops: - test_plugin_json_version_is_pinned_to_current_release - EXPECTED_VERSION module-level computation (orphan after the test goes) Keeps: - plugin_json fixture (5 sibling tests consume it) - import json + PLUGIN_JSON_PATH (fixture needs both) - all 5 surviving tests (13-entry agents array invariants + bootstrap-commands-absent invariant) Module docstring rewritten to lead with the surviving 13-entry agents array claim + bootstrap-commands-absent invariant + cross-ref to test_plugin_version_bump.py for version-pinning ownership. Refs: #638

has_task_assigned read `~/.claude/teams/{team_name}/tasks/` but the canonical task store is `~/.claude/tasks/{team_name}/` (per shared/task_utils.py L49). On main this caused every legitimate pact-* specialist dispatch to F6-DENY in production; the bug was masked because tests/_seed_team wrote to the same wrong path. Fix: - shared/dispatch_helpers.py L130: path corrected to canonical store - tests/_seed_team helpers in test_dispatch_gate.py and test_dispatch_gate_smoke.py write tasks at the canonical path; team config.json stays under teams/{team_name}/ - 3 new regression tests (test_dispatch_gate.py): canonical-only path satisfies has_task_assigned; legacy-only path does not; cross- references task_utils.py to lock the path against future drift Counter-test cardinality verified per #638 discipline: temp-revert of the path fix → 3/3 new tests fail; revert restored → 61/61 dispatch tests pass. Test cardinality: 7331 → 7334 (+3). Zero regressions. pyright clean on changed files.

michael-wojcik added 4 commits May 5, 2026 17:05

chore: bump version 4.0.0 -> 4.0.1

76c6a9a

Patch release for #620 watch-inbox Arm regression fix.

This was referenced May 5, 2026

test fixture-parity convention: extend to peer_inject SubagentStart captures #639

Open

test cleanup: scrub planning-artifact identifiers from test names and section headers (suite-wide) #640

Open

michael-wojcik added 4 commits May 5, 2026 17:52

michael-wojcik merged commit 10edcc3 into main May 5, 2026

This was referenced May 6, 2026

Restore session-startup ritual (#628) #641

Merged

Cross-file plugin-version-consistency test (4-file dance enforcement) #657

Open

Dispatch-protocol hardening: rename Task→Agent + dispatch_gate + task_lifecycle_gate + bootstrap_gate F24/F25 (#662) #663

Merged

michael-wojcik mentioned this pull request May 7, 2026

Capture real UserPromptSubmit stdin fixture per #664 §8.7 #672

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: extend _extract_task_id to probe nested tool_response.task.id (#620)#638

fix: extend _extract_task_id to probe nested tool_response.task.id (#620)#638
michael-wojcik merged 8 commits into
mainfrom
fix-620-watch-inbox-arm-resume

michael-wojcik commented May 5, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

michael-wojcik commented May 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Verification

Out of scope

Plan reference

Test plan

Coverage notes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

michael-wojcik commented May 5, 2026 •

edited

Loading