fix: extend _extract_task_id to probe nested tool_response.task.id (#620)#638
Merged
Conversation
… captures Establishes pact-plugin/tests/fixtures/wake_lifecycle/ as the canonical location for hook PostToolUse stdin captures, with a `_meta` sibling-key provenance convention (capture_session_id, capture_date, capture_method, issue_ref). Structural defense against the failure class that produced #620: hand- constructed fixtures diverged from production payload shape, tests stayed green while production was broken. The README mandates the convention for future hook-stdin fixtures. Initial fixtures cite #612 (logging-shim capture from session pact-56ce3a2a on 2026-05-02): TaskCreate production-nested shape, TaskUpdate flat shape (fossilized for future regression backstop), and the broken pre-#620 legacy shape preserved as a regression backstop. Refs: #612, #620
) Production TaskCreate `tool_response` is nested (`tool_response.task.id`) per #612 logging-shim evidence. The prior flat-only probe returned None on every TaskCreate, killing the auto-Arm path on the first 0->1 active- task transition since #603 merged. Manual `/PACT:watch-inbox` continued to work, masking the regression. Extends `_extract_task_id` with three-block precedence: tool_input --> tool_response.task (nested, NEW) --> tool_response (flat, fallback). Public contract unchanged; signature widened to `dict[str, Any] | None` to make the pre-existing None-input defense honest. Docstring enumerates all 8 probe paths in precedence order and documents the WHY (cites #612 + capture session pact-56ce3a2a 2026-05-02). Tests: - New TestExtractTaskIdShapeResilience class with 7 named methods pinning behavior across input-side, nested-response, and flat- response paths plus failure modes (unknown shape, empty dicts, non-string ids, None input). - End-to-end regression test piping the captured production fixture through the full hook entry-point asserts an Arm directive is emitted. - 12 existing TaskCreate fixture stubs rewritten from flat to nested shape to match production going forward. Counter-test-by-revert verified: cardinality {3} on revert (the named #620 regression test, the end-to-end fixture test, and the None-input-defense test that fossilizes the isinstance guard). Refs: #620, #612, PR #603
Patch release for #620 watch-inbox Arm regression fix.
…SION constant) The v4.0.0 release added test_plugin_json_version_is_4_0_0 as a one-shot guardrail to verify the BREAKING bump landed. With v4.0.1 it would have needed bumping again; instead, generalize to a module-top EXPECTED_VERSION constant so future bumps update one location. Also scrubs planning-artifact identifiers (C1/C9/C11) from the module docstring per the agent-consumed-docs current-state-only convention; the docstring now describes what the tests assert today rather than which planning commits brought them in. Refs: #620
…whitespace-only ids) Two coupled review-driven hardenings on the extractor: D2-a: drop the dict[str, Any] | None signature widening + isinstance top-level guard. main() at L327 pre-filters non-dicts; the guard tested an unreachable production branch (#538-adjacent capability-without- consumer). Per-sub-dict isinstance guards on tool_input/tool_response are kept — main() does NOT pre-filter sub-fields, so they catch malformed-but-dict shapes that are reachable. The matching test_none_input_data_returns_none case is dropped from the regression class; counter-test-by-revert cardinality on the #620 contract goes {3} -> {2} (named regression test + end-to-end fixture test still hold). F5: reject whitespace-only ids at all 3 probe sites by adding `.strip()` to the existing isinstance(tid, str) and tid check. Previously `{tool_response: {id: ' '}}` returned `' '` (truthy whitespace), which would fail downstream TaskStop with semantically-empty id. Strengthening is symmetric across input-side, nested-response, and flat-response probes; cascading-fallthrough invariant verified (whitespace at higher-precedence falls through to next valid probe rather than short-circuiting to None). New TestExtractTaskIdShape Resilience::test_whitespace_only_id_returns_none parametrized over 5 whitespace shapes pins the contract. Also adds test_all_wake_lifecycle_fixtures_carry_meta_provenance: a top-level enforcement test that globs tests/fixtures/wake_lifecycle/ and asserts each fixture carries the README-mandated _meta provenance (capture_session_id, capture_date, capture_method, issue_ref). Closes the README-discipline-only gap surfaced by the test-engineer's adversarial-coverage review of PR #638. Refs: #620, #638
…ertions Two version-pinning test refinements driven by PR #638 architect + test-engineer reviews: D4-a: replace the EXPECTED_VERSION = "4.0.1" literal in test_plugin_json_orchestrator.py with json.loads(plugin.json)["version"] dynamic-source pattern matching sibling test_plugin_version_bump.py:23. Also drops the now-misleading "Update EXPECTED_VERSION when bumping" comment — with dynamic-source, plugin.json itself is the source of truth and there is nothing to update at this site. The structural- existence-check role is documented in the test docstring; cross-file drift detection is owned by sibling test_plugin_version_bump.py. option-b: tighten test_root_readme_version + test_pact_plugin_readme_version in test_plugin_version_bump.py with a word-bounded regex instead of plain substring search. _TARGET_VERSION_PATTERN compiles `(?<![\d.])` + re.escape(TARGET_VERSION) + `(?![\d.])` once at module load. Closes the substring-match edge case where '4.0.1' would falsely match '4.0.10' in a README. test_marketplace_json_version is left untouched (set-membership comparison is already exact). Refs: #638
Latent unused-import + 4 unused-vars in test_inbox_wake_lifecycle_emitter.py and test_plugin_version_bump.py, surfaced by Pyright re-analysis after PR #638's edits to those files. Pre-existing in main; not introduced by review remediation. Bundled here for cleanliness. test_plugin_version_bump.py: L18 drop unused `import pytest` (no @pytest.fixture / @pytest.mark / pytest.raises in this file) test_inbox_wake_lifecycle_emitter.py: L46 drop unused `team_name` param from `_pact_session_env`; 3 callers updated (L117/L123/L137). Internal helper, all callers pass the literal "t", body never references it. L436 drop unused `tmp_path` fixture from `test_is_terminal_status_update_matches_completed_and_deleted` signature; the test is in-memory predicate-only. L582 drop `result =` LHS in `test_count_active_tasks_called_on_taskupdate_terminal_status`; assertion is on `mock_count.call_count`, the call is for side effects. L603 same pattern in `test_count_active_tasks_called_on_taskcreate`. L543 sibling occurrence is deferred to #640 (suite-wide unused-var/ import scrub). Refs: #638
After D4-a (Commit 9504594) made EXPECTED_VERSION dynamic-source (`json.loads(plugin.json)["version"]`), the `test_plugin_json_version_is_pinned_to_current_release` test in test_plugin_json_orchestrator.py asserts plugin.json's version equals plugin.json's version — structurally tautological, catches nothing. Cross-file drift detection is owned by test_plugin_version_bump.py (which checks plugin.json against marketplace.json + both READMEs). Drop the no-op test rather than carry it. Drops: - test_plugin_json_version_is_pinned_to_current_release - EXPECTED_VERSION module-level computation (orphan after the test goes) Keeps: - plugin_json fixture (5 sibling tests consume it) - import json + PLUGIN_JSON_PATH (fixture needs both) - all 5 surviving tests (13-entry agents array invariants + bootstrap-commands-absent invariant) Module docstring rewritten to lead with the surviving 13-entry agents array claim + bootstrap-commands-absent invariant + cross-ref to test_plugin_version_bump.py for version-pinning ownership. Refs: #638
michael-wojcik
added a commit
that referenced
this pull request
May 6, 2026
has_task_assigned read `~/.claude/teams/{team_name}/tasks/` but the
canonical task store is `~/.claude/tasks/{team_name}/` (per
shared/task_utils.py L49). On main this caused every legitimate pact-*
specialist dispatch to F6-DENY in production; the bug was masked
because tests/_seed_team wrote to the same wrong path.
Fix:
- shared/dispatch_helpers.py L130: path corrected to canonical store
- tests/_seed_team helpers in test_dispatch_gate.py and
test_dispatch_gate_smoke.py write tasks at the canonical path; team
config.json stays under teams/{team_name}/
- 3 new regression tests (test_dispatch_gate.py): canonical-only path
satisfies has_task_assigned; legacy-only path does not; cross-
references task_utils.py to lock the path against future drift
Counter-test cardinality verified per #638 discipline: temp-revert of
the path fix → 3/3 new tests fail; revert restored → 61/61 dispatch
tests pass.
Test cardinality: 7331 → 7334 (+3). Zero regressions. pyright clean on
changed files.
4 tasks
This was referenced May 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #620 watch-inbox auto-Arm regression introduced by #603. Production TaskCreate
tool_responseis nested (tool_response.task.id) per #612 logging-shim evidence; the prior flat-only probe in_extract_task_idreturnedNoneon every TaskCreate, killing the auto-Arm path on the first 0→1 active-task transition. Manual/PACT:watch-inboxcontinued to work, masking the regression — including throughout this very session (STATE_FILE was never created during the dispatch of the fix itself).Changes
feat: add fixture-parity capture-provenance convention for hook stdin captures— establishespact-plugin/tests/fixtures/wake_lifecycle/with mandatory_metaprovenance (capture_session_id,capture_date,capture_method,issue_ref). Structural defense against the failure class: hand-constructed fixtures diverged from production and tests stayed green.fix: extend _extract_task_id to probe nested tool_response.task.id (#620)— three-block precedence:tool_input→tool_response.task(nested, NEW) →tool_response(flat, fallback). Public contract unchanged; signature widened todict[str, Any] | Noneto honestly type the pre-existing None-input defense. NewTestExtractTaskIdShapeResilienceclass with 7 named methods + 1 end-to-end regression piping the captured production fixture. 12 existing TaskCreate fixture stubs rewritten flat → nested.chore: bump version 4.0.0 → 4.0.1— 4-file dance per plugin-version-bump pin.test: generalize plugin.json version guardrail— replaces the v4.0.0 one-shot release guardrail with anEXPECTED_VERSIONmodule constant; scrubs stale planning-artifact identifiers (C1/C9/C11) from the docstring per the agent-consumed-docs current-state-only convention.Verification
Out of scope
test_inbox_wake_lifecycle_emitter.py800+ line file — file-size hook recommends refactor; separate Phase 2 issue.Plan reference
docs/plans/620-watch-inbox-arm-fix-plan.md(status: IMPLEMENTED).Test plan
~/.claude/teams/{team}/inbox-wake-state.json)Refs: #620, #612, PR #603, #628, #637
Coverage notes
Three existing TaskCreate Arm-positive tests (
test_arm_emitted_on_first_task_create,test_arm_includes_idempotency_clause,test_arm_directive_contains_precondition_phrase) pass on counter-test-by-revert because they includetool_input.taskIdANDtool_response.task.id. The function's input-side probe (precedence rule 1) extracts viatool_inputregardless of thetool_responseshape, so reverting the nested-task probe doesn't make these tests fail. They retain their original load-bearing role (idempotency / precondition prose pinning) but are NOT counter-tests for #620 specifically.The #620-specific counter-tests are:
TestExtractTaskIdShapeResilience::test_taskcreate_production_nested_task_shape— pipes{tool_response: {task: {id: "5"}}}only (no tool_input fallback); the named watch-inbox Arm fails on session-resume + first-active-task transition (recurring; second observed instance) #620 regression test.test_arm_emitted_on_captured_production_taskcreate_payload— pipes the captured production fixture through the full subprocess hook; the end-to-end regression.Reverting the nested-task probe block in
_extract_task_idcauses both of these tests to fail (cardinality {2}, was {3} before remediation Cycle 1 dropped the unreachable-branch None-input test).