Background
PR #663 (closes #662) enforced behavioral test names (no planning-artifact identifiers like F-row labels or sketch IDs) and used a counter-test runbook for fresh-session validation of hook behavior. Two test-discipline follow-ups surfaced during review.
Task A — Behavioral-name linter for tests
Add a static check that flags planning-artifact identifiers in test names. The pinned feedback_no_planning_artifacts_in_repo rule covers this in prose only; mechanical enforcement keeps test names from drifting away from the rule unnoticed.
Audit pattern:
rg -n '\bF[0-9]+\b|\bT[0-9]+\b|\bS-[0-9]+\b|Sketch [A-Z]|architect §' \
pact-plugin/tests/ \
--type py
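The audit should return zero hits. Wrap it as a pytest plugin or a pre-commit hook. Counter-test by adding a test named test_f1_deny_when_name_empty and confirming the linter catches it. Note that the audit pattern as written would miss this name: it is case-sensitive, and \b does not match between _ and f because both are word characters. The linter therefore needs a name-aware, case-insensitive variant of the F/T/S patterns.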
Carve-outs:
- Closes #N / Fixes #N in commit messages and runbook titles
- External standards refs (RFC, OWASP, CVE)
- Internal Python identifiers with no user-visible counterpart (e.g., a private module-local constant)
Surfaced by test-engineer-blind in PR #663 review as Q2.
Task B — Stage-readiness doctrine for runbooks
PR #663 added pact-plugin/tests/runbooks/662-dispatch-gate.md as a counter-test runbook for fresh-session validation. The runbook format works but lacks a documented standard for:
- When to add a runbook (which PRs warrant one)
- Required sections (counter-test by mutation, fail-closed wrapper sabotage, observable-behavior diff against main)
- Stage-readiness criteria — when is a runbook "ready" to ship as part of a PR
Document the doctrine in pact-plugin/tests/runbooks/README.md. Reference the 662-dispatch-gate runbook as the canonical example.
Stage-readiness criteria draft:
- Counter-test cardinality is documented (e.g., "revert against main produces deny cardinality {3} for matcher mutation")
- Each section header is behavioral, not planning-artifact (e.g., "Matcher registration fidelity (counter-test by mutation)" not "F22 fidelity")
- Section content includes the exact bash/python commands to reproduce, not just prose description
- Runbook title may include (#N) provenance per the planning-artifact carve-out
Surfaced by test-engineer-blind in PR #663 review as Q3.
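Relationship to other follow-ups
- feedback_no_planning_artifacts_in_repo (project CLAUDE.md pin): Task A is the mechanical enforcement of this rule. The pin states the rule; this issue codifies it.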
Test plan
For Task A: add a parametrized test that exercises each planning-artifact pattern against a known-bad fixture and confirms the linter flags it. Counter-test by removing the linter rule and confirming the test fails.
For Task B: this is documentation, not code. Stage-readiness is met when the doctrine README is written and the next runbook PR references it.
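The Task A test plan could be sketched roughly as follows. This is an illustration under stated assumptions: NAME_RULE and is_flagged are hypothetical stand-ins for the real linter rule, the fixture names are invented, and the table-driven cases use the standard library's unittest.subTest so the sketch runs without extra dependencies. pytest collects unittest.TestCase classes, and the same tables could feed pytest.mark.parametrize instead.

```python
import re
import unittest

# Hypothetical stand-in for the linter rule under test; a real suite would
# import the rule from the linter module instead of redefining it here.
NAME_RULE = re.compile(
    r"(?:^|_)(?:[ft][0-9]+|s_?[0-9]+|sketch_[a-z])(?:_|$)",
    re.IGNORECASE,
)

# A pattern that can never match, used to simulate "the rule was removed".
DISABLED_RULE = re.compile(r"(?!)")


def is_flagged(test_name: str, rule: re.Pattern = NAME_RULE) -> bool:
    """Return True when the rule finds a planning-artifact token in the name."""
    return rule.search(test_name) is not None


# Known-bad fixtures: one invented name per planning-artifact pattern.
KNOWN_BAD = [
    "test_f1_deny_when_name_empty",  # F-row label
    "test_t3_allow_path",            # T-row label
    "test_s_12_matcher",             # S-<n> identifier
    "test_sketch_b_flow",            # sketch ID
]

# Behavioral names that must pass the linter untouched.
KNOWN_GOOD = [
    "test_denies_when_name_empty",
    "test_matcher_registration_fidelity",
]


class PlanningArtifactLinterTest(unittest.TestCase):
    def test_flags_each_known_bad_name(self):
        for name in KNOWN_BAD:
            with self.subTest(name=name):
                self.assertTrue(is_flagged(name))

    def test_accepts_behavioral_names(self):
        for name in KNOWN_GOOD:
            with self.subTest(name=name):
                self.assertFalse(is_flagged(name))
```

For the counter-test, swapping NAME_RULE for DISABLED_RULE makes every known-bad case report unflagged, so test_flags_each_known_bad_name fails. That failure is the signal the plan asks for when the linter rule is removed.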