Phase 2: flip teachback_gate to blocking mode (follow-up to #401) #481

@michael-wojcik

Context

Follow-up to issue #401 / PR #477. PR #477 ships Phase 1 (advisory mode) per the canonical plan's advisory-first discipline. This issue tracks Phase 2 (blocking mode), which was deliberately held from PR #477 pending empirical observation data.

See the canonical plan: docs/plans/teachback-gate-plan.md (local-only; also summarized in PR #477 body).

Scope

Flip teachback_gate.py from advisory to blocking mode. Single constant change + hook output semantics migration:

  • pact-plugin/hooks/teachback_gate.py: _TEACHBACK_MODE = "advisory" → "blocking"
  • Hook output path migration: systemMessage (exit 0, advisory) → hookSpecificOutput/permissionDecision=deny (exit 2, blocking)
  • Test assertions migrate from TestAdvisoryWarningMode expectations to TestBlockingMode expectations
  • hooks.json registration confirmed (already matcherless PreToolUse; no migration needed)

Once flipped: when a teammate at variety ≥ 7 attempts Edit/Write without valid teachback content, the hook blocks the tool call with a concrete rejection message pointing at the specific validation failure.
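A minimal sketch of the output migration described above, assuming the hook builds a JSON payload plus an exit code. The function name `build_hook_result` and the `hookEventName` / `permissionDecisionReason` fields are illustrative assumptions, not taken from teachback_gate.py; only `systemMessage`, `hookSpecificOutput`, `permissionDecision=deny`, and the exit codes come from the scope list.

```python
import json

ADVISORY = "advisory"
BLOCKING = "blocking"


def build_hook_result(mode: str, reason: str) -> tuple[dict, int]:
    """Return (JSON payload, exit code) for a teachback validation failure.

    Hypothetical helper: the real hook may structure this differently.
    """
    if mode == ADVISORY:
        # Phase 1: surface a warning and let the tool call proceed (exit 0).
        return {"systemMessage": f"teachback_gate (advisory): {reason}"}, 0
    if mode == BLOCKING:
        # Phase 2: deny the tool call and name the specific failure (exit 2).
        payload = {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",  # assumed field
                "permissionDecision": "deny",
                "permissionDecisionReason": reason,  # assumed field
            }
        }
        return payload, 2
    raise ValueError(f"unknown teachback mode: {mode!r}")


if __name__ == "__main__":
    payload, exit_code = build_hook_result(BLOCKING, "teachback content missing")
    print(json.dumps(payload, indent=2))
    print("exit code:", exit_code)
```

Keeping both branches behind one function would let the Phase 2 change stay a single-constant flip, with the tests asserting on the returned pair rather than on process output.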

Gate criteria (from canonical plan §F10)

Phase 2 flip must satisfy:

Zero false-positive blocks observed over 2 consecutive full workflows at variety ≥ 7, measured via teachback_gate_advisory journal events with would_have_blocked=true.

A false positive is a well-formed teachback that the advisory rules flagged anyway — meaning the peripheral content-rules are too strict. Zero false positives over 2 workflows indicates the rules are calibrated correctly for the honest-reframe posture (ritual enforcement for honest-but-careless output).
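One way to read the F10 criterion as code, offered as a sketch rather than a description of check_teachback_phase2_readiness.py. The event schema is hypothetical: `teachback_was_well_formed` stands in for whatever review label distinguishes a false positive from a correct flag. "2 consecutive full workflows" is interpreted here as the two most recent workflows at variety ≥ 7, with lower-variety workflows ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MIN_VARIETY = 7
REQUIRED_CLEAN_WORKFLOWS = 2


@dataclass
class AdvisoryEvent:
    """One teachback_gate_advisory journal event (hypothetical schema)."""
    would_have_blocked: bool
    teachback_was_well_formed: bool  # assumed reviewer label, not a real field


@dataclass
class Workflow:
    variety: int
    events: list[AdvisoryEvent] = field(default_factory=list)


def false_positives(workflow: Workflow) -> int:
    """Count events where a well-formed teachback would have been blocked."""
    return sum(
        1
        for event in workflow.events
        if event.would_have_blocked and event.teachback_was_well_formed
    )


def ready_for_phase2(workflows: list[Workflow]) -> bool:
    """True if the most recent qualifying workflows show zero false positives.

    `workflows` is assumed to be ordered oldest to newest.
    """
    qualifying = [wf for wf in workflows if wf.variety >= MIN_VARIETY]
    recent = qualifying[-REQUIRED_CLEAN_WORKFLOWS:]
    return len(recent) == REQUIRED_CLEAN_WORKFLOWS and all(
        false_positives(wf) == 0 for wf in recent
    )
```

Note that a `would_have_blocked=true` event on a genuinely deficient teachback does not count against readiness under this reading; only flags on well-formed teachbacks do.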

Pre-existing diagnostic

pact-plugin/scripts/check_teachback_phase2_readiness.py (shipped in PR #477 commit c309f1c) reads session journals and computes the F10 criterion. Use this to determine when Phase 2 is ready to flip:

python3 pact-plugin/scripts/check_teachback_phase2_readiness.py
# → pass/fail + observation counts per workflow

Dependent follow-ups

Per peer-review synthesis on PR #477, these tuning questions become actionable after Phase 1 observation data surfaces:

Blocked until

Implementation outline

  1. Run the readiness diagnostic; confirm zero false positives over 2 workflows
  2. Review teachback_gate_advisory events to tune peripheral rules if needed (see the deferred follow-up #479, "Post-Phase-1: tune template-density threshold (50% → 25%) based on observation data")
  3. Single-commit change: flip constant + migrate output semantics + update TestAdvisoryWarningMode → TestBlockingMode
  4. Lock in _TEACHBACK_MODE removal from public-flag surface once Phase 2 is verified stable

Precedent

This mirrors PR #407 (bootstrap gate enforcement) → PR #415 + issue #414 multi-phase shipping: advisory-first then hardening-follow-up is a PACT-wide convention for enforcement-code rollouts.
