Context
Tracked from PR #477 (#401 teachback gate) peer review. Finding F5 from review-security-engineer (task #20 HANDOFF). One of 3 peripheral content-validation rule strengthenings proposed; the other two (strict citation default, 2-token share) were shipped in cycle 2 remediation.
Problem
teachback_validate._template_density_fails rejects a field if template-phrases ("looks good", "no issues", "approved", "proceed", "understood", etc.) occupy ≥ 50% of content by character count.
At 50%, filler-around-rubber-stamp content passes. Security-engineer's example: "looks good overall but I have questions about implementation" = 17.8% density → PASSES. Only pure rubber-stamping hits 50%.
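A minimal sketch of a character-density check of this kind, for illustration only. It assumes density is the total length of matched blocklist phrases divided by the length of the content; the names `TEMPLATE_PHRASES`, `template_density`, and `density_fails` are hypothetical, and the real `_template_density_fails` may normalize content differently, so this sketch is not expected to reproduce the 17.8% figure exactly.

```python
# Hypothetical blocklist: only the five phrases named in this issue.
TEMPLATE_PHRASES = ("looks good", "no issues", "approved", "proceed", "understood")


def template_density(content: str, phrases=TEMPLATE_PHRASES) -> float:
    """Fraction of characters covered by blocklist phrases (assumed definition)."""
    text = content.lower()
    if not text:
        return 0.0
    # Sum the characters taken up by every non-overlapping occurrence of each phrase.
    covered = sum(text.count(phrase) * len(phrase) for phrase in phrases)
    return covered / len(text)


def density_fails(content: str, threshold: float = 0.50) -> bool:
    """Reject the field when template phrases reach the threshold share of content."""
    return template_density(content) >= threshold


padded = "looks good overall but I have questions about implementation"
print(density_fails("looks good"))  # pure rubber stamp
print(density_fails(padded))        # rubber stamp padded with filler
```

Under these assumptions the padded example sits well below 50%, so it passes, while a bare "looks good" is rejected.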
Two possible strengthenings
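Option A: Lower threshold to 25%. Caveat: at the 17.8% density quoted above, "looks good overall but I have questions about implementation" would still pass a 25% cutoff; catching that example needs a threshold at or below 17.8%.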
Option B: Switch to substring-count detection. ≥ 2 blocklist phrases = fail, regardless of content length. Catches "looks good. proceed. approved." even when padded with other content.
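Option B could be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `TEMPLATE_PHRASES` and `phrase_count_fails` are hypothetical names, the blocklist holds only the phrases named in this issue, and "≥ 2 blocklist phrases" is read here as two or more total occurrences (not two distinct phrases).

```python
# Hypothetical blocklist: only the five phrases named in this issue.
TEMPLATE_PHRASES = ("looks good", "no issues", "approved", "proceed", "understood")


def phrase_count_fails(content: str, min_hits: int = 2,
                       phrases=TEMPLATE_PHRASES) -> bool:
    """Fail when at least `min_hits` blocklist occurrences appear, ignoring length."""
    text = content.lower()
    # Count every non-overlapping occurrence of every blocklist phrase.
    hits = sum(text.count(phrase) for phrase in phrases)
    return hits >= min_hits


stamped = ("looks good. proceed. approved. Also, here is a long paragraph of "
           "unrelated padding that would dilute a character-density check.")
single = "looks good overall but I have questions about implementation"

print(phrase_count_fails(stamped))  # three hits despite padding
print(phrase_count_fails(single))   # one hit only
```

Note that plain substring matching also counts phrases embedded in longer words (for example "proceed" inside "proceeding"), which may matter when weighing false positives.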
Why deferred from PR #477
Significant posture shift requiring empirical data before committing. Phase 1 advisory mode generates the observation signal:
- If Phase 1 shows high false-negative rate (teammates satisfying rules while writing cooperative-hollow content): lower threshold or switch to substring-count.
- If Phase 1 shows high false-positive rate (legitimate terse communication blocked): keep 50% or raise.
- Zero data yet; pre-tuning risks over-correction.
Dependency
Depends on Phase 2 flip readiness (follow-up PR per docs/plans/teachback-gate-plan.md LAST-COMMIT discipline + F10 criterion). After 2 workflows at variety ≥ 7 generate Phase 1 data, revisit.
Related