epic: ship LLM-Based Quality Gate (3-phase rollout) #252

@stevenobiajulu

Overview

Port the LLM-Based Quality Gate from open-agreements/open-agreements (PRs #324 + #325, ~$0.067/PR total cost) to UseJunior/safe-docx, with a checklist tuned to safe-docx's bug history. The gate has a two-part architecture, a pre-merge blocking gate plus a post-merge non-blocking audit, and ships in the three phases listed under "Rollout phases".

Checklist (14 items, safe-docx-tuned)

Items 1–13 are derived from two PR-history mining passes (Codex + Gemini deep research) against real merged safe-docx PRs. Item 14 (test quality) is added as a cross-project prior.

  1. read_file response metadata parity (fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180, fix(docx-mcp): warn when read_file budget is exceeded by a single node (closes #184) #186, fix(docx-mcp): surface comment_load_error on the default budgeted read path (closes #189) #191)
  2. Live DOM namespace-safe OOXML writes (fix(docx-core): declare xmlns:w14/w15 on comments root before writing prefixed attributes (#154) #180)
  3. Deleted field markup keeps w:fldChar outside w:del (fix(docx-core): validate w:delInstrText placement and reject w:fldChar inside <w:del> #211, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228)
  4. Field validation per story, not global (fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225, fix(docx-core): fragment w:fldChar outside w:del per ECMA-376 Part 4 #228, feat(docx-core): sweep side-part revisions on accept/reject #218)
  5. Revision IDs seeded from all revision-bearing side parts (fix(docx-mcp): seed revision ids from side parts #216)
  6. Accept/reject sweep side parts and caches (feat(docx-core): sweep side-part revisions on accept/reject #218, fix(docx-mcp): seed revision ids from side parts #216, fix(docx-core): partition field-closure validation by ECMA-376 story (#212) #225)
  7. DocumentViewNode.heading stays canonical (fix(docx-core): harden heading detection (#157 Phase 1) #178, fix(docx-core): suppress non-sectional false-positive headings (closes #187) #188, feat(docx-core): add derived heading object to DocumentViewNode (closes #179) #190)
  8. AI-author parity across entry points (feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172, fix(docx-mcp): honor SAFE_DOCX_AI_AUTHOR in CLI entry points (#181) #182)
  9. Property-change wrapper discipline (*PrChange) (feat(docx-core): emit pPrChange/trPrChange/tcPrChange from layout setters (#140) #167, feat(docx-mcp): emit rPrChange from clear_formatting MCP tool (#141) #170, feat(docx-core): emit rPrChange for formatted paragraph replacements #215)
  10. SUPPORT.md Table A drift vs. implementation ([120.8] Regression suite for canonical revision emission across the surface #143 review of replaceParagraphTextRange should emit w:rPrChange when run formatting changes #173, addCommentReply should emit body revision markup OR SUPPORT.md should be softened #174)
  11. Table A / Table B boundary on side-part revisions ([120.3] Emit w:ins/w:del for comment body anchors #138, [120.4] Emit w:ins/w:del for footnote reference and text #139)
  12. Canonical-emission surface completeness (feat(docx-mcp): wire configurable AI author through MCP layer (#142) #172, test(docx-core,docx-mcp): final regression suite for canonical emission (#143) #175, feat(docx-core): emit rPrChange for formatted paragraph replacements #215)
  13. Lean predicate drift against engine semantics (asymmetric) (feat(verification): close inv_field_001 with Tier 2 OoxmlDoc subset #208, refactor(verification): weaken inv_field_001 axiom to document-level preservationFriendly (rebased follow-up to #208) #220)
  14. Unit-test quality (avoid tautological / change-detector tests) — cross-project prior

Rollout phases

  • Phase 1 (advisory mode, single PR): ship the full 14-item checklist with LLM_GATE_BLOCKING=false. The gate posts its review comment but never sets a failure status. Validate on ~3 incoming PRs.
  • Phase 2 (blocking, override label, and synchronize trigger; second PR): set LLM_GATE_BLOCKING=true, create the llm-gate/override label (if not pre-created), add synchronize to the pull_request trigger (a lesson from open-agreements), and add the "Aggregate and post review" check to the required status checks on main.
  • Phase 3 (post-merge audit, third PR): ship llm-based-quality-gate-post-merge.yml plus its harness. Verify on the next merge to main.
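
The difference between Phase 1 and Phase 2 comes down to how a failed checklist item maps to a check status. A minimal Python sketch of that mapping follows. The function names are hypothetical, and the assumption that the llm-gate/override label turns a blocking failure back into a pass is an illustration of the intent described above, not the gate's actual code.

```python
from typing import Iterable, Optional

OVERRIDE_LABEL = "llm-gate/override"  # label name taken from the issue


def parse_blocking(raw: Optional[str]) -> bool:
    """Interpret the LLM_GATE_BLOCKING variable; anything but 'true' is advisory."""
    return (raw or "").strip().lower() == "true"


def gate_conclusion(failed_items: int, blocking: bool, labels: Iterable[str]) -> str:
    """Return the check conclusion for one gate run (hypothetical helper)."""
    if failed_items == 0:
        return "success"
    if not blocking:
        # Phase 1: the comment is still posted, but the status never fails.
        return "success"
    if OVERRIDE_LABEL in set(labels):
        # Assumed Phase 2 behavior: a maintainer-applied label bypasses the block.
        return "success"
    return "failure"


# Phase 1: findings exist, advisory mode.
print(gate_conclusion(2, parse_blocking("false"), []))
# Phase 2: findings exist, blocking, no override label.
print(gate_conclusion(2, parse_blocking("true"), []))
# Phase 2: findings exist, blocking, override label applied.
print(gate_conclusion(2, parse_blocking("true"), [OVERRIDE_LABEL]))
```

Treating any value other than "true" as advisory keeps an unset or mistyped variable from blocking merges by accident.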

Pre-flight (operator)

  • Add GEMINI_API_KEY repository secret
  • Create llm-gate/override label (yellow #FBCA04)
  • Create repo Actions variables: LLM_GATE_BLOCKING, LLM_GATE_MAX_PARALLEL, LLM_GATE_PRICE_PER_M_INPUT, LLM_GATE_PRICE_PER_M_OUTPUT, LLM_GATE_MODEL, LLM_GATE_CLI_VERSION (set atomically with Phase 1 PR)
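
The pre-flight list above can be checked mechanically before the Phase 1 PR lands. The Python sketch below assumes the workflow maps each repository variable and the GEMINI_API_KEY secret into an environment variable of the same name; that mapping and the helper name missing_settings are assumptions, not part of the ported gate.

```python
import os
from typing import List, Mapping

# Names copied from the pre-flight list in this issue.
REQUIRED_SECRETS = ("GEMINI_API_KEY",)
REQUIRED_VARIABLES = (
    "LLM_GATE_BLOCKING",
    "LLM_GATE_MAX_PARALLEL",
    "LLM_GATE_PRICE_PER_M_INPUT",
    "LLM_GATE_PRICE_PER_M_OUTPUT",
    "LLM_GATE_MODEL",
    "LLM_GATE_CLI_VERSION",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are unset or blank in env (hypothetical helper)."""
    names = REQUIRED_SECRETS + REQUIRED_VARIABLES
    return [name for name in names if not env.get(name, "").strip()]


missing = missing_settings(os.environ)
if missing:
    print("Missing pre-flight settings: " + ", ".join(missing))
else:
    print("All pre-flight settings present")
```

Running this as an early workflow step would surface a forgotten variable as one readable message instead of a failure deep inside the gate run.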

Cost expectation

~$0.033/PR pre-merge with 14 items (extrapolated from open-agreements #325: $0.0334 / 14 items ≈ $0.0024/item). The post-merge audit roughly doubles this to ~$0.067/PR total. Docs-only PRs cost about the same per check as substantive PRs because input tokens dominate.
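
The arithmetic behind these estimates can be reproduced in a few lines of Python. The baseline figures come from this issue. The run_cost helper is a hypothetical illustration of how the LLM_GATE_PRICE_PER_M_INPUT and LLM_GATE_PRICE_PER_M_OUTPUT variables would presumably enter a per-run cost, and the token counts and prices passed to it in the last lines are made-up placeholders, not measurements.

```python
# Baseline quoted in the issue (open-agreements #325).
BASELINE_COST_USD = 0.0334
BASELINE_ITEMS = 14


def per_item_cost(total_usd: float, items: int) -> float:
    """Average cost of one checklist item in the baseline run."""
    return total_usd / items


def run_cost(input_tokens: int, output_tokens: int,
             price_per_m_input: float, price_per_m_output: float) -> float:
    """Cost of one LLM call given per-million-token prices (hypothetical helper)."""
    return (input_tokens / 1_000_000 * price_per_m_input
            + output_tokens / 1_000_000 * price_per_m_output)


per_item = per_item_cost(BASELINE_COST_USD, BASELINE_ITEMS)
pre_merge = per_item * 14   # 14-item safe-docx checklist
total = pre_merge * 2       # post-merge audit roughly doubles the cost

print(f"per item:  ${per_item:.4f}")   # per item:  $0.0024
print(f"pre-merge: ${pre_merge:.3f}")  # pre-merge: $0.033
print(f"total:     ${total:.3f}")      # total:     $0.067

# Placeholder numbers only: with a large prompt and a short verdict,
# the input term dominates the cost of a single check.
example = run_cost(20_000, 300, price_per_m_input=0.10, price_per_m_output=0.40)
print(f"example check: ${example:.5f}")
```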

Bootstrap problem mitigation

The Phase 1 PR will fail the gate's base_ref checkout because action.yml does not exist on main yet, the same trap as open-agreements PR #324. Since no LLM-gate check is in the required status checks yet, this failure is non-blocking. Apply the llm-gate/override label if needed.

Phase 2 mitigation: temporarily remove the new required check from branch protection for the PR that adds it, then re-add it immediately after merge.

Precedent
