feat(harness): rewrite-quality A/B (single vs ouroboros multi-pass) (5.4.0) by devswha · Pull Request #500 · devswha/patina

devswha · 2026-06-15T11:06:42Z

Summary

Builds the robust measurement harness to answer pipeline questions with data (per the plan: build the harness first, then measure). Bumps to 5.4.0 (minor — opt-in contributor tool; no CLI/schema/detection-behavior change).

scripts/rewrite-ab.mjs (npm run quality:rewrite-ab) compares two rewrite configurations on the same live-quality fixtures:

produces a rewrite per config, model-grades both (before/after AI score, MPS, fidelity via the existing scoreText/scoreMPS/scoreFidelity),
measures word-level edit churn,
picks a per-fixture winner (lowest after-AI-score among configs meeting the MPS/fidelity floors; ties broken on churn),
reports per-config aggregates + head-to-head wins.
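
As a rough illustration of the churn and winner-selection steps listed above, here is a minimal sketch. It is not the code in scripts/rewrite-ab.mjs: the function bodies, the result shape ({ config, afterScore, mps, fidelity, churn }), the floor values, the normalization of churn by the longer text's word count, and the choice that lower churn wins a tie are all assumptions made for the example.

```javascript
// Hypothetical sketch; names, result shape, and floor values are assumptions,
// not the actual scripts/rewrite-ab.mjs implementation.

// Word-level edit churn: Levenshtein distance over whitespace-split words,
// divided by the longer text's word count (0 = identical, 1 = fully changed).
function editChurn(before, after) {
  const a = before.trim().split(/\s+/).filter(Boolean);
  const b = after.trim().split(/\s+/).filter(Boolean);
  if (a.length === 0 && b.length === 0) return 0;
  // prev[j] holds the distance between the first i-1 words of a and first j of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length] / Math.max(a.length, b.length);
}

// Placeholder floors; the PR only states that MPS/fidelity floors exist.
const EXAMPLE_FLOORS = { mps: 0.7, fidelity: 0.7 };

// Pick the config with the lowest after-rewrite AI score among those that
// meet both floors. Ties go to the lower churn (assumed direction).
// Returns null when no config meets the floors.
function pickWinner(results, floors = EXAMPLE_FLOORS) {
  const eligible = results.filter(
    (r) => r.mps >= floors.mps && r.fidelity >= floors.fidelity
  );
  if (eligible.length === 0) return null;
  eligible.sort((x, y) => x.afterScore - y.afterScore || x.churn - y.churn);
  return eligible[0].config;
}
```

Filtering on the floors before ranking means a rewrite that lowers the AI score by damaging meaning cannot win, which appears to be the intent of the floor rule described above.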

Why this shape

The earlier review flagged the multi-agent / --strict stack as over-engineered because it was never measured. The CLI multi-pass already exists as --ouroboros (detect → rewrite → score → rollback with MPS/fidelity floors), so the default A/B is single vs ouroboros — no redundant new --strict CLI mode. This makes "does a multi-pass pipeline rewrite better?" measurable, so the multi-agent surface can be kept or cut with evidence in the next (measurement) phase.

--strict itself lives only in SKILL.md (agent skill) and is not CLI-reachable; --ouroboros is its CLI-measurable proxy.
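
For readers unfamiliar with the multi-pass shape, the detect → rewrite → score → rollback cycle can be sketched roughly as follows. This is an illustration only, not the --ouroboros implementation: detect, rewrite, and score are injected stand-ins, and the pass limit, the "must improve the AI score" acceptance rule, and the return shape are assumptions.

```javascript
// Hypothetical sketch of a detect -> rewrite -> score -> rollback loop.
// `detect`, `rewrite`, and `score` are injected stand-ins, not patina's API.
// `score(original, candidate)` is assumed to return { aiScore, mps, fidelity },
// where a lower aiScore is better.
function multiPass(text, { detect, rewrite, score, floors, maxPasses = 3 }) {
  let best = { text, aiScore: score(text, text).aiScore };
  for (let pass = 0; pass < maxPasses; pass++) {
    const findings = detect(best.text);
    if (findings.length === 0) break; // nothing left to fix
    const candidate = rewrite(best.text, findings);
    const graded = score(text, candidate); // always graded against the original
    const meetsFloors =
      graded.mps >= floors.mps && graded.fidelity >= floors.fidelity;
    if (!meetsFloors || graded.aiScore >= best.aiScore) {
      continue; // rollback: discard the candidate and keep the previous best
    }
    best = { text: candidate, aiScore: graded.aiScore };
  }
  return best;
}
```

Because a rejected candidate never replaces the previous best, the loop can only return the original text or a rewrite that passed the floors, which is what makes it comparable to a one-shot rewrite in an A/B run.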

Verification

npm test — 797 pass / 0 fail (6 new rewrite-ab unit tests: editChurn, pickWinner, compare/aggregate, error handling — all with injected producers, no live model)
npm run lint — syntax OK (161 files), cspell 0 issues
npm run release:check — OK for 5.4.0
npm run check:no-private-assets — OK
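
The "injected producers" approach mentioned in the test line above can be illustrated with a small sketch. It assumes a comparison core that receives its rewrite producers and grader as arguments, so a test can pass deterministic fakes instead of calling a model. The names compareFixture and aggregate, the result fields, and the choice to record a failing config rather than abort are assumptions for the example, not the actual exports of scripts/rewrite-ab.mjs. Real producers would call a model and would likely be async; the sketch is synchronous for brevity.

```javascript
// Hypothetical sketch; function names and result fields are assumptions.

// Run every config's producer on one fixture and grade the output.
// A producer that throws is recorded as an error instead of aborting the run.
function compareFixture(fixture, producers, grade) {
  const results = [];
  for (const [config, produce] of Object.entries(producers)) {
    try {
      const rewritten = produce(fixture.text);
      results.push({ config, rewritten, ...grade(fixture.text, rewritten) });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ config, error: message });
    }
  }
  return results;
}

// Per-config aggregates over many fixtures: run count, error count, and the
// mean after-rewrite score across the runs that were actually graded.
function aggregate(allResults) {
  const totals = {};
  for (const r of allResults.flat()) {
    if (!totals[r.config]) totals[r.config] = { runs: 0, errors: 0, sum: 0 };
    const t = totals[r.config];
    t.runs += 1;
    if (r.error) t.errors += 1;
    else t.sum += r.afterScore;
  }
  const summary = {};
  for (const [config, t] of Object.entries(totals)) {
    const graded = t.runs - t.errors;
    summary[config] = {
      runs: t.runs,
      errors: t.errors,
      meanAfterScore: graded > 0 ? t.sum / graded : null,
    };
  }
  return summary;
}

// Deterministic fakes stand in for the live model and the grader.
const fakeProducers = {
  single: (text) => text.toUpperCase(),
  ouroboros: () => {
    throw new Error("backend down");
  },
};
const fakeGrade = (before, after) => ({ afterScore: after === before ? 50 : 10 });

const demoResults = compareFixture({ id: "f1", text: "Hello world" }, fakeProducers, fakeGrade);
const demoSummary = aggregate([demoResults]);
console.log(JSON.stringify(demoSummary));
```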

LLM-backed and opt-in (--live / PATINA_LIVE), like quality:live; not in mandatory CI. Documented in docs/HARNESS.md and tests/quality/README.md.
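
A minimal sketch of how such an opt-in gate might look, assuming the script checks for the --live flag or the PATINA_LIVE environment variable before making any model call. The function name and the rule for which PATINA_LIVE values count as "on" are assumptions; the PR only names the flag and the variable.

```javascript
// Hypothetical sketch; the helper name and accepted values are assumptions.
// Returns true only when the caller explicitly opted in to live model calls.
function isLiveRun(argv = process.argv.slice(2), env = process.env) {
  if (argv.includes("--live")) return true;
  const value = (env.PATINA_LIVE || "").trim().toLowerCase();
  // Treat unset, empty, "0", and "false" as "not live" (assumed convention).
  return value !== "" && value !== "0" && value !== "false";
}

if (!isLiveRun()) {
  console.log("Skipping live rewrite A/B: pass --live or set PATINA_LIVE to opt in.");
}
```

Taking argv and env as parameters keeps the gate testable without touching the real process state.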

Next phase (separate): run the A/B with a backend to measure single vs ouroboros, then decide keep/cut the multi-agent surface from data.

Adds scripts/rewrite-ab.mjs (npm run quality:rewrite-ab): for each live-quality fixture it produces a rewrite per config, model-grades both (before/after AI, MPS, fidelity via the existing scoreText/scoreMPS/scoreFidelity), measures word-level edit churn, and picks a per-fixture winner (lowest after-AI among configs meeting the MPS/fidelity floors, ties broken on churn) + per-config aggregates and head-to-head wins. Default comparison is single (one-shot) vs ouroboros (the existing CLI multi-pass), so "does a multi-pass/multi-agent pipeline rewrite better?" is answerable with data instead of intuition — no redundant new --strict CLI mode. LLM-backed/opt-in like quality:live; the comparison/aggregation core is unit-tested with injected producers. Documented in HARNESS.md + quality README.

Minor bump for the opt-in rewrite-quality A/B harness. No CLI/schema/pattern/detection-behavior change; all four languages byte-identical. Syncs version surfaces + CHANGELOG.

vercel · 2026-06-15T11:06:45Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
patina	Ready	Preview, Comment	Jun 15, 2026 11:06am

devswha added 2 commits June 15, 2026 20:05

chore(release): 5.4.0

745074c


vercel Bot deployed to Preview June 15, 2026 11:06

devswha merged commit b75dbf0 into main Jun 15, 2026
8 checks passed

devswha deleted the bot/rewrite-quality-harness branch June 15, 2026 11:13

devswha merged 2 commits into main from bot/rewrite-quality-harness
