From 070d40fecb4a870ab17123b59357eec2886e38ca Mon Sep 17 00:00:00 2001
From: James
Date: Sun, 17 May 2026 22:41:55 +0000
Subject: [PATCH 1/2] fix(ci): pin chrome-headless-shell + clamp PSNR checkpoint to a valid frame
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two narrow fixes to keep the regression suite green and reproducible.
Stale baselines from the sub-composition refactor (PR #918) are being
regenerated separately in PR #925; this PR contains only the structural
fixes that #925 can't make on its own.

1. **Pin `chrome-headless-shell` in `Dockerfile.test`** to
   `148.0.7778.167` instead of `@stable`. `@stable` is a moving tag;
   every Chrome stable promotion shifts pixel output enough to fail PSNR
   on the golden baselines, so the regression suite silently broke
   whenever `Dockerfile.test` was rebuilt against a freshly promoted
   stable. Pinning to the version `@stable` currently resolves to
   (matching what main's regenerated baselines were captured under)
   makes Chrome bumps an explicit action, batched with a baseline regen.
   The comment above the `RUN` line spells out the bump procedure.

2. **Clamp the last PSNR checkpoint to a frame the video stream actually
   contains.** `runTestSuite` samples 100 checkpoints across the
   `min(rendered, snapshot)` container duration. Container duration
   includes audio padding past the last video frame: many-cuts is 5.654s
   of container vs 5.6s of video, which at 30fps is 168 frames (indices
   0 to 167). At i=99 the raw container duration mapped to time 5.59746s
   → frame index 168 (round(5.59746 × 30)), one past the last frame the
   stream contains. ffmpeg's `psnr` filter emits no `average:` line for
   a non-existent frame, so the harness crashed with
   `Unable to parse PSNR output at 5.59746s`. The crash is pre-existing
   on plain `origin/main`; PR #918 was admin-merged through it on
   shard-2. Miguel's regen via `--update` didn't catch it because
   `--update` only writes the snapshot; it doesn't validate.
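The off-by-one can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: `checkpointFrame` is a hypothetical helper, not a function in `regression-harness.ts`, and it assumes a checkpoint resolves to frame `round(time × fps)` with zero-based indices, as the many-cuts numbers imply.

```typescript
// Hypothetical helper (not part of the harness): the frame index that a
// round(time * fps) lookup selects for checkpoint i out of `checkpoints`.
function checkpointFrame(
  durationSeconds: number,
  fps: number,
  i: number,
  checkpoints: number = 100,
): number {
  const time = (durationSeconds * i) / checkpoints;
  return Math.round(time * fps);
}

const fps = 30;
const containerDuration = 5.654; // many-cuts container duration, audio padding included
const frameCount = 168; // 5.6 s of video at 30 fps
const lastValidFrame = frameCount - 1; // zero-based index of the final frame

// Sampling across the raw container duration overshoots at i = 99.
const rawFrame = checkpointFrame(containerDuration, fps, 99);

// Sampling across the duration minus one frame interval stays in range.
const clampedDuration = Math.max(0, containerDuration - 1 / fps);
const clampedFrame = checkpointFrame(clampedDuration, fps, 99);

console.log({ rawFrame, clampedFrame, lastValidFrame });
```

With the many-cuts numbers, `rawFrame` comes out as 168 and `clampedFrame` as 167, which matches `lastValidFrame`. The sketch only checks this one test case; it does not model other padding lengths.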

   Subtracting one frame interval from the sampling duration guarantees
   the last checkpoint always lands on a real frame.

Verified locally inside `Dockerfile.test`:

    bun run --cwd packages/producer docker:build:test
    bun run --cwd packages/producer docker:test many-cuts   # ✅ green
    bun run --cwd packages/producer docker:test style-3-prod \
      style-5-prod sub-composition-video                    # ✅ green
---
 Dockerfile.test                             |  7 ++++++-
 packages/producer/src/regression-harness.ts | 17 ++++++++++-------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/Dockerfile.test b/Dockerfile.test
index a20d795b1..d6e83f9c2 100644
--- a/Dockerfile.test
+++ b/Dockerfile.test
@@ -53,7 +53,12 @@ ENV CONTAINER=true
 # Install chrome-headless-shell for deterministic BeginFrame rendering.
 # This lightweight Chrome binary supports HeadlessExperimental.beginFrame.
 # Install to ~/.cache/puppeteer/ where resolveHeadlessShellPath() looks.
-RUN npx --yes @puppeteer/browsers install chrome-headless-shell@stable \
+#
+# Pinned to a specific build (NOT @stable) so the regression-test golden
+# baselines in packages/producer/tests/*/output/output.mp4 stay reproducible.
+# Each Chrome stable bump shifts pixel output enough to fail PSNR. Bump this
+# version together with regenerating baselines via `docker:test:update`.
+RUN npx --yes @puppeteer/browsers install chrome-headless-shell@148.0.7778.167 \
     --path /root/.cache/puppeteer \
     && find /root/.cache/puppeteer/chrome-headless-shell -name "chrome-headless-shell" -type f \
     && echo "chrome-headless-shell installed"
diff --git a/packages/producer/src/regression-harness.ts b/packages/producer/src/regression-harness.ts
index 5b9d3c153..ad4f991fc 100644
--- a/packages/producer/src/regression-harness.ts
+++ b/packages/producer/src/regression-harness.ts
@@ -1079,16 +1079,19 @@ async function runTestSuite(
         videoMetadata.durationSeconds,
         snapshotMetadata.durationSeconds,
       );
+      const fps = fpsToNumber(suite.meta.renderConfig.fps);
+      // Container duration includes audio padding past the last video frame
+      // (e.g. many-cuts: 5.654s container vs 5.6s of video). At i=99 the
+      // raw container duration maps to a frame index past nb_frames, and
+      // ffmpeg's PSNR filter emits no `average:` line for a non-existent
+      // frame. Subtract one frame interval so the last checkpoint always
+      // lands on a frame the video stream actually contains.
+      const sampleDuration = Math.max(0, videoDuration - 1 / fps);
       const minPsnrForMode = resolveMinPsnrForMode(options.mode, suite.meta.minPsnr);
 
       for (let i = 0; i < 100; i++) {
-        const time = (videoDuration * i) / 100;
-        const psnr = psnrAtCheckpoint(
-          renderedOutputPath,
-          snapshotVideoPath,
-          time,
-          fpsToNumber(suite.meta.renderConfig.fps),
-        );
+        const time = (sampleDuration * i) / 100;
+        const psnr = psnrAtCheckpoint(renderedOutputPath, snapshotVideoPath, time, fps);
         visualCheckpoints.push({
           time,
           psnr,

From 21e68d9b50997eb3087a624fad25c92ac5c2bd29 Mon Sep 17 00:00:00 2001
From: James
Date: Mon, 18 May 2026 18:58:39 +0000
Subject: [PATCH 2/2] ci: re-trigger regression on PR #926 (suspected shard-3 flake)