fix(ci): pin chrome-headless-shell to fix regression baseline drift #919

Closed
jrusso1020 wants to merge 2 commits into main from fix/ci-pin-chrome-headless-shell

jrusso1020 (Collaborator) commented May 17, 2026

What

Pin chrome-headless-shell to 148.0.7778.167 in Dockerfile.test (was @stable) and regenerate the style-12-prod golden baseline under the pin.

Why

Dockerfile.test:56 installed chrome-headless-shell@stable via @puppeteer/browsers. @stable is a moving tag — every Chrome stable bump shifts pixel output enough to fail PSNR on the golden baselines, so the regression suite silently broke whenever the test image was rebuilt against a freshly-promoted stable. Pinning to a numeric version makes the test environment reproducible and ties Chrome bumps to explicit baseline regenerations.

While here, style-12-prod is also regenerated. PR #918 regenerated it once at b9bdc80, but that commit landed before two later commits in the same stack: "refactor: extract shared inlineSubCompositions from bundler and producer" (581e7a7) and the linkedom-fragment fix (754b0ed). The refactor changes __hfRootSelector from null to a scoped [data-composition-id="..."] selector in inlined sub-compositions, which changes the rendered output. style-12-prod was the one fixture in that stack that did not get a second regen pass after the refactor, so it has been failing on plain origin/main from 8.26s onward (PSNR ~13 dB; the mondrian-colors blocks no longer match the expected frames).
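
For a sense of scale on the PSNR figures quoted here, the sketch below computes PSNR for a single 8-bit plane from its textbook definition. It is only an illustration: the harness relies on ffmpeg's `psnr` filter, whose reported average is computed across planes, and the function name `psnr` and the toy inputs are hypothetical.

```typescript
// Minimal sketch of PSNR between two equally sized 8-bit buffers.
// Assumption: one plane, peak value 255. Not the harness's own code.
function psnr(a: Uint8Array, b: Uint8Array): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("frames must be non-empty and the same length");
  }
  let sumSquaredError = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = a[i] - b[i];
    sumSquaredError += diff * diff;
  }
  const mse = sumSquaredError / a.length;
  if (mse === 0) return Infinity; // identical buffers
  return 10 * Math.log10((255 * 255) / mse);
}

// Every sample off by one level gives an MSE of 1.
console.log(psnr(new Uint8Array([10, 10]), new Uint8Array([11, 11])).toFixed(2)); // 48.13
```

Under this definition, ~13 dB corresponds to an RMS error of roughly 57 out of 255 levels, while 62 dB corresponds to an RMS error of about 0.2 levels, which is why the two results read as "visibly different content" and "effectively identical".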

How

  • Pin chrome-headless-shell@148.0.7778.167. This is what @stable currently resolves to per googlechromelabs.github.io/chrome-for-testing/last-known-good-versions.json and matches the version Miguel's earlier baseline regens were captured against.
  • Regenerate packages/producer/tests/style-12-prod/output/{compiled.html,output.mp4} via docker:test:update. The compiled.html diff is small (~18 lines): scoped __hfRootSelector values now match what the post-refactor compiler emits.
  • Added a comment on the pin: future Chrome bumps must be paired with docker:test:update.
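
A future bump could start from a check like the one below, which compares the pinned version against the Stable entry of last-known-good-versions.json. This is a sketch under assumptions: the payload shape (`channels.Stable.version`) reflects my understanding of the published file and should be verified against it, and `stableMatchesPin`, `PINNED_VERSION`, and the trimmed `sample` payload are hypothetical names, not part of this PR.

```typescript
// Assumed, trimmed shape of last-known-good-versions.json.
interface LastKnownGoodVersions {
  channels: Record<string, { version: string }>;
}

// Version pinned in Dockerfile.test by this PR.
const PINNED_VERSION = "148.0.7778.167";

// Returns true when the Stable channel still resolves to the pinned version.
function stableMatchesPin(payload: LastKnownGoodVersions, pinned: string): boolean {
  return payload.channels["Stable"]?.version === pinned;
}

// Hand-written sample using the version quoted in this PR, not a live response.
const sample: LastKnownGoodVersions = {
  channels: { Stable: { version: "148.0.7778.167" } },
};

console.log(stableMatchesPin(sample, PINNED_VERSION)); // true
```

A `false` result would mean stable has moved past the pin, which is the cue to bump the version and run docker:test:update in the same change.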

Test plan

bun run --cwd packages/producer docker:build:test
bun run --cwd packages/producer docker:test style-12-prod      # ✅ PSNR 62-102 dB, 0 failed frames
bun run --cwd packages/producer docker:test style-3-prod       # ✅ unchanged baselines still pass under pin
bun run --cwd packages/producer docker:test style-5-prod       # ✅ unchanged baselines still pass under pin
bun run --cwd packages/producer docker:test sub-composition-video  # ✅ unchanged baselines still pass under pin
  • Manual regression run (style-12-prod) green inside Dockerfile.test
  • Spot-checked 3 other recently-regenerated baselines under the pin
  • Unit tests added/updated — N/A (CI/baseline change)
  • Documentation updated — N/A

`Dockerfile.test:56` installed `chrome-headless-shell@stable` via
`@puppeteer/browsers`. `@stable` is a moving tag, so every Chrome stable
bump shifted pixel output enough to fail PSNR on the golden baselines.
The regression suite silently broke whenever the `Dockerfile.test`
image was rebuilt against a freshly-promoted stable.

Pin to `chrome-headless-shell@148.0.7778.167` — the Chrome 148 stable
build that `@stable` currently resolves to, matching what most goldens
on `main` were captured against. Comment notes that future bumps must
be paired with `docker:test:update` so the pin and the baselines stay
in lockstep.

Also regenerates the `style-12-prod` golden baseline. PR #918 regenerated
it once at b9bdc80, but that commit landed *before* the
`refactor: extract shared inlineSubCompositions from bundler and producer`
(581e7a7) and the linkedom-fragment fix (754b0ed) in the same stack.
The compiler refactor changes `__hfRootSelector` from `null` to a scoped
`[data-composition-id="..."]` selector in the inlined sub-compositions,
which affects the rendered output. style-12-prod was the one fixture in
that stack that didn't get a second regen pass after the refactor, so
it has been failing on plain `origin/main` (PSNR ~13 from frame 8.26s
onward — the mondrian-colors blocks no longer match expected).
The new baseline regenerated under this pin passes at PSNR 62-102 dB.

jrusso1020 force-pushed the fix/ci-pin-chrome-headless-shell branch from 436c844 to 0386246 on May 17, 2026 19:02

Two related fixes pulled out of CI failures on the Chrome pin run:

1. **regression-harness PSNR-parse crash on many-cuts.** Container
   duration includes audio padding past the last video frame (many-cuts:
   5.654s container, 5.6s of video at 30fps = 168 frames, indices 0-167).
   At checkpoint i=99 the raw container duration mapped to time 5.59746s
   (5.654 * 0.99) → frame index 168 (round(5.59746 * 30)), which is one
   past the last frame the stream
   contains. ffmpeg's `psnr` filter emits no `average:` line for a
   non-existent frame, so the harness crashed with `Unable to parse
   PSNR output at 5.59746s`. The fix subtracts one frame interval from
   the sampling duration so the last checkpoint always lands on a
   frame the video stream actually contains. PR #918 admin-merged
   through this same failure on shard-2 (so main is currently red on
   many-cuts), and Miguel's regen via `--update` didn't catch it
   because `--update` only writes the snapshot — it doesn't validate.

2. **style-1-prod baseline regen.** Same pattern as style-12-prod:
   PR #918's regen was done before or between the `refactor: extract
   shared inlineSubCompositions from bundler and producer` (581e7a7)
   and the linkedom-fragment fix (754b0ed), so the committed baseline
   doesn't match what the compiler now emits. Reproduced locally:
   frames from 14.62s onward fail at PSNR ~10-16 dB because the graphics
   sub-composition layer (`#a-roll-frame` overlay) now correctly
   renders through host duration but was absent in the committed
   baseline. Regenerated under the Chrome 148.0.7778.167 pin from
   this PR — now passes at PSNR 53-62 dB across all checkpoints.
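
The off-by-one in item 1 can be reproduced with the arithmetic alone. The sketch below assumes 100 evenly spaced checkpoints at `duration * i / 100`, which is consistent with 5.654 * 0.99 = 5.59746 but is inferred from the numbers above rather than taken from the harness. `checkpointTimes` and `frameIndexAt` are hypothetical names.

```typescript
// Sketch of checkpoint sampling, with and without the one-frame correction.
// Assumption: checkpoints are spaced at samplingDuration * i / checkpoints.
function checkpointTimes(
  containerDuration: number,
  fps: number,
  checkpoints: number,
  subtractOneFrame: boolean,
): number[] {
  const frameInterval = 1 / fps;
  const samplingDuration = subtractOneFrame
    ? Math.max(0, containerDuration - frameInterval)
    : containerDuration;
  const times: number[] = [];
  for (let i = 0; i < checkpoints; i++) {
    times.push((samplingDuration * i) / checkpoints);
  }
  return times;
}

// Maps a timestamp to the nearest frame index, as in round(5.59746 * 30).
function frameIndexAt(time: number, fps: number): number {
  return Math.round(time * fps);
}

// many-cuts: 5.654s container, 168 video frames at 30fps (indices 0..167).
const before = frameIndexAt(checkpointTimes(5.654, 30, 100, false)[99], 30);
const after = frameIndexAt(checkpointTimes(5.654, 30, 100, true)[99], 30);
console.log(before, after); // 168 167
```

With the correction, the last checkpoint lands on index 167, the final frame present in the stream, so the `psnr` filter has a frame to report on.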

jrusso1020 (Collaborator, Author) commented

Closing in favor of two focused, parallel PRs:

Splitting avoids the merge conflict that would happen on output.mp4 if both this PR and #925 land — same Chrome version, but ffmpeg encode isn't bit-deterministic across runs, so the LFS oids diverge.

jrusso1020 closed this on May 17, 2026