fix(ci): pin chrome-headless-shell to fix regression baseline drift #919
Closed
jrusso1020 wants to merge 2 commits into
`Dockerfile.test:56` installed `chrome-headless-shell@stable` via `@puppeteer/browsers`. `@stable` is a moving tag, so every Chrome stable bump shifted pixel output enough to fail PSNR on the golden baselines. The regression suite silently broke whenever the `Dockerfile.test` image was rebuilt against a freshly promoted stable.

Pin to `chrome-headless-shell@148.0.7778.167`, the Chrome 148 stable build that `@stable` currently resolves to, matching what most goldens on `main` were captured against. A comment notes that future bumps must be paired with `docker:test:update` so the pin and the baselines stay in lockstep.

Also regenerates the `style-12-prod` golden baseline. PR #918 regenerated it once at b9bdc80, but that commit landed *before* the `refactor: extract shared inlineSubCompositions from bundler and producer` (581e7a7) and the linkedom-fragment fix (754b0ed) in the same stack. The compiler refactor changes `__hfRootSelector` from `null` to a scoped `[data-composition-id="..."]` selector in the inlined sub-compositions, which affects the rendered output. `style-12-prod` was the one fixture in that stack that didn't get a second regen pass after the refactor, so it has been failing on plain `origin/main` (PSNR ~13 from frame 8.26s onward; the mondrian-colors blocks no longer match the expected output). The new baseline regenerated under this pin passes at PSNR 62-102 dB.
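As an illustration of the "pin, don't float" rule described above, a CI guard could reject moving tags before the image is built. This is a minimal sketch, not part of the PR: `is_pinned` is a hypothetical helper, and it assumes a spec is only considered pinned when the part after the last `@` is a full four-part Chrome version.

```python
import re

# Assumption: a "pinned" spec ends in a full four-part Chrome version,
# e.g. 148.0.7778.167. Channel tags such as "stable" are moving targets.
FULL_VERSION = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def is_pinned(spec: str) -> bool:
    """Return True if a browser spec like 'name@version' uses a full version."""
    # rpartition splits on the last '@'; sep is empty when no '@' is present.
    name, sep, version = spec.rpartition("@")
    return bool(name) and bool(sep) and FULL_VERSION.match(version) is not None


if __name__ == "__main__":
    for spec in ("chrome-headless-shell@stable",
                 "chrome-headless-shell@148.0.7778.167"):
        print(spec, "pinned" if is_pinned(spec) else "moving")
```

A check like this only verifies the shape of the version string; it does not confirm that the goldens were captured against that build.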
Force-pushed from 436c844 to 0386246.
Two related fixes pulled out of CI failures on the Chrome pin run:

1. **regression-harness PSNR-parse crash on many-cuts.** Container duration includes audio padding past the last video frame (many-cuts: 5.654s container, 5.6s of video at 30fps = 168 frames, indexed 0-167). At i=99 the raw container duration mapped to time 5.59746s → frame index 168 (round(5.59746 * 30)), which is one past the last frame the stream contains. ffmpeg's `psnr` filter emits no `average:` line for a non-existent frame, so the harness crashed with `Unable to parse PSNR output at 5.59746s`. The fix subtracts one frame interval from the sampling duration so the last checkpoint always lands on a frame the video stream actually contains. PR #918 was admin-merged through this same failure on shard-2 (so main is currently red on many-cuts), and Miguel's regen via `--update` didn't catch it because `--update` only writes the snapshot; it doesn't validate.

2. **style-1-prod baseline regen.** Same pattern as style-12-prod: PR #918's regen was done before or between the `refactor: extract shared inlineSubCompositions from bundler and producer` (581e7a7) and the linkedom-fragment fix (754b0ed), so the committed baseline doesn't match what the compiler now emits. Reproduced locally: frames from 14.62s onward fail at PSNR ~10-16 because the graphics sub-composition layer (`#a-roll-frame` overlay) now correctly renders through host duration but was absent in the committed baseline. Regenerated under the Chrome 148.0.7778.167 pin from this PR; it now passes at PSNR 53-62 dB across all checkpoints.
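The off-by-one in the many-cuts crash can be reproduced with plain arithmetic. The sketch below is not the harness code: `checkpoint_frames` and `js_round` are hypothetical names, and it assumes 100 evenly spaced checkpoints at `i / 100` of the duration (inferred from i=99 mapping to 5.59746s of a 5.654s container) with round-half-up frame mapping.

```python
import math


def js_round(x: float) -> int:
    """Round half up for non-negative inputs (assumed rounding mode)."""
    return math.floor(x + 0.5)


def checkpoint_frames(duration_s: float, fps: int, checkpoints: int = 100,
                      trim_last_frame: bool = False) -> list[int]:
    """Map evenly spaced checkpoint times to frame indices.

    With trim_last_frame=True, one frame interval is subtracted from the
    sampling duration first, mirroring the fix described above.
    """
    if trim_last_frame:
        duration_s = max(0.0, duration_s - 1.0 / fps)
    return [js_round(i / checkpoints * duration_s * fps)
            for i in range(checkpoints)]


CONTAINER_S = 5.654   # container duration, including audio padding
FPS = 30
FRAME_COUNT = 168     # 5.6s of video at 30fps; valid indices are 0..167

before = checkpoint_frames(CONTAINER_S, FPS)
after = checkpoint_frames(CONTAINER_S, FPS, trim_last_frame=True)

# The highest sampled index must stay below FRAME_COUNT to be a real frame.
print("before fix:", max(before), "in range:", max(before) < FRAME_COUNT)
print("after fix: ", max(after), "in range:", max(after) < FRAME_COUNT)
```

Subtracting one frame interval is enough for this fixture; whether it covers containers with larger audio padding depends on how the harness derives the duration, which this sketch does not model.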
Closing in favor of two focused, parallel PRs:
Splitting avoids the merge conflict that would happen on |
What

Pin `chrome-headless-shell` to `148.0.7778.167` in `Dockerfile.test` (was `@stable`) and regenerate the `style-12-prod` golden baseline under the pin.

Why

`Dockerfile.test:56` installed `chrome-headless-shell@stable` via `@puppeteer/browsers`. `@stable` is a moving tag: every Chrome stable bump shifts pixel output enough to fail PSNR on the golden baselines, so the regression suite silently broke whenever the test image was rebuilt against a freshly promoted stable. Pinning to a numeric version makes the test environment reproducible and ties Chrome bumps to explicit baseline regenerations.

While here, `style-12-prod` is also regenerated. PR #918 regenerated it once at b9bdc80, but that commit landed before the `refactor: extract shared inlineSubCompositions from bundler and producer` (581e7a7) and the linkedom-fragment fix (754b0ed) in the same stack. The refactor changes `__hfRootSelector` from `null` to a scoped `[data-composition-id="..."]` selector in inlined sub-compositions, which affects the render. `style-12-prod` was the one fixture in that stack that didn't get a second regen pass after the refactor, so it has been failing on plain `origin/main` from frame 8.26s onward (PSNR ~13; the mondrian-colors blocks no longer match the expected output).

How

- `chrome-headless-shell@148.0.7778.167`. This is what `@stable` currently resolves to per `googlechromelabs.github.io/chrome-for-testing/last-known-good-versions.json` and matches the version Miguel's earlier baseline regens were captured against.
- `packages/producer/tests/style-12-prod/output/{compiled.html,output.mp4}` via `docker:test:update`. The `compiled.html` diff is small (~18 lines): scoped `__hfRootSelector` values now match what the post-refactor compiler emits.
- `docker:test:update`.

Test plan
Dockerfile.test