Stabilize flaky Playwright e2e tests#109
Merged
Merged
Conversation
The Extension E2E job failed intermittently in CI (passing on retry) with three recurring signatures, all rooted in timing/network nondeterminism rather than product bugs: - net::ERR_NAME_NOT_RESOLVED when a navigation to a non-resolving test domain escaped its per-test route mock and hit real DNS. - "Test timeout of 30000ms exceeded" on expectBlocked/expectAllowed, where the default 5s expect timeout was too tight for the MV3 service worker to wake, react to a storage change, and reconcile open tabs under load. - "Not attached to an active page" in the whitelist-from-blocked test, where a manual page.reload() raced the background's own tab restore. Changes: - fixtures.ts: install a context-level catch-all route that stubs every otherwise-unmocked http(s) request with a local 200, so tests never depend on real DNS. Per-test mocks still take precedence and block assertions are unaffected (blocking is chrome.tabs.update-driven). Set a 15s context navigation timeout and capture a trace on retry only (the built-in use.trace / use.navigationTimeout wiring does not apply to a directly launched persistent context). - playwright.config.ts: raise the expect timeout to 15s (keeping the 30s test ceiling) and add retries (2 on CI, 0 locally) as a backstop. - blocked.spec.ts / extension.spec.ts: replace the only real-domain target with a non-resolving stub, and convert the Go-Back Promise.all([waitForURL, click]) sites to click-then-poll-url so a transient navigation cannot reject waitForURL. Content assertions are retained. - lifecycle.spec.ts: drop the redundant manual reload in the whitelist-from-blocked test and rely on the background reconcile, matching the primed-storage and popup-disable siblings. Verified green under sustained heavy CPU load locally: 3 consecutive full-suite runs plus a repeat-each=3 run (120 executions).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The Extension E2E job fails intermittently in CI and goes green on retry — classic flakiness, not real regressions. Pulling the actual failure logs from past runs surfaced three recurring signatures, all rooted in timing/network nondeterminism around the MV3 service worker:
net::ERR_NAME_NOT_RESOLVED— a navigation to a non-resolving test domain (*.example.test/*.example.invalid) escaped its per-testcontext.routemock and hit real DNS.Test timeout of 30000ms exceededonexpectBlocked/expectAllowed— Playwright's hidden 5s defaultexpecttimeout is too tight for the path "wake the service worker → react to a storage change → reconcile open tabs" under CI load.page.reload: Not attached to an active page— in the whitelist-from-blocked test, a manualpage.reload()raced the background's own tab restore.Changes
test/e2e/fixtures.tshttp(s)request with a local200, so no test depends on real DNS. Per-testmockAllowedPageroutes still win (Playwright matches the most-recently-registered route first), and block assertions are unaffected because blocking is driven bychrome.tabs.updateredirecting toblocked.htmlregardless of the original request. Extension/internal URLs pass through untouched.use.trace/use.navigationTimeoutwiring does not apply. Tracing on retry avoids per-action screenshot overhead on the common path — that overhead is itself a flake source under load.playwright.config.tsexpect.timeout: 15_000(the single highest-leverage fix for signatures Set up CI/CD workflows and VS Code integration #2/Drop support for all browsers except Edge #3), keeping the 30s per-test ceiling so a genuinely stuck run still fails fast.retries: process.env.CI ? 2 : 0as a backstop for residual MV3 wake jitter — kept at 0 locally so flakiness stays visible in development.listreporter on CI so retried/flaky tests stream into the log.test/e2e/blocked.spec.ts/test/e2e/extension.spec.tsexample.com) in the warning-bypass test with a non-resolving stub, removing a dependency on CI egress/DNS.Promise.all([page.waitForURL(target), click()])sites to click-then-expect.poll(() => page.url()).waitForURLis bound to the navigation lifecycle and rejects on a transient failed navigation; polling the committed URL tolerates the service worker settling a beat later. All content assertions are retained (a URL alone is weaker than verifying the restored page rendered).test/e2e/lifecycle.spec.tsbrowsingPage.reload()in the whitelist-from-blocked test and rely on the background reconcile to restore the open tab — matching theprimed-storageandpopup-disablesibling tests. The manual reload raced the background's own tab navigation and produced theNot attached to an active pagefailure.Validation
Reproduced and fixed locally under sustained heavy CPU load (cores saturated with busy workers — a quiet machine passes even when CI is flaky):
--repeat-each=3interleaved run (120/120) — all green.No product code changed; this is test-infrastructure hardening only. The fixes target root causes (determinism + adequate timeouts); retries remain a thin backstop, so a genuine regression that fails all attempts still reds the build.