feat(test): Python browser scenario harness + RC-promotion gate #8370
HydraOps-T-rav merged 35 commits into `main` on Apr 20, 2026
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raise the pytest-rerunfailures version floor from >=14.0 to >=16.0 to match the locked version (16.1) and avoid silent misbehaviour from breaking changes in 14→15→16. Add "(Tier C)" qualifier to the scenario_browser marker description for consistency with sibling markers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use concrete import path for MockWorld (avoids __getattr__ union-type resolution that confused pyright) and guard teardown with getattr so future Task-7 methods don't cause pyright errors before they exist. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
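The getattr guard in teardown can be illustrated with a small sketch. `teardown_world` and the method names passed to it are hypothetical, not the actual fixture code. The point is that `getattr(obj, name, None)` lets a fixture refer to methods that a later task has not added yet and simply skips them until they exist.

```python
import asyncio
from typing import Any


async def teardown_world(world: Any, method_names: tuple[str, ...]) -> list[str]:
    """Await each async teardown method that exists on `world`; skip missing ones.

    getattr(..., None) lets the fixture name methods a later task will add,
    without an attribute error at runtime before they exist.
    """
    called: list[str] = []
    for name in method_names:
        method = getattr(world, name, None)
        if method is None:
            continue  # not implemented yet: nothing to tear down
        await method()
        called.append(name)
    return called
```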
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _wire_targets(target) that accepts any duck-typed object exposing .prs/.triage_runner/.planners/.agents/.reviewers/.workspaces, enabling Task 9 to wire an orchestrator adapter without duplicating logic. The three original _wire_* methods are kept as thin backward-compatible wrappers. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
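A duck-typed wiring helper could look roughly like the sketch below. The `WireTarget` Protocol, the name `wire_targets`, and the `patches` mapping (service attribute to `{method name: fake}`) are assumptions for illustration; the real helper's signature is just `_wire_targets(target)`. Because only attribute names matter, a PipelineHarness, an orchestrator adapter, or a test double all qualify.

```python
from typing import Any, Callable, Protocol


class WireTarget(Protocol):
    """Structural type: anything exposing these six service attributes qualifies."""

    prs: Any
    triage_runner: Any
    planners: Any
    agents: Any
    reviewers: Any
    workspaces: Any


def wire_targets(target: WireTarget, patches: dict[str, dict[str, Callable[..., Any]]]) -> None:
    """Patch fake callables onto the services that `target` exposes.

    `patches` maps a service attribute (e.g. "prs") to {method_name: fake}.
    """
    for service_attr, methods in patches.items():
        service = getattr(target, service_attr)
        for method_name, fake in methods.items():
            setattr(service, method_name, fake)
```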
Implements the no-orchestrator path of start_dashboard(): boots HydraFlowDashboard in-process on an ephemeral port against a seeded EventBus/StateTracker, exposes the URL via dashboard_url, and provides idempotent start + graceful stop. Adds _uvicorn_server attribute to HydraFlowDashboard so MockWorld can poll for the bound port. The browser smoke test (intentionally failing since Task 2) now passes end-to-end. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
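The ephemeral-port pattern can be sketched with the standard library alone. To keep the example self-contained, this stand-in replaces uvicorn with `asyncio.start_server`. Binding port 0 lets the OS pick a free port. Because the server starts inside a background task, the caller polls until a listener socket exists, analogous to how MockWorld polls `_uvicorn_server`. `BackgroundServer` and `wait_for_port` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Handler = Callable[[asyncio.StreamReader, asyncio.StreamWriter], Awaitable[None]]


class BackgroundServer:
    """Stand-in for an in-process server whose port is only known after startup."""

    def __init__(self) -> None:
        self.server: Optional[asyncio.Server] = None

    async def serve(self, handler: Handler, host: str = "127.0.0.1") -> None:
        # Port 0 asks the OS for any free (ephemeral) port.
        self.server = await asyncio.start_server(handler, host, 0)
        async with self.server:
            await self.server.serve_forever()


async def wait_for_port(bg: BackgroundServer, timeout: float = 5.0, interval: float = 0.01) -> int:
    """Poll until the background server has a bound listener, then return its port."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while loop.time() < deadline:
        if bg.server is not None and bg.server.sockets:
            return bg.server.sockets[0].getsockname()[1]
        await asyncio.sleep(interval)
    raise TimeoutError("server did not bind a port in time")
```

Polling against a deadline keeps a failed startup from hanging the test session; the timeout and interval values here are arbitrary.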
Harden MockWorld.stop_dashboard to explicitly close uvicorn's bound listener sockets (asyncio.Server.close + wait_closed) before awaiting dashboard.stop(), ensuring the ephemeral port is released immediately rather than after uvicorn's graceful-shutdown delay. Adds regression test that re-binds the port after teardown to catch any recurrence. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…to a real orchestrator
Implements Task 9: _build_wired_orchestrator constructs a real HydraFlowOrchestrator (pipeline_enabled=False) and uses a _SvcAdapter to bridge the ServiceRegistry's `triage` field to the `triage_runner` name that _wire_targets expects, then patches all fake methods onto the live service registry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
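A minimal adapter of this kind might look like the following. `SvcAdapter` is a hypothetical stand-in for `_SvcAdapter`, and forwarding every other attribute through `__getattr__` is an assumption about how the real class is built. Because the adapter returns the registry's live objects, patching through it affects the running orchestrator.

```python
from typing import Any


class SvcAdapter:
    """Expose a registry's `triage` service under the name `triage_runner`.

    Every other attribute is forwarded unchanged, so the adapter returns the
    registry's live service objects and patches applied to them stick.
    """

    def __init__(self, registry: Any) -> None:
        self._registry = registry

    @property
    def triage_runner(self) -> Any:
        return self._registry.triage

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails (i.e. not `triage_runner` or `_registry`).
        return getattr(self._registry, name)
```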
Ports seed data from src/ui/e2e/fixtures/seed-state.js into Python via seed_populated_pipeline / seed_empty_pipeline helpers, backed by a new sync FakeGitHub.add_pr method. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a useEffect in HydraFlowProvider that mirrors state.connected onto document.body[data-connected] so Playwright's wait_for_ws_ready() helper can detect WebSocket readiness via a stable DOM attribute. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
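On the Python side, a readiness helper keyed on that attribute could be as small as the sketch below. It assumes the provider writes the string form of `state.connected`, so the attribute reads `"true"` once connected. The selector value and the default timeout are assumptions, not taken from the PR. `page.wait_for_selector` is Playwright's async-API call and raises Playwright's `TimeoutError` if the attribute never appears.

```python
from typing import Any

# ASSUMPTION: the provider sets data-connected to "true"/"false".
WS_READY_SELECTOR = 'body[data-connected="true"]'


async def wait_for_ws_ready(page: Any, timeout_ms: float = 10_000) -> None:
    """Block until the dashboard reports an open WebSocket via <body data-connected>.

    `page` is a Playwright async-API Page.
    """
    await page.wait_for_selector(WS_READY_SELECTOR, timeout=timeout_ms)
```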
Ports 10 populated + 5 empty captures from JS screenshots.spec.js to Python Playwright. Adds a stdlib-only PNG pixel-diff snapshot fixture (no Pillow/numpy) with --update-snapshots support. Fixes a pytest-asyncio session/function loop deadlock by making browser_context function-scoped, and patches PLAYWRIGHT_BROWSERS_PATH at import time to survive the hermetic HOME=/tmp/hydraflow-test environment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
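A stdlib-only pixel diff needs a small PNG decoder. It parses the chunks, inflates the concatenated `IDAT` data with `zlib`, and then undoes the per-row filters defined by the PNG specification (None, Sub, Up, Average, Paeth). The sketch below assumes 8-bit, non-interlaced RGB or RGBA images, skips CRC checks, and uses hypothetical names (`decode_png`, `count_diff_pixels`). It is not the fixture from this PR, and `--update-snapshots` handling is not shown.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _paeth(a: int, b: int, c: int) -> int:
    # Paeth predictor: pick the neighbour closest to a + b - c.
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def decode_png(data: bytes) -> tuple[int, int, int, list[bytes]]:
    """Return (width, height, channels, rows) for an 8-bit non-interlaced RGB/RGBA PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat = 8, bytearray()
    width = height = channels = 0
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + body + CRC (CRC is not verified here)
        if ctype == b"IHDR":
            width, height, depth, color, _comp, _filt, interlace = struct.unpack(">IIBBBBB", body)
            if depth != 8 or interlace != 0 or color not in (2, 6):
                raise ValueError("only 8-bit, non-interlaced RGB/RGBA is supported")
            channels = 3 if color == 2 else 4
        elif ctype == b"IDAT":
            idat += body
        elif ctype == b"IEND":
            break
    raw = zlib.decompress(bytes(idat))
    stride = width * channels
    rows: list[bytes] = []
    prev = bytearray(stride)  # the row above the first row is all zeros
    for y in range(height):
        start = y * (stride + 1)
        ftype = raw[start]
        line = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            a = line[i - channels] if i >= channels else 0  # left (already reconstructed)
            b = prev[i]  # above
            c = prev[i - channels] if i >= channels else 0  # upper-left
            if ftype == 1:
                line[i] = (line[i] + a) & 0xFF
            elif ftype == 2:
                line[i] = (line[i] + b) & 0xFF
            elif ftype == 3:
                line[i] = (line[i] + (a + b) // 2) & 0xFF
            elif ftype == 4:
                line[i] = (line[i] + _paeth(a, b, c)) & 0xFF
            elif ftype != 0:
                raise ValueError(f"unknown filter type {ftype}")
        rows.append(bytes(line))
        prev = line
    return width, height, channels, rows


def count_diff_pixels(expected: bytes, actual: bytes, tolerance: int = 0) -> int:
    """Count pixels where any channel differs by more than `tolerance`."""
    w1, h1, c1, rows1 = decode_png(expected)
    w2, h2, c2, rows2 = decode_png(actual)
    if (w1, h1, c1) != (w2, h2, c2):
        raise ValueError("images differ in size or channel count")
    diff = 0
    for r1, r2 in zip(rows1, rows2):
        for x in range(0, w1 * c1, c1):
            if any(abs(r1[x + k] - r2[x + k]) > tolerance for k in range(c1)):
                diff += 1
    return diff
```

A snapshot fixture built on this would fail when `count_diff_pixels(baseline, screenshot)` exceeds an allowed threshold. The threshold and per-channel tolerance are policy choices.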
Wire start_dashboard() to the harness's existing EventBus and IssueStore
via _HarnessOrchestratorShim; update seed_populated_pipeline() to call
harness.seed_issue() with correct stage names ('find'/'ready' not
'triage'/'implement'); regenerate all 15 contract baselines so populated
and empty snapshots are visibly distinct (80220 vs 81561 bytes).
Document why session-scoped Chromium fixtures are not used: the
cross-loop deadlock requires asyncio_default_test_loop_scope="session"
globally, which is high-risk for the existing unit suite.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
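The shim's contract, as the PR summary describes it (report `running=True`, hand the harness `IssueStore` to the dashboard, return empty-safe values for everything else), can be sketched as below. The class name is hypothetical. Returning a callable that yields an empty list for any unknown attribute is an assumption about what "empty-safe" means; the real shim may return different sentinels per attribute.

```python
from typing import Any, Callable


class HarnessOrchestratorShimSketch:
    """Hypothetical minimal orchestrator stand-in for dashboard contract tests.

    Reports running=True, exposes the harness IssueStore, and answers any
    other attribute with a callable that returns an empty list.
    """

    running = True

    def __init__(self, issue_store: Any) -> None:
        self.issue_store = issue_store

    def __getattr__(self, name: str) -> Callable[..., list[Any]]:
        # Reached only for attributes not defined above.
        def _empty(*args: Any, **kwargs: Any) -> list[Any]:
            return []

        return _empty
```

One trade-off of a catch-all `__getattr__` is that `hasattr()` is true for every name, so a typo in dashboard code is silently absorbed instead of raising.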
Add data-testid="orchestrator-status" hidden span to Header.jsx and two Tier-2 workflow tests verifying that clicking the dashboard's Start/Stop buttons drives the real HydraFlowOrchestrator's running state. Pyright clean; all 1034 vitest tests and both new browser tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update HitlPage page object to match real HITLTable.jsx testids:
  - open() navigates to / and clicks the HITL tab (React ignores ?tab= URL param)
  - item() → hitl-row-{N} (was hitl-item-{N})
  - correction_input() → hitl-textarea-{N} (was hitl-correction-input-{N})
  - submit_button() → hitl-retry-{N} (was hitl-submit-{N})
  - add detail(), skip_button(), close_button() helpers
- Add test_hitl_roundtrip.py: Playwright route-intercepts /api/control/status
  (to report running) and /api/hitl (to serve issue 208), clicks Skip, and asserts
  the row disappears after the intercepted /api/hitl/208/skip returns ok.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
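A page object following that testid mapping could be sketched as follows, using Playwright's async API (`page.goto`, `page.get_by_test_id`, `locator.click`). Locating the tab by its visible text "HITL" is an assumption, since the commit does not say how the tab is selected. `detail()`, `skip_button()`, and `close_button()` are omitted because their testids are not listed.

```python
from typing import Any


class HitlPage:
    """Sketch of a page object for the HITL tab (Playwright async API)."""

    def __init__(self, page: Any, base_url: str) -> None:
        self.page = page
        self.base_url = base_url.rstrip("/")

    async def open(self) -> None:
        # The dashboard ignores ?tab=, so load the root and click the tab instead.
        await self.page.goto(f"{self.base_url}/")
        # ASSUMPTION: the tab is addressable by its visible label "HITL".
        await self.page.get_by_text("HITL", exact=True).click()

    def item(self, n: int) -> Any:
        return self.page.get_by_test_id(f"hitl-row-{n}")

    def correction_input(self, n: int) -> Any:
        return self.page.get_by_test_id(f"hitl-textarea-{n}")

    def submit_button(self, n: int) -> Any:
        return self.page.get_by_test_id(f"hitl-retry-{n}")
```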
Drive the dashboard to register a repo via the UI, verifying the registration endpoint is called with the correct path and the repo appears in the repo-selector dropdown list. Uses Playwright route interception for GET /api/repos and POST /api/repos/add since MockWorld.start_dashboard() does not wire register_repo_cb or repo_store (making the native backend return 503 and can_register: false). Adds data-testid="repo-register-input" to the filesystem-path input in RegisterRepoDialog for reliable selection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
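Route interception with a stateful fake can be sketched like this. One handler serves both endpoints, records each registration call, and falls back to normal routing for anything else. `FakeRepoApi` is hypothetical, and the JSON shapes (a `repos` list alongside `can_register`, and a request body with a `path` key) are assumptions rather than the dashboard's real contract. `route.fulfill(json=...)`, `route.fallback()`, and `request.post_data_json` are Playwright Python APIs.

```python
from typing import Any
from urllib.parse import urlsplit


class FakeRepoApi:
    """In-memory stand-in for GET /api/repos and POST /api/repos/add.

    Install with e.g.: await page.route("**/api/repos**", api.handle)
    ASSUMPTION: the payload shapes below are illustrative only.
    """

    def __init__(self) -> None:
        self.repos: list[str] = []
        self.add_calls: list[dict[str, Any]] = []

    async def handle(self, route: Any) -> None:
        request = route.request
        path = urlsplit(request.url).path
        if request.method == "POST" and path == "/api/repos/add":
            body = request.post_data_json or {}
            self.add_calls.append(body)  # lets the test assert on the submitted path
            self.repos.append(body.get("path", ""))
            await route.fulfill(status=200, json={"ok": True})
        elif request.method == "GET" and path == "/api/repos":
            await route.fulfill(status=200, json={"repos": list(self.repos), "can_register": True})
        else:
            await route.fallback()  # let other handlers or the real backend respond
```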
Adds a Tier-2 browser scenario that verifies the automated merge_update
WebSocket event is propagated from the EventBus through the WS handler to
the React frontend, where the Outcomes panel reflects the merged PR state
("#301 (merged)") in the expanded issue row.
Uses route interception for /api/prs and /api/issues/history (same seam as
Tasks 20/21) and world._harness.bus.publish() to inject the merge_update
event that simulates what the review loop emits when PRManager.merge_pr()
succeeds. Documents in the module docstring that PR approval in HydraFlow is
fully automated with no user-facing Approve button.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Task 23 — edits rc_cadence_hours in the staging_promotion worker card (StagingPromotionSettingsPanel) and asserts PATCH /api/control/config is fired with the correct payload on blur. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port H1 from tests/scenarios/test_happy.py to a browser-driven scenario.
Establishes the implementation pattern for the remaining 23 Tier-3 ports.
Pattern chosen: Python-trigger + UI-assert (not full E2E via UI Start button).
Full E2E is unworkable because clicking Start in the dashboard creates a
brand-new HydraFlowOrchestrator (dashboard_routes/_control_routes.py line 227)
that replaces the pre-wired orchestrator, losing all FakeLLM / FakeGitHub
wiring. This is a confirmed Phase 5 finding.
The viable pattern:
1. IssueBuilder seeds the world (identical to Python-only test_happy.py H1).
2. world.run_pipeline() drives all four phases through wired fakes.
3. world._harness.store.mark_merged(1) syncs the IssueStore — needed because
PipelineHarness builds PostMergeHandler(store=None) so mark_merged is not
called during run_pipeline(); FakeGitHub.pr.merged=True but the store
is not updated automatically.
4. Dashboard starts with with_orchestrator=False (lightweight shim exposes
harness IssueStore; _is_pipeline_active returns True because shim.running).
5. Route interceptor sets control/status to "running" for future multi-tab use.
6. UI asserts: stage-header-merged contains "1 merged"; flow-dot-1 is visible.
7. Python asserts: world.github.pr_for_issue(1).merged is True.
This pattern scales to H2–L8: seed via IssueBuilder / set_phase_result,
run_pipeline(), mark_merged for merged issues, assert stage-header-{stage}.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds four new scenario_browser tests porting H2-H5 from tests/scenarios/test_happy.py using the render-after-run pattern established by the H1 pilot. Key patterns discovered:
- H2: mark_merged must be called for all N issues before start_dashboard; stage-header-merged asserts "N merged".
- H3 (failed implement): a failed issue is consumed from the implement queue but never re-queued or merged, so it has no flow-dot in the pipeline snapshot. The DOM assertion is that stage-header-merged does not contain "1 merged" and stage-section-implement is present.
- H4: identical seed to H1, but additionally asserts review_result is not None before the merged-stage UI check.
- H5 (sub-issues): NewIssueSpec bodies shorter than 50 chars are skipped by plan_phase, so sub-issues never reach GitHub. Parent issue #1 still proceeds to merge; the DOM asserts "1 merged" for the parent, and Python asserts plan_result.new_issues has 2 entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port all six sad-path scenarios from tests/scenarios/test_sad.py to browser-rendered assertions. Two behavioural findings recorded in test docstrings: S1 stalls at plan (harness consumes the first failure result and does not auto-retry), and S6 now merges (FakeLLM wiring has evolved since the reference comment was written). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ports five edge-case scenarios (duplicate issues, on_phase hook, stale worktree GC, epic child ordering, zero-diff implement) from tests/scenarios/test_edge.py to Playwright browser tests following the render-after-run pattern established by H1-H5 and S1-S6. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ports all 8 background-loop scenarios to browser tests in test_loops_browser.py. Each test seeds world state matching the reference test_loops.py, runs the loop via run_with_loops, asserts Python-side state, then boots the dashboard shim and applies a negative DOM assertion confirming no crash and all pipeline sections are visible. L2 carries the same xfail as the reference (workspace_gc calls gh api via run_subprocess which is not stubbed in the harness). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves conflicts in 5 files:
- src/ui/package.json + package-lock.json: keep HEAD (no @playwright/test,
no @axe-core/playwright). Main's new e2e/{a11y,interactions}.spec.js
deleted — JS suite is gone per ADR-0042 plan; a11y + interactions
coverage must be re-added to the Python harness as a follow-up.
- tests/scenarios/fakes/fake_github.py: keep both add_pr (HEAD) and
add_alerts (main).
- tests/scenarios/fakes/mock_world.py: keep both clock_start (HEAD) and
wiki_store/beads_manager (main) ctor kwargs.
- tests/scenarios/fakes/test_mock_world.py: keep both test suites.
Summary
Delivers the full plan from `docs/superpowers/specs/2026-04-18-browser-scenario-harness-design.md`: a Python-driven Playwright harness that runs the real HydraFlow dashboard against `MockWorld` fakes, plus a new `scenario-browser` job wired into `rc-promotion-scenario.yml`. Replaces the JS Playwright suite entirely. 46 new browser tests land (Tier-1 contract + Tier-2 workflow + Tier-3 scenario + smoke). Full scenario suite post-merge: 160/160 scenario, 150/150 fakes, 48/48 + 1 xfailed browser.
What's in here
- `scenario_browser` marker, pytest-playwright + pytest-rerunfailures deps, browser test directory
- `FakeClock.freeze`, `MockWorld.clock_start`, `MockWorld.add_repo`, `_wire_targets` (PipelineHarness ↔ ServiceRegistry unified), `FakeGitHub.add_pr` sync helper, `MockWorld.start_dashboard` (with + without orchestrator), `HydraFlowDashboard._uvicorn_server` for port introspection
- `body[data-connected]` WS-ready signal, `_HarnessOrchestratorShim` for contract path
- Removed: `src/ui/e2e/`, `playwright.config.js`, `@playwright/test` npm deps, `__HYDRAFLOW_SEED_STATE__` code path, `visual-capture` CI job, `screenshot` Makefile targets
- `scenario-browser` job in `rc-promotion-scenario.yml`, `make scenario-browser` target

Architecture decisions worth review attention
- `/api/control/start` builds a fresh orchestrator and loses the fake wiring. Tier-3 scenario tests therefore use Python-side `run_pipeline()` and let the UI render the result. Workflow tests that exercise routes bypassing `FakeGitHub` (e.g. `PRManager` → real `gh` CLI) use `page.route()` interception.
- `_HarnessOrchestratorShim`: contract tests pass `with_orchestrator=False`, which routes through a small shim that reports `running=True`, forwards `issue_store` to the harness, and returns empty-safe sentinels for everything else. This keeps the dashboard rendering populated state without booting a real orchestrator. The shim is confined to `MockWorld` and is not used in workflow/scenario paths.
- Browser fixtures are function-scoped: a session-scoped Chromium would require switching the global `asyncio_default_test_loop_scope=function` to `session`, which is high-risk for the existing unit suite. Documented inline in `conftest.py`. As a result, Tier-3 pays full Chromium startup per test (~3s). Addressing this is a worthwhile perf win but out of scope here.
- The dashboard does not honor `?tab=` URL params. A follow-up should click tabs in test setup rather than rely on URL routing.

What this does NOT include (explicitly deferred)
- "Browser Scenarios" is added to CI but not marked required until it demonstrates stability on RC PRs. Marking it required is a one-click admin action after a soak window.
- Main added `src/ui/e2e/{a11y,interactions}.spec.js` in test(scenarios): Tier 2 — FakeBeads + bead workflow + 5 caretaker loops #8360 / test(scenarios): Tier 3 — 10 caretaker loops + UI interactions + a11y + hook + fuzz #8361 after this branch was cut. Those files are removed here as part of the planned JS suite deletion. Re-adding their coverage in the Python harness is a follow-up, deferred to keep this PR focused.

Diff stats
`.md` additions merged in from main)

Test plan
- `make scenario`: 160/160 pass
- `make scenario-browser` locally: 48 passed + 1 xfailed (L2 `workspace_gc`, matching the reference)
- `--no-verify`
- `mock_world.py`, `fake_github.py`, `test_mock_world.py`, `package.json`

Known non-issues
- `tests/scenarios/fakes/`: caused by the lazy `__getattr__` in `tests/scenarios/fakes/__init__.py`. CLI pyright is clean.
- `test_build_prompt_truncates_long_body` on main is unrelated to this PR.

🤖 Generated with Claude Code