feat(test): Python browser scenario harness + RC-promotion gate#8370

Merged
HydraOps-T-rav merged 35 commits into main from browser-scenario-harness
Apr 20, 2026

@HydraOps-T-rav
Collaborator

Summary

Delivers the full plan from docs/superpowers/specs/2026-04-18-browser-scenario-harness-design.md: a Python-driven Playwright harness that runs the real HydraFlow dashboard against MockWorld fakes, plus a new scenario-browser job wired into rc-promotion-scenario.yml. Replaces the JS Playwright suite entirely.

46 new browser tests land (Tier-1 contract + Tier-2 workflow + Tier-3 scenario + smoke). Full scenario suite post-merge: 160/160 scenario, 150/150 fakes, 48/48+1 xfailed browser.

What's in here

| Phase | Content |
|---|---|
| 1 — Skeleton | scenario_browser marker, pytest-playwright + pytest-rerunfailures deps, browser test directory |
| 2 — Layer 0 helpers | FakeClock.freeze, MockWorld.clock_start, MockWorld.add_repo, _wire_targets (PipelineHarness ↔ ServiceRegistry unified), FakeGitHub.add_pr sync helper, MockWorld.start_dashboard (with + without orchestrator), HydraFlowDashboard._uvicorn_server for port introspection |
| 3 — Tier-1 contract | 15 snapshot tests (5 empty + 10 populated), seed helpers, page objects, body[data-connected] WS-ready signal, _HarnessOrchestratorShim for contract path |
| 4 — JS deletion | Removed src/ui/e2e/, playwright.config.js, @playwright/test npm deps, __HYDRAFLOW_SEED_STATE__ code path, visual-capture CI job, screenshot Makefile targets |
| 5 — Tier-2 workflows | 5 tests: orchestrator start/stop, HITL skip, repo-register, PR merge-render, config autosave |
| 6 — Tier-3 scenarios | 24 tests across H1-H5, S1-S6, E1-E5, L1-L8 (render-after-run pattern) |
| 7 — CI gate | New scenario-browser job in rc-promotion-scenario.yml, make scenario-browser target |

Architecture decisions worth review attention

  • Full E2E click-drive isn't viable. /api/control/start builds a fresh orchestrator and loses the fake wiring, so Tier-3 scenario tests drive the pipeline Python-side with run_pipeline() and then assert on what the UI renders. Workflow tests that exercise routes bypassing FakeGitHub (e.g. PRManager → real gh CLI) use page.route() interception.
  • _HarnessOrchestratorShim: Contract tests pass with_orchestrator=False which routes through a small shim that reports running=True, forwards issue_store to the harness, and returns empty-safe sentinels for everything else. Keeps the dashboard rendering populated state without booting a real orchestrator. Confined to MockWorld; not used in workflow/scenario paths.
  • Function-scoped browser fixture (not session-scoped) due to a pytest-asyncio cross-loop deadlock with asyncio_default_test_loop_scope=function. Documented inline in conftest.py. Tier-3 pays full Chromium startup per test (~3s). Addressing this is a worthwhile perf win but out of scope here.
  • Contract snapshot limitation. Populated snapshots differ from empty ones, but within each seed state all tabs render identically because the React SPA keeps its tab state internally (?tab= URL params are not honored). A follow-up should click tabs in test setup rather than rely on URL routing.
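
The shim idea in the second bullet can be illustrated with a minimal sketch. This is not the PR's _HarnessOrchestratorShim: the class names, the sentinel design, and the attribute names used in the usage note (pending_reviews, metrics) are assumptions made for illustration. It shows only the three behaviours named above: report running=True, forward issue_store, and return an empty-safe value for anything else.

```python
from typing import Any


class EmptySentinel:
    """Hypothetical placeholder: falsy, empty when iterated, absorbs calls and lookups."""

    def __bool__(self) -> bool:
        return False

    def __len__(self) -> int:
        return 0

    def __iter__(self):
        return iter(())

    def __call__(self, *args: Any, **kwargs: Any) -> "EmptySentinel":
        return self

    def __getattr__(self, name: str) -> "EmptySentinel":
        # Refuse dunder lookups so copy/pickle protocols are not faked.
        if name.startswith("__"):
            raise AttributeError(name)
        return self


class OrchestratorShimSketch:
    """Illustrative stand-in for an orchestrator on the contract-test path."""

    running = True  # the dashboard sees an active pipeline

    def __init__(self, issue_store: Any) -> None:
        self.issue_store = issue_store  # forwarded unchanged from the harness

    def __getattr__(self, name: str) -> EmptySentinel:
        # Only reached for attributes that are not defined above.
        if name.startswith("__"):
            raise AttributeError(name)
        return EmptySentinel()
```

With this shape, a caller can read shim.issue_store normally, while something like list(shim.pending_reviews) yields an empty list and shim.metrics.snapshot() yields a falsy value instead of raising.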

What this does NOT include (explicitly deferred)

Diff stats

  • 34 commits, ~268 files changed (the large number is mostly wiki .md additions merged in from main)
  • +4K / −11K (most deletions are the JS suite removal + e2e .spec.js fixtures)

Test plan

  • make scenario — 160/160 pass
  • Fakes suite — 150/150 pass
  • make scenario-browser locally — 48 passed + 1 xfailed (L2 workspace_gc, matching the reference)
  • Contract snapshots regenerate cleanly
  • Pyright clean on all modified files
  • Pre-commit hooks pass without --no-verify
  • CI green on this PR
  • Manual review of merge-conflict resolutions in mock_world.py, fake_github.py, test_mock_world.py, package.json

Known non-issues

  • IDE pyright sometimes shows stale errors on tests/scenarios/fakes/ — caused by the lazy __getattr__ in tests/scenarios/fakes/__init__.py. CLI pyright is clean.
  • Pre-existing failure test_build_prompt_truncates_long_body on main is unrelated to this PR.

🤖 Generated with Claude Code

T-rav-Hydra-Ops and others added 30 commits April 18, 2026 16:58
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Raise the pytest-rerunfailures version floor from >=14.0 to >=16.0 to
match the locked version (16.1) and avoid silent misbehaviour from
breaking changes in 14→15→16. Add "(Tier C)" qualifier to the
scenario_browser marker description for consistency with sibling markers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Use concrete import path for MockWorld (avoids __getattr__ union-type
resolution that confused pyright) and guard teardown with getattr so
future Task-7 methods don't cause pyright errors before they exist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _wire_targets(target) that accepts any duck-typed object exposing
.prs/.triage_runner/.planners/.agents/.reviewers/.workspaces, enabling
Task 9 to wire an orchestrator adapter without duplicating logic. The
three original _wire_* methods are kept as thin backward-compatible wrappers.
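
A rough sketch of the duck-typed wiring described above, under stated assumptions: the slot names come from this commit message, but the function signature, the shape of the fakes mapping, and the use of setattr are guesses and not the PR's _wire_targets.

```python
from types import SimpleNamespace
from typing import Any, Callable, Mapping

# Slot names as listed in the commit message.
TARGET_SLOTS = ("prs", "triage_runner", "planners", "agents", "reviewers", "workspaces")


def wire_targets_sketch(
    target: Any, fakes: Mapping[str, Mapping[str, Callable[..., Any]]]
) -> None:
    """Patch fake callables onto each service slot of a duck-typed target.

    `fakes` maps slot name -> {method name -> replacement callable}.
    """
    for slot in TARGET_SLOTS:
        service = getattr(target, slot)  # raises AttributeError if a slot is missing
        for method_name, fake in fakes.get(slot, {}).items():
            setattr(service, method_name, fake)


# Usage: any object exposing the six slots qualifies, harness or adapter alike.
demo_target = SimpleNamespace(**{slot: SimpleNamespace() for slot in TARGET_SLOTS})
wire_targets_sketch(demo_target, {"prs": {"merge_pr": lambda number: f"merged {number}"}})
```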

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the no-orchestrator path of start_dashboard(): boots
HydraFlowDashboard in-process on an ephemeral port against a seeded
EventBus/StateTracker, exposes the URL via dashboard_url, and provides
idempotent start + graceful stop. Adds _uvicorn_server attribute to
HydraFlowDashboard so MockWorld can poll for the bound port. The browser
smoke test (intentionally failing since Task 2) now passes end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Harden MockWorld.stop_dashboard to explicitly close uvicorn's bound
listener sockets (asyncio.Server.close + wait_closed) before awaiting
dashboard.stop(), ensuring the ephemeral port is released immediately
rather than after uvicorn's graceful-shutdown delay. Adds regression
test that re-binds the port after teardown to catch any recurrence.
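
The close-then-rebind check can be sketched with the standard library alone. This is an illustration of the technique, not the PR's stop_dashboard or its regression test: it uses a bare asyncio server rather than uvicorn, and the function names are made up.

```python
import asyncio
import socket


async def _handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    writer.close()


async def start_and_release() -> int:
    """Bind an ephemeral port, close the listener explicitly, return the port."""
    server = await asyncio.start_server(_handle, host="127.0.0.1", port=0)
    port = server.sockets[0].getsockname()[1]
    server.close()              # stop listening and close the bound sockets
    await server.wait_closed()  # wait until the close has completed
    return port


def can_rebind(port: int) -> bool:
    """Return True if the port can be bound again right away."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind(("127.0.0.1", port))
        except OSError:
            return False
        return True
```

A regression test in this style would call start_and_release() and then assert can_rebind(port), which fails if the listener socket is still held after teardown.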

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…to a real orchestrator

Implements Task 9: _build_wired_orchestrator constructs a real
HydraFlowOrchestrator (pipeline_enabled=False) and uses a _SvcAdapter
to bridge the ServiceRegistry's `triage` field to the `triage_runner`
name that _wire_targets expects, then patches all fake methods onto the
live service registry.
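
A minimal sketch of the name-bridging adapter, assuming it only needs to alias one attribute and pass everything else through. The class below is hypothetical and is not the PR's _SvcAdapter.

```python
from types import SimpleNamespace
from typing import Any


class SvcAdapterSketch:
    """Expose a registry's `triage` field under the name `triage_runner`."""

    _ALIASES = {"triage_runner": "triage"}

    def __init__(self, registry: Any) -> None:
        self._registry = registry

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails. Private names are refused so a
        # missing _registry can never cause infinite recursion.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._registry, self._ALIASES.get(name, name))


# Usage: patching through the adapter mutates the real service object.
demo_registry = SimpleNamespace(triage=SimpleNamespace(), prs=SimpleNamespace())
demo_adapter = SvcAdapterSketch(demo_registry)
demo_adapter.triage_runner.run = lambda: "fake triage"
```

Because the adapter returns the registry's own service objects, methods patched through it land on the live registry, which matches the behaviour described above.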

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports seed data from src/ui/e2e/fixtures/seed-state.js into Python via
seed_populated_pipeline / seed_empty_pipeline helpers, backed by a new
sync FakeGitHub.add_pr method.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a useEffect in HydraFlowProvider that mirrors state.connected onto
document.body[data-connected] so Playwright's wait_for_ws_ready() helper
can detect WebSocket readiness via a stable DOM attribute.
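
A helper along these lines could wait on that attribute. This is a sketch only: the attribute value "true", the default timeout, and the PageLike protocol are assumptions. The only Playwright call used is page.wait_for_selector, and the helper accepts any object with that method, so it can be exercised without a browser.

```python
from typing import Any, Protocol


class PageLike(Protocol):
    """Subset of Playwright's sync Page API that this helper relies on."""

    def wait_for_selector(self, selector: str, **kwargs: Any) -> Any: ...


# Assumption: the provider writes the string "true" once the socket is connected.
WS_READY_SELECTOR = 'body[data-connected="true"]'


def wait_for_ws_ready(page: PageLike, timeout_ms: float = 5_000) -> None:
    """Block until the body attribute signals that the WebSocket is connected."""
    page.wait_for_selector(WS_READY_SELECTOR, timeout=timeout_ms)
```

Waiting on a DOM attribute keeps the test independent of React internals: the test never inspects provider state, only what the page exposes.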

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ports 10 populated + 5 empty captures from JS screenshots.spec.js to
Python Playwright. Adds a stdlib-only PNG pixel-diff snapshot fixture
(no Pillow/numpy) with --update-snapshots support. Fixes a
pytest-asyncio session/function loop deadlock by making browser_context
function-scoped, and patches PLAYWRIGHT_BROWSERS_PATH at import time to
survive the hermetic HOME=/tmp/hydraflow-test environment.
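
For readers unfamiliar with what a stdlib-only PNG diff involves, here is a rough sketch using zlib and struct. It is not the PR's fixture: it handles only 8-bit, non-interlaced images, the function names are invented, and it reports a changed-pixel ratio because the PR does not state the metric or tolerance it uses.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
CHANNELS = {0: 1, 2: 3, 4: 2, 6: 4}  # bytes per pixel at 8-bit depth, by colour type


def _paeth(a: int, b: int, c: int) -> int:
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def decode_png(data: bytes) -> tuple[int, int, int, bytes]:
    """Return (width, height, bytes_per_pixel, pixels) for an 8-bit, non-interlaced PNG."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, header = 8, bytearray(), None
    while pos < len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length field + type + body + CRC
        if kind == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif kind == b"IDAT":
            idat += body
        elif kind == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, colour, _, _, interlace = header
    if depth != 8 or interlace or colour not in CHANNELS:
        raise ValueError("unsupported PNG variant")
    bpp = CHANNELS[colour]
    stride = width * bpp
    raw = zlib.decompress(bytes(idat))
    pixels, prev = bytearray(), bytearray(stride)
    for y in range(height):
        start = y * (stride + 1)  # each scanline is prefixed by a filter-type byte
        ftype = raw[start]
        line = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            a = line[i - bpp] if i >= bpp else 0  # left neighbour, already unfiltered
            b = prev[i]                           # pixel above
            c = prev[i - bpp] if i >= bpp else 0  # upper-left neighbour
            if ftype == 1:
                line[i] = (line[i] + a) & 0xFF
            elif ftype == 2:
                line[i] = (line[i] + b) & 0xFF
            elif ftype == 3:
                line[i] = (line[i] + (a + b) // 2) & 0xFF
            elif ftype == 4:
                line[i] = (line[i] + _paeth(a, b, c)) & 0xFF
        pixels += line
        prev = line
    return width, height, bpp, bytes(pixels)


def diff_ratio(png_a: bytes, png_b: bytes) -> float:
    """Fraction of pixels that differ; 1.0 when dimensions or formats differ."""
    wa, ha, bpa, pa = decode_png(png_a)
    wb, hb, bpb, pb = decode_png(png_b)
    if (wa, ha, bpa) != (wb, hb, bpb):
        return 1.0
    total = wa * ha
    changed = sum(pa[i:i + bpa] != pb[i:i + bpa] for i in range(0, len(pa), bpa))
    return changed / total if total else 0.0
```

A snapshot fixture built on this would compare diff_ratio(baseline, actual) against a tolerance and overwrite the baseline when an update flag such as --update-snapshots is passed.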

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire start_dashboard() to the harness's existing EventBus and IssueStore
via _HarnessOrchestratorShim; update seed_populated_pipeline() to call
harness.seed_issue() with correct stage names ('find'/'ready' not
'triage'/'implement'); regenerate all 15 contract baselines so populated
and empty snapshots are visibly distinct (80220 vs 81561 bytes).

Document why session-scoped Chromium fixtures are not used: the
cross-loop deadlock requires asyncio_default_test_loop_scope="session"
globally, which is high-risk for the existing unit suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add data-testid="orchestrator-status" hidden span to Header.jsx and two
Tier-2 workflow tests verifying that clicking the dashboard's Start/Stop
buttons drives the real HydraFlowOrchestrator's running state. Pyright
clean; all 1034 vitest tests and both new browser tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update HitlPage page object to match real HITLTable.jsx testids:
  - open() navigates to / and clicks the HITL tab (React ignores ?tab= URL param)
  - item() → hitl-row-{N} (was hitl-item-{N})
  - correction_input() → hitl-textarea-{N} (was hitl-correction-input-{N})
  - submit_button() → hitl-retry-{N} (was hitl-submit-{N})
  - add detail(), skip_button(), close_button() helpers
- Add test_hitl_roundtrip.py: Playwright route-intercepts /api/control/status
  (to report running) and /api/hitl (to serve issue 208), clicks Skip, asserts
  the row disappears after the intercepted /api/hitl/208/skip returns ok.
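
The route interception used here might look roughly like the helper below. It is a sketch under assumptions: the helper name and the payload shapes in the usage note are invented, since the real response schemas are not shown in this PR. The Playwright calls it relies on are page.route(url, handler) and route.fulfill(status=..., content_type=..., body=...).

```python
import json
from typing import Any


def stub_json_route(page: Any, url_glob: str, payload: Any, status: int = 200) -> None:
    """Answer every request matching url_glob with a canned JSON body."""

    def handler(route: Any) -> None:
        # Fulfil the request in the browser; the real backend is never reached.
        route.fulfill(
            status=status,
            content_type="application/json",
            body=json.dumps(payload),
        )

    page.route(url_glob, handler)
```

In a test, calls such as stub_json_route(page, "**/api/control/status", {"status": "running"}) and stub_json_route(page, "**/api/hitl/208/skip", {"status": "ok"}) would be registered before navigating, so the first requests from the dashboard are already intercepted.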

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drive the dashboard to register a repo via the UI, verifying the
registration endpoint is called with the correct path and the repo
appears in the repo-selector dropdown list.

Uses Playwright route interception for GET /api/repos and POST
/api/repos/add since MockWorld.start_dashboard() does not wire
register_repo_cb or repo_store (making the native backend return 503
and can_register: false). Adds data-testid="repo-register-input" to
the filesystem-path input in RegisterRepoDialog for reliable selection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a Tier-2 browser scenario that verifies the automated merge_update
WebSocket event is propagated from the EventBus through the WS handler to
the React frontend, where the Outcomes panel reflects the merged PR state
("#301 (merged)") in the expanded issue row.

Uses route interception for /api/prs and /api/issues/history (same seam as
Tasks 20/21) and world._harness.bus.publish() to inject the merge_update
event that simulates what the review loop emits when PRManager.merge_pr()
succeeds. Documents in the module docstring that PR approval in HydraFlow is
fully automated with no user-facing Approve button.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Task 23 — edits rc_cadence_hours in the staging_promotion worker
card (StagingPromotionSettingsPanel) and asserts PATCH /api/control/config
is fired with the correct payload on blur.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port H1 from tests/scenarios/test_happy.py to a browser-driven scenario.
Establishes the implementation pattern for the remaining 23 Tier-3 ports.

Pattern chosen: Python-trigger + UI-assert (not full E2E via UI Start button).

Full E2E is unworkable because clicking Start in the dashboard creates a
brand-new HydraFlowOrchestrator (dashboard_routes/_control_routes.py line 227)
that replaces the pre-wired orchestrator, losing all FakeLLM / FakeGitHub
wiring. This is a confirmed Phase 5 finding.

The viable pattern:
  1. IssueBuilder seeds the world (identical to Python-only test_happy.py H1).
  2. world.run_pipeline() drives all four phases through wired fakes.
  3. world._harness.store.mark_merged(1) syncs the IssueStore — needed because
     PipelineHarness builds PostMergeHandler(store=None) so mark_merged is not
     called during run_pipeline(); FakeGitHub.pr.merged=True but the store
     is not updated automatically.
  4. Dashboard starts with with_orchestrator=False (lightweight shim exposes
     harness IssueStore; _is_pipeline_active returns True because shim.running).
  5. Route interceptor sets control/status to "running" for future multi-tab use.
  6. UI asserts: stage-header-merged contains "1 merged"; flow-dot-1 is visible.
  7. Python asserts: world.github.pr_for_issue(1).merged is True.

This pattern scales to H2–L8: seed via IssueBuilder / set_phase_result,
run_pipeline(), mark_merged for merged issues, assert stage-header-{stage}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds four new scenario_browser tests porting H2-H5 from
tests/scenarios/test_happy.py using the render-after-run pattern
established by the H1 pilot.

Key patterns discovered:
- H2: mark_merged must be called for all N issues before start_dashboard;
  stage-header-merged asserts "N merged".
- H3 (failed implement): a failed issue is consumed from the implement queue
  but never re-queued or merged, so it has no flow-dot in the pipeline
  snapshot.  The DOM assertion is that stage-header-merged does not contain
  "1 merged" and that stage-section-implement is present.
- H4: identical seed to H1 but additionally asserts review_result is not None
  before the merged-stage UI check.
- H5 (sub-issues): NewIssueSpec bodies shorter than 50 chars are skipped by
  plan_phase, so sub-issues never reach GitHub.  Parent issue #1 still
  proceeds to merge; DOM asserts "1 merged" for the parent; Python asserts
  plan_result.new_issues has 2 entries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port all six sad-path scenarios from tests/scenarios/test_sad.py to
browser-rendered assertions.  Two behavioural findings recorded in test
docstrings: S1 stalls at plan (harness consumes the first failure result
and does not auto-retry), and S6 now merges (FakeLLM wiring has evolved
since the reference comment was written).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
T-rav-Hydra-Ops and others added 5 commits April 19, 2026 18:50
Ports five edge-case scenarios (duplicate issues, on_phase hook, stale
worktree GC, epic child ordering, zero-diff implement) from
tests/scenarios/test_edge.py to Playwright browser tests following the
render-after-run pattern established by H1-H5 and S1-S6.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ports all 8 background-loop scenarios to browser tests in
test_loops_browser.py. Each test seeds world state matching the
reference test_loops.py, runs the loop via run_with_loops, asserts
Python-side state, then boots the dashboard shim and applies a
negative DOM assertion confirming no crash and all pipeline sections
are visible. L2 carries the same xfail as the reference (workspace_gc
calls gh api via run_subprocess which is not stubbed in the harness).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Resolves conflicts in 5 files:
- src/ui/package.json + package-lock.json: keep HEAD (no @playwright/test,
  no @axe-core/playwright). Main's new e2e/{a11y,interactions}.spec.js
  deleted — JS suite is gone per ADR-0042 plan; a11y + interactions
  coverage must be re-added to the Python harness as a follow-up.
- tests/scenarios/fakes/fake_github.py: keep both add_pr (HEAD) and
  add_alerts (main).
- tests/scenarios/fakes/mock_world.py: keep both clock_start (HEAD) and
  wiki_store/beads_manager (main) ctor kwargs.
- tests/scenarios/fakes/test_mock_world.py: keep both test suites.
HydraOps-T-rav marked this pull request as ready for review April 20, 2026 16:55
HydraOps-T-rav merged commit b55809f into main Apr 20, 2026
24 checks passed
HydraOps-T-rav deleted the browser-scenario-harness branch April 20, 2026 19:22