ci(e2e): move nightly regression lanes to Vitest scenarios by cv · Pull Request #5072 · NVIDIA/NemoClaw

cv · 2026-06-09T18:12:20Z

Summary

Draft planning PR for the E2E Vitest fan-out stack. This PR scopes the CI cutover once enough family migrations have landed.

Related Issue

Refs #4941
Refs #4990
Refs #4357
Depends on #5046, #5052, and the shared runtime-suite base stack.
Stacked on branch codex/e2e-fanout-01-inventory-internals.

Changes

Placeholder branch for ci(e2e): move nightly regression lanes to Vitest scenarios.
Planned scope: Move high-value nightly/regression lanes from legacy shell/YAML dispatch to registry-backed Vitest scenarios while preserving secrets, skips, runners, timeouts, and artifacts.
No implementation changes yet; this draft should remain draft until code and verification are added.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

`liveScenarioSupport` previously rejected any scenario that declared an `environment.lifecycle`, so post-onboard host mutations (reboot, rebuild, upgrade, drift) could not surface in the live Vitest matrix at all.

Replace the unconditional reject with a `SUPPORTED_LIFECYCLES` whitelist that starts with the single profile the upcoming post-reboot-recovery fixture dispatches: `post-reboot-recovery`. Future profiles must land the dispatcher branch and an expected-state in the same change set, so the whitelist stays in lockstep with what the runner can actually execute.

Prepares the runner for #4423's failing-test-first guard, which needs a post-reboot lifecycle scenario to demonstrate registry preservation + Docker-backed sandbox recovery on Linux/Spark Docker-driver hosts.

Refs #4423
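The whitelist check described above can be sketched roughly as follows. This is a minimal illustration, not the code in `runtime-support.ts`: the `ScenarioLike` and `SupportResult` shapes and the return value of `liveScenarioSupport` are assumptions made for the example; only the `SUPPORTED_LIFECYCLES` name and the `post-reboot-recovery` profile come from the commit message.

```typescript
// Hypothetical shapes for illustration; the real types live in the E2E framework.
interface ScenarioLike {
  id: string;
  environment: { lifecycle?: string };
}

type SupportResult =
  | { supported: true }
  | { supported: false; reason: string };

// Starts with the single profile the lifecycle fixture can dispatch.
const SUPPORTED_LIFECYCLES: ReadonlySet<string> = new Set(["post-reboot-recovery"]);

// Assumed signature: scenarios without a lifecycle pass through unchanged,
// and a declared lifecycle is accepted only if it is whitelisted.
function liveScenarioSupport(scenario: ScenarioLike): SupportResult {
  const lifecycle = scenario.environment.lifecycle;
  if (lifecycle !== undefined && !SUPPORTED_LIFECYCLES.has(lifecycle)) {
    return { supported: false, reason: `unsupported lifecycle: ${lifecycle}` };
  }
  return { supported: true };
}
```

Keeping the set next to the check means a new profile is a one-line addition, which makes it easy to review alongside its dispatcher branch and expected-state.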

Adds two host-side state-validation probes the live runner needs to express the regression target tracked by #4423:

* `local-registry-entry-present` reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is still recorded. This is deliberately orthogonal to `sandbox.expected`: post-reboot bugs can wipe the local registry while the live OpenShell gateway is healthy, and only a host-side probe catches the data-loss regression.
* `docker-sandbox-container-present` runs `docker ps -a --filter label=openshell.ai/sandbox-name=<name>` and accepts running, stopped, or `*-nemoclaw-gpu-backup-*` sibling containers. The label filter mirrors `OPENSHELL_SANDBOX_NAME_LABEL` used by `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`, so the probe stays in lockstep with how OpenShell labels containers today.

Probe wiring:

* `StateProbeId` extended with the two new probe ids.
* `ExpectedState` gains `localRegistry` and `dockerSandboxContainer` optional dimensions; `probesForState` emits the new probes only for `expected: "present"`. Negative-direction probes are intentionally omitted today and pinned by a `probesForState` test.
* `StateValidationPhaseFixture.from()` now accepts either an expected-state ID or an inline `ExpectedState`, so unit tests can drive new probes without registering synthetic states in the typed registry. The live runner still calls `from(id, instance)`.
* Fixture takes an optional `ProbeIO` injection so tests can stub the registry reader without touching `~/.nemoclaw`.

No callers of the existing typed registry are affected: every shipped expected-state leaves `localRegistry` and `dockerSandboxContainer` unset, so `probesForState` returns the same probe lists as before.

Refs #4423
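To make the probe wiring concrete, here is a rough sketch of how `probesForState` might emit the two new probes and how an injected `ProbeIO` lets a test stub the registry reader. The dimension shape, the `"absent"` value, the `listRegistrySandboxNames` method, and the `localRegistryEntryPresent` helper are all assumptions for this example; the real `ProbeIO` interface and the layout of `sandboxes.json` are not shown in this PR.

```typescript
// Assumed dimension shape; "absent" is a hypothetical negative value.
type Presence = { expected: "present" | "absent" };

interface ExpectedState {
  localRegistry?: Presence;
  dockerSandboxContainer?: Presence;
}

type StateProbeId =
  | "local-registry-entry-present"
  | "docker-sandbox-container-present";

// Emits the host-side probes only for expected: "present".
// Unset dimensions and negative expectations produce no probe.
function probesForState(state: ExpectedState): StateProbeId[] {
  const probes: StateProbeId[] = [];
  if (state.localRegistry?.expected === "present") {
    probes.push("local-registry-entry-present");
  }
  if (state.dockerSandboxContainer?.expected === "present") {
    probes.push("docker-sandbox-container-present");
  }
  return probes;
}

// Hypothetical injection point: the real ProbeIO may expose different methods.
interface ProbeIO {
  listRegistrySandboxNames(): string[];
}

// Hypothetical helper: passes when the sandbox name is still recorded locally.
function localRegistryEntryPresent(io: ProbeIO, sandboxName: string): boolean {
  return io.listRegistrySandboxNames().includes(sandboxName);
}
```

A unit test can then pass a stub such as `{ listRegistrySandboxNames: () => ["demo"] }` instead of reading `~/.nemoclaw`.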

Adds a Vitest phase fixture that mutates host state between onboarding and state-validation, so live scenarios can express post-onboard invariants the legacy bash runner has no equivalent for.

`LifecyclePhaseFixture.simulate("post-reboot-recovery", instance, opts)` reproduces the host-side conditions of a DGX Spark / Linux Docker-driver reboot in two modes:

* `stop-original` (default): `openshell gateway stop` + `docker stop` of the labeled sandbox container. Models the common reboot outcome where OpenShell forgets the sandbox while Docker keeps the container exited but labeled.
* `rename-to-gpu-backup`: additionally `docker rename`s the container to a `*-nemoclaw-gpu-backup-<ts>` sibling, mirroring the GPU-patch reboot path in `src/lib/onboard/docker-gpu-patch.ts`.

Both modes register cleanups (in reverse order) to restore the container so test teardown leaves Docker in a usable state.

Wiring:

* `framework/phases/index.ts` re-exports the fixture and types.
* `framework/e2e-test.ts` registers a `lifecycle` Vitest fixture on `E2EScenarioFixtures`, wired with the shared `host`, `sandbox`, and `cleanup` registries.
* `live/registry-scenarios.test.ts` invokes `lifecycle.simulate(profile, instance)` between `onboard.from(...)` and `stateValidation.from(...)` whenever the scenario declares a whitelisted `environment.lifecycle`. Scenarios that omit lifecycle are unaffected. A scenario whose lifecycle is whitelisted by `runtime-support.ts` but NOT dispatched by the fixture fails fast with a clear error so the whitelist and dispatcher stay in lockstep.

Coverage in `e2e-phase-lifecycle.test.ts` exercises both modes, gateway-stop tolerance, the no-labeled-container failure case, the docker-discover failure case, the unsupported-profile rejection, the cleanup queue order, and `buildBackupContainerName` truncation.

The fixture is intentionally narrow on profiles: only `post-reboot-recovery` is dispatched today. Adding rebuild, upgrade, or drift profiles is a separate, equally narrow change set that must land the dispatcher branch and `SUPPORTED_LIFECYCLES` whitelist together.

Refs #4423
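The two simulation modes and the reverse-order cleanup can be sketched as below. This is an illustration under stated assumptions, not the actual `LifecyclePhaseFixture`: the injected `Run` function, the `CleanupQueue` class, the `simulatePostReboot` helper, the `--format` discovery string, the use of `docker start` for restoration, and the backup-name prefix are all hypothetical. Only the `openshell gateway stop`, `docker stop`, and `docker rename` commands, the label filter, and the `-nemoclaw-gpu-backup-<ts>` suffix come from the commit messages.

```typescript
// Hypothetical command runner, injected so tests never touch a real host.
type Run = (cmd: string, args: string[]) => { code: number; stdout: string };

// Hypothetical cleanup registry: callbacks run last-registered-first.
class CleanupQueue {
  private readonly items: Array<() => void> = [];
  add(fn: () => void): void {
    this.items.push(fn);
  }
  runAll(): void {
    for (let fn = this.items.pop(); fn !== undefined; fn = this.items.pop()) {
      fn();
    }
  }
}

type Mode = "stop-original" | "rename-to-gpu-backup";

function simulatePostReboot(
  run: Run,
  cleanup: CleanupQueue,
  sandboxName: string,
  mode: Mode = "stop-original",
  now: () => number = Date.now,
): void {
  // Gateway-stop tolerance: a non-zero exit here is deliberately ignored.
  run("openshell", ["gateway", "stop"]);

  // Discover the labeled container; failing to discover is an error.
  const found = run("docker", [
    "ps", "-a",
    "--filter", `label=openshell.ai/sandbox-name=${sandboxName}`,
    "--format", "{{.ID}} {{.Names}}",
  ]);
  if (found.code !== 0) throw new Error("docker discovery failed");
  const firstLine = found.stdout.split("\n").find((l) => l.trim() !== "");
  const [id, name] = (firstLine ?? "").trim().split(" ");
  if (!id || !name) throw new Error(`no labeled container for ${sandboxName}`);

  run("docker", ["stop", id]);
  cleanup.add(() => run("docker", ["start", id]));

  if (mode === "rename-to-gpu-backup") {
    // Assumed naming: original container name plus the backup suffix.
    run("docker", ["rename", id, `${name}-nemoclaw-gpu-backup-${now()}`]);
    cleanup.add(() => run("docker", ["rename", id, name]));
  }
}
```

Because the queue pops from the end, teardown renames the container back before starting it, so the restored container comes up under its original name.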

Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist).

Additions:

* `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly-rebooted DGX Spark / Linux Docker-driver host:
  - cli installed,
  - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs),
  - sandbox running (recovery completed in time),
  - localRegistry entry preserved (the user-visible regression target, destroyed on unfixed `main`),
  - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling).
* `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`.
* `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use.
* `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id.

Test pinning:

* `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future accidental dropped-lifecycle change to the scenario regresses loudly.
* `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry.

Behavior:

* On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd) and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED.
* On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. Scenario flips GREEN.

The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock.

Refs #4423
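The RED/GREEN contract above can be illustrated with a small sketch that compares the five declared invariants against an observed host state. The object layout, the `failingDimensions` helper, the `cli`, `gateway`, and `sandbox` field names, and the `"absent"` observed value are assumptions for this example; the actual `post-reboot-recovery-ready` entry in `scenarios/expected-states.ts` may be shaped differently.

```typescript
type DimensionName =
  | "cli"
  | "gateway"
  | "sandbox"
  | "localRegistry"
  | "dockerSandboxContainer";

// Assumed shape: each dimension carries an `expected` value.
type ExpectedState = Partial<Record<DimensionName, { expected: string }>>;
type ObservedState = Record<DimensionName, string>;

// Hypothetical rendering of the five invariants listed in the commit message.
const postRebootRecoveryReady: ExpectedState = {
  cli: { expected: "installed" },
  gateway: { expected: "healthy" },
  sandbox: { expected: "running" },
  localRegistry: { expected: "present" },
  dockerSandboxContainer: { expected: "present" },
};

// Returns the dimensions whose observed value differs from the expectation.
// An empty result corresponds to GREEN; anything else corresponds to RED.
function failingDimensions(
  expected: ExpectedState,
  observed: ObservedState,
): DimensionName[] {
  const names = Object.keys(expected) as DimensionName[];
  return names.filter((name) => expected[name]?.expected !== observed[name]);
}

// Unfixed main, as described above: status wipes the local registry entry.
const observedOnUnfixedMain: ObservedState = {
  cli: "installed",
  gateway: "healthy",
  sandbox: "running",
  localRegistry: "absent",
  dockerSandboxContainer: "present",
};
```

Calling `failingDimensions(postRebootRecoveryReady, observedOnUnfixedMain)` singles out `localRegistry`, which matches the one probe the commit message says fails before the fix.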

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

copy-pr-bot · 2026-06-09T18:12:25Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-09T18:12:28Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e87957ef-176f-4b73-8b40-9cc1d6718d7b

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2026-06-09T18:13:23Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

github-actions · 2026-06-09T18:13:25Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

github-actions · 2026-06-09T18:13:48Z

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
- Recommendation: Re-run the PR Review Advisor or perform a manual review.
- Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

cv · 2026-06-10T08:40:46Z

Closing as superseded by #5106 and the post-#5098 one-E2E migration plan.

This branch belongs to the pre-cutover fanout stack. Any useful helper/scenario work should come back as a fresh, focused draft PR from current main: Vitest as the only E2E harness, GitHub Actions as the matrix, no revived runner path, no long-lived legacy-inventory.json roadmap expansion, and replacement/deletion evidence carried in the PR body plus linked issue.

jyaunches and others added 7 commits June 9, 2026 12:24

Merge branch 'main' into e2e-scenario-lifecycle-fixture-prereq

e276ef3

chore(e2e): scaffold inventory internals migration draft

34466dd

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

chore(e2e): scaffold fan-out draft 21

b279f46

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 9, 2026

cv added the labels area: e2e (End-to-end tests, nightly failures, or validation infrastructure), area: ci (CI workflows, checks, release automation, or GitHub Actions), and chore (Build, CI, dependency, or tooling maintenance) Jun 9, 2026

Base automatically changed from codex/e2e-fanout-01-inventory-internals to main June 10, 2026 02:41

cv mentioned this pull request Jun 10, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv closed this Jun 10, 2026

cv wants to merge 7 commits into main from codex/e2e-fanout-21-nightly-regression-vitest
