test(e2e): failing-test-first guard for #2701 + recovery framework helpers by jyaunches · Pull Request #5049 · NVIDIA/NemoClaw

jyaunches · 2026-06-09T17:33:16Z

Summary

Failing-test-first regression guard for #2701: gateway recovery currently warns about a missing /tmp guard chain after a pod recreate but launches the gateway naked anyway, triggering the @homebridge/ciao crash loop documented in the issue (5-minute rebuild --yes is the only manual recovery on DGX Spark).

This PR captures the #2701 contract in two formats so the bug shows up in CI before the fix lands. The bash and Vitest suites stay completely separate and do not call each other.

Legacy bash flip — test/e2e/test-issue-2478-crash-loop-recovery.sh Phase 4 keeps the existing [DGX Spark] Gateway crash loop on startup: @homebridge/ciao networkInterfaces() returns EPERM in OpenShell sandbox #2478 WARNING assertion and now also asserts the guard chain is restored after recovery. Will be red on main; flips green when the [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 fix lands. The new fail() is deferred so Phase 4 restore + Phase 5 soak still execute and the artifact bundle stays comparable to historical runs. Runs in the existing issue-2478-crash-loop-recovery-e2e job in nightly-e2e.yaml — no workflow change.
New Vitest live E2E — test/e2e-scenario/live/gateway-guard-recovery.test.ts. First slice migrated from the legacy 2478 harness. Asserts the same [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 contract using typed framework fixtures and runs in the Vitest workflow only (e2e-vitest-scenarios.yaml) as a new free-standing gateway-guard-recovery job. Not part of the registry-driven matrix because recovery scenarios don't fit the steady-state expected-state probe model.
Framework helpers — sandbox-lifecycle disruption-and-recovery primitives, reusable for any future recovery scenario:
- GatewayClient.expectGuardChainActive — reads /tmp/nemoclaw-proxy-env.sh (not /proc/<pid>/environ; kernel.yama.ptrace_scope=1 blocks cross-tree environ reads — same approach as the legacy gateway_guards_active shell helper) and asserts the expected --require markers are present.
- GatewayClient.expectLogContains / expectLogDoesNotContain — tail and grep /tmp/gateway.log inside the sandbox.
- GatewayClient.expectPidStable — sample gateway PID over a window, fail on change (crash-loop) or disappearance.
- GatewayClient.resolveGatewayPid — typed wrapper around the legacy two-pass ps detection.
- SandboxClient.wipeGuardChain — remove /tmp/nemoclaw-proxy-env.sh plus the seven guard preload files; equivalent shape to a fresh container's /tmp after pod recreate.
- SandboxClient.killGatewayTree — pkill -9 -f '[o]penclaw' + verify nothing came back.
- 21 unit tests in test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts covering the new helpers (mocked runner, no live deps).

Phases 1, 2, 3, and 5 of the legacy 2478 script (initial-state, normal crash-recovery loop, idle soak) remain in bash pending a follow-up migration PR per the migration doc's "preserve the requirement before removing legacy runner pieces" policy. Migration inventory updated to record the partial Vitest target.

Suite separation

nightly-e2e.yaml — bash E2E only. Untouched by this PR. The issue-2478-crash-loop-recovery-e2e job already runs the legacy script; the new Phase 4 assertion lights up automatically.
e2e-vitest-scenarios.yaml — Vitest E2E only. Gains the new gateway-guard-recovery job. workflow_dispatch only today, matching the existing Vitest workflow's trigger model.

The two suites do not call each other and do not share workflow definitions. PR-C may revisit the Vitest workflow's trigger model (e.g. add a schedule) but that's out of scope for this PR.

Related Issue

Refs #2701 #2478 #4356 #4941

Sequencing

PR-A (this PR) — failing-test-first guard. Two assertions go red: legacy issue-2478-crash-loop-recovery-e2e (Phase 4 new assertion, runs in nightly) and the new Vitest gateway-guard-recovery job (runs on workflow_dispatch of e2e-vitest-scenarios.yaml).
PR-B (next) — the [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 fix. Implements install-preloads.sh + a guard-recovery.ts module + wires re-emit into buildOpenClawRecoveryScript. Both assertions flip green.
PR-C (later) — full migration of the remaining 4 phases of the 2478 bash test into Vitest after PR-B has soaked clean. Legacy bash test deletion gated on green soak per the migration policy.

Changes

.github/workflows/e2e-vitest-scenarios.yaml — new gateway-guard-recovery free-standing job (not in the matrix).
test/e2e-scenario/framework/clients/gateway.ts — added SandboxClient constructor dep + 5 new helpers.
test/e2e-scenario/framework/clients/sandbox.ts — added wipeGuardChain and killGatewayTree disruption helpers.
test/e2e-scenario/framework/e2e-test.ts — wire sandbox fixture into gateway fixture (chain stays acyclic).
test/e2e-scenario/framework-tests/e2e-clients.test.ts and e2e-phase-state-validation.test.ts — updated existing test constructors to the new GatewayClient(host, sandbox) signature.
test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts (new) — 21 unit tests for the new helpers.
test/e2e-scenario/live/gateway-guard-recovery.test.ts (new) — failing live E2E.
test/e2e-scenario/migration/legacy-inventory.json — record the Vitest target for test/e2e/test-issue-2478-crash-loop-recovery.sh.
test/e2e/test-issue-2478-crash-loop-recovery.sh — Phase 4 [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 contract assertion (deferred fail).
vitest.config.ts — exclude test/e2e-scenario/live/** from the cli project so pre-commit Test (cli) doesn't accidentally pick up live tests that need Docker. Pre-existing issue surfaced by this PR.

nightly-e2e.yaml is intentionally untouched.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx vitest run --project e2e-scenario-framework — 320 passed (baseline 299 + 21 new).
npx prek run --all-files — pre-existing flake in runtime-hermes-secret-boundary-behavioural.test.ts and slow tests in nemoclaw-start.test.ts time out under prek's parallel load (unrelated to this PR; same behavior on main).
Tests added for new behavior (21 unit tests for helpers; 1 live E2E test as the failing-test-first regression guard)
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes (none — internal test infra only)

The new Vitest live E2E and the bash Phase 4 assertion are expected to fail on main-equivalent CI on this PR — that is the regression-guard signal. They will flip to green when the #2701 fix lands in PR-B.

Signed-off-by: J. Yaunches jyaunches@nvidia.com

🤖 Generated with Claude Code

Summary by CodeRabbit

Tests
- Added comprehensive end-to-end tests for gateway recovery after disruptions.
- Enhanced test fixtures with recovery and health assertion capabilities.
Chores
- Improved CI/CD workflow with manual dispatch support and enhanced PR reporting.
- Updated test configuration and helpers for recovery scenario validation.

copy-pr-bot · 2026-06-09T17:33:19Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-09T17:33:24Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR implements gateway guard-chain recovery verification by adding sandbox-aware test client APIs, sandbox disruption helpers, comprehensive unit tests, a live E2E scenario, and CI workflow wiring to validate that gateway recovery restores the /tmp guard chain after pod-recreate wipes.

Changes

Gateway Guard Recovery Feature

Layer / File(s)	Summary
Gateway & Sandbox client recovery APIs `test/e2e-scenario/fixtures/clients/gateway.ts`, `test/e2e-scenario/fixtures/clients/sandbox.ts`	Adds `GatewayClient` recovery methods (`resolveGatewayPid`, `expectGuardChainActive`, `expectLogContains`, `expectPidStable`) and `SandboxClient` disruption helpers (`wipeGuardChain`, `killGatewayTree`). Both clients now use shared probe environments with explicit `OPENSHELL_GATEWAY` injection, and `SandboxClient.status()` is updated to use `--name` semantics.
Unit tests for recovery and disruption `test/e2e-scenario/support-tests/e2e-recovery-helpers.test.ts`	Adds comprehensive test coverage via a `ScriptedRunner` harness, validating GatewayClient probe helpers (guard markers, log tails, PID parsing, PID stability polling) and SandboxClient disruption helpers (guard-chain wiping, gateway tree killing) across success and failure cases.
Fixture wiring & client test updates `test/e2e-scenario/fixtures/e2e-test.ts`, `test/e2e-scenario/support-tests/e2e-clients.test.ts`, `test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts`	Reorders fixtures to instantiate sandbox before gateway, updates gateway fixture and framework tests to construct `GatewayClient(host, sandbox)`, and adjusts OpenShell command expectations to use `--name` placement.
Live gateway-guard-recovery scenario `test/e2e-scenario/live/gateway-guard-recovery.test.ts`	Provisions a Docker sandbox, asserts guard chain is active after onboarding, simulates pod-recreate disruption (wipe + kill), triggers recovery via `nemoclaw connect --probe-only` with probe environment, and validates guard restoration, absence of recovery warnings, and PID stability.
CI workflow: gateway-guard-recovery job and reporting `.github/workflows/e2e-vitest-scenarios.yaml`	Adds optional `workflow_dispatch` `pr_number` input, introduces `gateway-guard-recovery` job that installs OpenShell and runs the live test, and updates `report-to-pr` to depend on all jobs and post markdown results to a targeted or inferred PR.
Legacy integration & vitest config `test/e2e/test-issue-2478-crash-loop-recovery.sh`, `vitest.config.ts`	Updates legacy crash-loop script to assert guard-chain activation post-respawn; adds vitest config comments explaining live scenario exclusion from CLI tests.
Test input construction tweaks `src/lib/security/credential-filter.test.ts`, `test/snapshot.test.ts`	Minor test data changes: X-API-Key and Slack botToken test inputs expressed via string concatenation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

NVIDIA/NemoClaw#2701: PR directly implements E2E tests and sandbox/gateway helpers that simulate the guard-chain recovery scenario described in the issue.
NVIDIA/NemoClaw#4999: PR extends the same Vitest fixture/client framework and CI workflow orchestration described by the epic.

Possibly related PRs

NVIDIA/NemoClaw#5101: Shares a credential-filter.test.ts edit pattern for secret-value construction.
NVIDIA/NemoClaw#5107: Introduces the openshell-version-pin-vitest job that this PR's workflow now depends on.

Suggested reviewers

cv
prekshivyas

Poem

🐰 A gateway guard stands firm and true,
Through pod-recreate and wipes anew,
With probes and helpers all in place,
The recovery dance finds its grace,
CI logs the victory's trace. 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(e2e): failing-test-first guard for `#2701` + recovery framework helpers' accurately reflects the main changes: introducing E2E test guards and framework helpers for issue `#2701`.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-09T18:39:16Z

Selective E2E Results — ❌ Some jobs failed

Run: 27227063905
Target ref: ff460c66fcb07a2584882bad910bf54ef1f54b35
Workflow ref: main
Requested jobs: issue-2478-crash-loop-recovery-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job	Result
issue-2478-crash-loop-recovery-e2e	❌ failure

Failed jobs: issue-2478-crash-loop-recovery-e2e. Check run artifacts for logs.

github-actions · 2026-06-09T21:02:11Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27235281178
Workflow ref: fix/2701-failing-e2e-guard-pr-a
Requested scenarios: (default — all supported)
Summary: 2 passed, 1 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

github-actions · 2026-06-09T21:26:13Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27236542721
Workflow ref: fix/2701-failing-e2e-guard-pr-a
Requested scenarios: (default — all supported)
Summary: 2 passed, 1 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

wscurran · 2026-06-09T21:37:02Z

✨
Related open issues:

#2701 [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop
#2478 [DGX Spark] Gateway crash loop on startup: @homebridge/ciao networkInterfaces() returns EPERM in OpenShell sandbox

coderabbitai

🧹 Nitpick comments (1)

.github/workflows/e2e-vitest-scenarios.yaml (1)

240-244: ⚡ Quick win

Consider explicit input validation as defense in depth.

Static analysis flagged potential template injection at lines 243-244 where inputs.pr_number and inputs.scenarios are expanded via ${{ toJSON(...) }}. While toJSON() should properly escape strings and inputs.scenarios is already validated via regex in the generate-matrix job (lines 54-57), adding explicit validation in this script provides defense in depth and makes the security posture more obvious to future maintainers.

🛡️ Optional: Add explicit validation

         with:
           script: |
             const needs = ${{ toJSON(needs) }};
             const runUrl = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
             const workflowBranch = context.ref.replace('refs/heads/', '');
             const prNumberInput = ${{ toJSON(inputs.pr_number) }} || '';
             const requestedScenarios = ${{ toJSON(inputs.scenarios) }} || '';
+            
+            // Validate inputs (defense in depth)
+            if (prNumberInput && !/^\d+$/.test(prNumberInput)) {
+              core.setFailed(`Invalid pr_number input: ${prNumberInput}`);
+              return;
+            }
+            if (requestedScenarios && !/^[A-Za-z0-9_-]+(,[A-Za-z0-9_-]+)*$/.test(requestedScenarios)) {
+              core.setFailed(`Invalid scenarios input: ${requestedScenarios}`);
+              return;
+            }

             let prNumber = prNumberInput ? Number.parseInt(prNumberInput, 10) : undefined;

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 240 - 244, The
template expands user inputs into prNumberInput and requestedScenarios without
runtime checks; add explicit validation/sanitization immediately after those
assignments: ensure prNumberInput contains only digits (or is empty) and
coerce/clear it otherwise, and ensure requestedScenarios matches the allowed
pattern (reuse the regex from generate-matrix or a conservative whitelist for
scenario names, splitting and validating each item) falling back to an empty
string or failing fast; update the block where prNumberInput and
requestedScenarios are defined to perform these checks and log or handle invalid
input accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 240-244: The template expands user inputs into prNumberInput and
requestedScenarios without runtime checks; add explicit validation/sanitization
immediately after those assignments: ensure prNumberInput contains only digits
(or is empty) and coerce/clear it otherwise, and ensure requestedScenarios
matches the allowed pattern (reuse the regex from generate-matrix or a
conservative whitelist for scenario names, splitting and validating each item)
falling back to an empty string or failing fast; update the block where
prNumberInput and requestedScenarios are defined to perform these checks and log
or handle invalid input accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 103670ef-451b-45cf-a084-34a82f83eb20

📥 Commits

Reviewing files that changed from the base of the PR and between 2f1c2da and 9bcdab8.

📒 Files selected for processing (12)

.github/workflows/e2e-vitest-scenarios.yaml
src/commands/sandbox/agents/list.ts
test/e2e-scenario/framework-tests/e2e-clients.test.ts
test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
test/e2e-scenario/framework/clients/gateway.ts
test/e2e-scenario/framework/clients/sandbox.ts
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/live/gateway-guard-recovery.test.ts
test/e2e-scenario/migration/legacy-inventory.json
test/e2e/test-issue-2478-crash-loop-recovery.sh
vitest.config.ts

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 243-259: prNumber currently uses Number.parseInt on prNumberInput
and silently falls back to finding the first open PR when parsing fails; change
the logic so a non-empty prNumberInput that is malformed causes the workflow to
fail instead of retrying: when prNumberInput is non-empty validate it with a
strict check (e.g., /^\d+$/ or compare parsed number to the trimmed input)
before assigning prNumber, and if validation fails call core.setFailed or throw
an error (do not call github.rest.pulls.list or use the workflowBranch
fallback); keep the existing branch-only lookup behavior only when prNumberInput
is empty.

In `@test/e2e-scenario/framework/clients/gateway.ts`:
- Around line 140-147: The probe options are built incorrectly: helper-specific
env from probeEnv() is overwritten by caller options because options is spread
after env; update the probe options construction in GatewayClient and
SandboxClient so non-env fields are spread first and env is merged explicitly
(env: { ...probeEnv(), ...options?.env }) before passing to this.sandbox.exec
(references: GatewayClient, SandboxClient, probeEnv, and the this.sandbox.exec
call building artifactName/gateway-guard-chain-...). Ensure you apply this same
merge pattern wherever helper probe options are assembled so OPENSHELL_GATEWAY
and other helper vars are preserved.
- Around line 221-223: The loop undercounts total wait time when durationSeconds
isn't a multiple of pollIntervalSeconds: change the sample calculation to use
Math.ceil instead of Math.floor (i.e., compute samples = Math.max(1,
Math.ceil(options.durationSeconds / pollIntervalSeconds))) so the for-loop that
calls sleepSeconds(pollIntervalSeconds) runs long enough to cover at least
durationSeconds; update any related comments/tests referencing samples if
present.
- Around line 241-255: tailLog currently ignores the sandbox.exec exit code so a
missing/unreadable GATEWAY_LOG_PATH yields an empty string and lets
expectLogDoesNotContain false-pass; update tailLog (the private async tailLog in
gateway.ts) to check the result from this.sandbox.exec (the call that runs
["sh","-c", `tail -n ${lines} ${GATEWAY_LOG_PATH} 2>/dev/null`]) and throw an
Error when result.exitCode is non-zero (include result.stderr or a descriptive
message) so callers like expectLogDoesNotContain fail when tail could not read
the file.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 36971dbc-e347-4f83-80c9-e1b2d0ca1b34

📥 Commits

Reviewing files that changed from the base of the PR and between 9bcdab8 and 671aa71.

📒 Files selected for processing (11)

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/framework-tests/e2e-clients.test.ts
test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
test/e2e-scenario/framework/clients/gateway.ts
test/e2e-scenario/framework/clients/sandbox.ts
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/live/gateway-guard-recovery.test.ts
test/e2e-scenario/migration/legacy-inventory.json
test/e2e/test-issue-2478-crash-loop-recovery.sh
vitest.config.ts

🚧 Files skipped from review as they are similar to previous changes (7)

test/e2e-scenario/live/gateway-guard-recovery.test.ts
test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
test/e2e-scenario/migration/legacy-inventory.json
test/e2e/test-issue-2478-crash-loop-recovery.sh
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
test/e2e-scenario/framework-tests/e2e-clients.test.ts

github-actions · 2026-06-10T00:35:47Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27244789650
Workflow ref: fix/2701-failing-e2e-guard-pr-a
Requested scenarios: (default — all supported)
Summary: 2 passed, 1 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

github-actions · 2026-06-10T00:42:13Z

Selective E2E Results — ❌ Some jobs failed

Run: 27244790666
Target ref: 671aa71ac9cc53b92e3becb0d438703489e38a56
Workflow ref: main
Requested jobs: issue-2478-crash-loop-recovery-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job	Result
issue-2478-crash-loop-recovery-e2e	❌ failure

Failed jobs: issue-2478-crash-loop-recovery-e2e. Check run artifacts for logs.

…-pin Migrates test/e2e/test-openshell-version-pin.sh (regression guard for #3474) to a free-standing Vitest live test under test/e2e-scenario/live/. The legacy script is a hermetic installer-script behavioral test: it runs scripts/install-openshell.sh under a stubbed PATH where the already-installed openshell reports a too-new version (0.0.45) and the downloaded archives produce a binary that reports the pinned 0.0.44. It does not exercise the registry-driven steady-state probe model, so it lives outside baseline.ts as a free-standing live test (per #5049). Four [PASS] assertions translated faithfully: - installer-exits-zero (result.status === 0) - download-log-contains-v0.0.44 - download-log-excludes-v0.0.45 - replaced-openshell-reports-0.0.44 CI: discrete openshell-version-pin-vitest job in e2e-vitest-scenarios.yaml runs on workflow_dispatch alongside the registry-driven matrix. Bash guard retained in regression-e2e.yaml; deletion is a follow-up PR with #4357 approval per the migration freeze policy. Refs #4355, #3474

cv · 2026-06-10T07:25:32Z

Hybrid framing note

Since #5098 cites this PR as the HYBRID precedent, I want to make the intended meaning explicit.

“Hybrid” should be a migration/convergence state, not a permanent architecture with two E2E systems. The useful lesson from this PR is not “some E2Es live forever in bash and some in Vitest.” The useful lesson is:

preserve the high-fidelity contract first;
add the typed/Vitest harness without losing the real system boundary;
keep existing bash coverage only while the Vitest representation is incomplete or while it is still proving equivalence.

If the contract requires shell behavior, the Vitest E2E can and should execute that real shell behavior: process signals, /proc reads, login shell state, bash install.sh, etc. The thing we should avoid preserving is a second bash test runner/harness.

So for follow-ups from this PR, the durable target should still be a Vitest E2E harness with boundary fixtures/helpers, not a long-term split suite. Related clarification is now on #4941, #5098, and #4999.

…est live test (#5107) # Migration: `test-openshell-version-pin.sh` → free-standing Vitest live test **Refs**: #4355 (owner issue), #3474 (regression target), #4990 (E2E architecture contract) ## What landed in this PR A free-standing Vitest live test at `test/e2e-scenario/live/openshell-version-pin.test.ts` plus a discrete `openshell-version-pin-vitest` job in `.github/workflows/e2e-vitest-scenarios.yaml`, modeled on #5049's free-standing pattern. The legacy bash guard `test/e2e/test-openshell-version-pin.sh` remains in place; deletion is a follow-up PR with #4357 approval per the migration freeze policy. ## Architecture notes (course-correction during the loop) The first pass of this migration tried to land the test in `test/install-openshell-version-check.test.ts` under the existing `installer-integration` Vitest project. It even passed CI twice there (commit `a95edc02e`, runs 27251976992 and 27252048411). The argument was that this is a hermetic installer-script unit test, not a scenario, and sister tests already live in that file. That argument **lost** to an explicit project contract: `test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts` enforces that every entry in `targetVitestScenarios` for a `covered` legacy script must match `^test/e2e-scenario/live/.+\.test\.ts$`. The schema is the binding architectural decision; the project has already chosen `test/e2e-scenario/live/` as the canonical home for migrated tests, regardless of whether they exercise the scenario framework's primitives. The pivot adopts that decision. Commits in this PR walk that history: 1. `873aaa2ee` — mark inventory `bridge-probe` (initial in-flight state) 2. `a95edc02e` — first sketch under installer-integration (rejected by schema) 3. `4670cb556` — empty commit (verification dispatch trigger from the rejected approach) 4. `9215a4f1f` — revert (2) 5. `920876546` — final sketch: free-standing live test + discrete CI job 6. `3d9c01931` — inventory mark `covered` pointing at the new live test ## Shape - **Shape**: free-standing live test (not registry-driven) - **Family**: hermetic installer-script behavioral test - **Runner**: `ubuntu-latest` - **Registry-driven**: no — does not fit `from(scenario.environment, env) → from(scenario.expectedStateId, instance)` - **Free-standing job**: `openshell-version-pin-vitest` in `e2e-vitest-scenarios.yaml` ## Assertion chain (mapped from the bash script's four `[PASS]` checks) 1. `installer-exits-zero` — `result.status === 0` (happy path completes; no "above the maximum" hard-fail). 2. `download-log-contains-v0.0.44` — pinned release tag was requested. 3. `download-log-excludes-v0.0.45` — too-new sticky version is never re-fetched. 4. `replaced-openshell-reports-0.0.44` — binary on disk in active install dir overwritten with pinned build. ## Failing-test-first contract - **First-dispatch outcome predicted**: GREEN. The framework already covers the regression target (`scripts/install-openshell.sh:272-275` implements the v0.0.44 reinstall path; sister tests in `test/install-openshell-version-check.test.ts` already cover the entry-point invariants). This migration's residual coverage is the happy-path completion, exercised via stubbed PATH binaries. - **Skill 5.8 false-pass guard**: two consecutive GREEN dispatches required of the new `openshell-version-pin-vitest` job before convergence is recorded in the inventory entry's `frozenAtSha` / `convergenceEvidence` fields. - Local `vitest run --project e2e-scenarios-live test/e2e-scenario/live/openshell-version-pin.test.ts` passes; inventory schema-lock test passes. ## Suite separation honored - `regression-e2e.yaml` (bash) untouched. Legacy script remains in place; deletion is a follow-up PR with #4357 approval. - `e2e-vitest-scenarios.yaml` gains a discrete free-standing job; matrix path unchanged. - `vitest.config.ts:e2e-scenarios-live` project picks up the new test file via the `test/e2e-scenario/live/**/*.test.ts` glob with no project-list edit. ## Inventory delta ``` status: not-migrated → covered targetVitestScenarios: [] → ["test/e2e-scenario/live/openshell-version-pin.test.ts"] deletionReady: false (deletion is a follow-up PR; #4357 approval required) ``` ## Next steps after this PR merges - Drive two consecutive GREEN runs of `openshell-version-pin-vitest` (one is happening on this PR; the second is enforced post-merge by re-dispatch on `main`). - Follow-up PR with `#4357` approval to delete `test/e2e/test-openshell-version-pin.sh` from `regression-e2e.yaml` and the bash file itself, transitioning the inventory entry from `covered` → `retired`. ## Convergence - ✅ **First dispatch GREEN** (post-pivot): [openshell-version-pin-vitest job 80480862915](https://github.com/NVIDIA/NemoClaw/actions/runs/27252871283/job/80480862915) on commit `3d9c01931`. - ✅ **Verification dispatch GREEN**: [openshell-version-pin-vitest job 80481579156](https://github.com/NVIDIA/NemoClaw/actions/runs/27253022534/job/80481579156) on the same commit. Two consecutive GREENs rule out false-pass. - ✅ **Legacy regression-e2e baseline GREEN**: last 2 dispatches that ran `openshell-version-pin-e2e` succeeded (runs 26720769662, 26718460056). Bash guard untouched. - 🔒 **Frozen at**: `3d9c01931f598cf2450bd88028cb3b44bcf367b0`. Inventory entry now records `frozenAtSha` and `convergenceEvidence`. ### Assertions resolved (in order) 1. `installer-exits-zero` — `result.status === 0` (commit `920876546`) 2. `download-log-contains-v0.0.44` — `expect(downloads).toContain("v0.0.44")` (commit `920876546`) 3. `download-log-excludes-v0.0.45` — `expect(downloads).not.toContain("v0.0.45")` (commit `920876546`) 4. `replaced-openshell-reports-0.0.44` — re-run `$fakeBin/openshell --version`; assert `0.0.44` and not `0.0.45` (commit `920876546`) All four legacy `[PASS]` assertions translated faithfully. No deferred assertions. ### Note for reviewers First-dispatch passed (in both pre- and post-pivot homes). This is the duplicative case the migration skill anticipates: `install-openshell.sh:272-275` already lands the v0.0.44 reinstall path, and sister tests already covered the entry point. The legacy bash guard's value-add was the happy-path completion through download + extraction + binary replacement, which is now expressed as a Vitest live test. **Please sanity-check the assertion list above against the four `[PASS]` lines in `test/e2e/test-openshell-version-pin.sh`** — if equivalence is confirmed, the migration is complete and the bash guard can be retired in a follow-up PR with `#4357` approval.  ## Summary by CodeRabbit * **Tests** * Added a live end-to-end Vitest to validate OpenShell version-pin behavior and prevent regressions, capturing installer output and download activity as artifacts. * **CI** * Added a dedicated CI job to run the new live test and upload related artifacts for troubleshooting. * **Chores** * Marked the legacy test as covered in the migration inventory and recorded convergence evidence and frozen revision metadata.  --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>

…+ workflow job Ports the live block of test/openclaw-tui-chat-correlation.test.ts into the typed scenario framework as a free-standing live test (#5049 pattern), plus a discrete openclaw-tui-chat-correlation-vitest job in e2e-vitest-scenarios.yaml modeled on #5107's openshell-version-pin-vitest. Coverage: - Onboards a fresh cloud OpenClaw sandbox via OnboardingPhaseFixture.from with the cloud-openclaw profile (already in SUPPORTED_ONBOARDING). - Asserts the pinned 2026.5.27 OpenClaw version (host-side probe via openshell sandbox exec ... openclaw --version). Override via E2E_OPENCLAW_TUI_CORRELATION_PINNED_VERSION when bumping. - Idempotent in-sandbox gateway start on port 18789 via SandboxClient.execShell. - Uploads the websocket repro driver via SandboxClient.upload, runs it via execShell with the gateway auth token read from /sandbox/.openclaw/openclaw.json. - Asserts the #2603 + #3145 contract: no empty finals for submitted runs, no uncorrelated replies, no missing/duplicate replies, no missing/duplicate user turns. Preserves the looksLikeEventCaptureFailure retry-once guard (OpenClaw-side observability race; remove when the runtime exposes a deterministic chat readiness ack). Inventory: bridge-probe entry for test/e2e/test-openclaw-tui-chat-correlation.sh now references this file as its bridge probe (satisfies the inventory invariant that bridge-probe entries have non-empty bridgeProbes pointing at real paths). Bash workflow (nightly-e2e.yaml openclaw-tui-chat-correlation-e2e job) remains untouched per the migration freeze policy; deletion is a follow-up PR with #4357 approval after typed scenario soaks. Refs: #4347 #2603 #3145 #5098 #5049 #5107

Pass 2 of phase-5 convergence on dispatch 27284469157. Line moved forward (75ms → 127s) clearing all Phase 1 prereqs and the Phase 2 first onboard. New blocking assertion: sandbox-exists-after-interrupt (#9 in chain), failing again with `spawn openshell ENOENT` \u2014 same root cause class as pass 1, missed during the previous fix because the sandbox.exists() call passed no options so options.env was empty. Fix: thread `env: buildAvailabilityProbeEnv()` into the exists() call. The SandboxClient correctly does not auto-supply env (mirrors HostCliClient.expectNemoclawAvailable convention \u2014 callers stay explicit about the env boundary, helpers-not-bridges per #5049). Refs: #4348, #5098

…ork helpers Captures the NVIDIA#2701 contract — gateway recovery must restore the /tmp guard chain before launching, not just warn about it — in two formats so the bug shows up in CI immediately: 1. Legacy bash flip — test/e2e/test-issue-2478-crash-loop-recovery.sh Phase 4 keeps the existing NVIDIA#2478 WARNING assertion and now also asserts the guard chain is restored after recovery. Will fail on main; flips green when the NVIDIA#2701 fix lands. The fail() is deferred so Phase 4 restore + Phase 5 soak still execute and the artifact bundle stays comparable to historical runs. Runs in the existing `issue-2478-crash-loop-recovery-e2e` job in nightly-e2e.yaml — no workflow change needed. 2. New Vitest live E2E — test/e2e-scenario/live/gateway-guard-recovery.test.ts First slice migrated from the legacy 2478 harness. Asserts the same NVIDIA#2701 contract using typed framework fixtures and runs in the Vitest workflow only — the bash and Vitest suites stay completely separate and do not call each other. Wired into e2e-vitest-scenarios.yaml as a new `gateway-guard-recovery` job (free-standing, not part of the registry-driven matrix). 3. Framework helpers — sandbox-lifecycle disruption-and-recovery primitives: - GatewayClient.expectGuardChainActive — reads /tmp/nemoclaw-proxy-env.sh (not /proc/<pid>/environ — kernel.yama.ptrace_scope=1 blocks cross-tree environ reads) and asserts the expected --require markers are present. - GatewayClient.expectLogContains / expectLogDoesNotContain — tail and grep /tmp/gateway.log inside the sandbox. - GatewayClient.expectPidStable — sample gateway PID over a window, fail on change (crash-loop) or disappearance. - GatewayClient.resolveGatewayPid — typed wrapper around the legacy two-pass ps detection. - SandboxClient.wipeGuardChain — remove /tmp/nemoclaw-proxy-env.sh plus the seven guard preload files; equivalent to a fresh container's /tmp after pod recreate. - SandboxClient.killGatewayTree — pkill -9 -f '[o]penclaw' + verify. - 21 unit tests covering the new helpers in test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts Phases 1, 2, 3, and 5 of the legacy 2478 script (initial-state, normal crash-recovery loop, idle soak) remain in bash pending a follow-up migration PR per the migration doc's "preserve the requirement before removing legacy runner pieces" policy. Migration inventory updated to record the partial Vitest target. Refs NVIDIA#2701 NVIDIA#2478 NVIDIA#4356 NVIDIA#4941 Signed-off-by: J. Yaunches <jyaunches@nvidia.com>

jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch 3 times, most recently from ff460c6 to 4b59d59 Compare June 9, 2026 18:32

jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch 7 times, most recently from bbe042d to 1c95cbd Compare June 9, 2026 20:55

jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch from 1c95cbd to 9bcdab8 Compare June 9, 2026 21:19

wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery feature PR adds or expands user-visible functionality platform: dgx-spark Affects DGX Spark hardware or workflows labels Jun 9, 2026

jyaunches added the v0.0.62 Release target label Jun 9, 2026

jyaunches marked this pull request as ready for review June 9, 2026 22:27

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch from 9bcdab8 to 671aa71 Compare June 10, 2026 00:05

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread .github/workflows/e2e-vitest-scenarios.yaml

Comment thread test/e2e-scenario/fixtures/clients/gateway.ts

Comment thread test/e2e-scenario/framework/clients/gateway.ts

Comment thread test/e2e-scenario/fixtures/clients/gateway.ts

jyaunches mentioned this pull request Jun 10, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv removed the v0.0.62 Release target label Jun 10, 2026

cv added the v0.0.63 Release target label Jun 10, 2026

jyaunches mentioned this pull request Jun 10, 2026

test(e2e): migrate test-openshell-version-pin.sh to free-standing Vitest live test #5107

Merged

This was referenced Jun 10, 2026

test(e2e): add Vitest live coverage for onboard resume #5147

Merged

test(e2e): add OpenClaw TUI chat-correlation coverage slice #5150

Merged

wscurran assigned jyaunches Jun 10, 2026

ahunnargikar-nvidia requested a review from cv June 10, 2026 16:42

jyaunches added v0.0.64 Release target and removed v0.0.63 Release target labels Jun 11, 2026

jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch from b5a86da to fd608fb Compare June 11, 2026 13:25

jyaunches merged commit d338508 into NVIDIA:main Jun 11, 2026
29 checks passed

coderabbitai Bot mentioned this pull request Jun 11, 2026

test(e2e): migrate test-skill-agent-e2e.sh to vitest #5222

Merged

sandl99 mentioned this pull request Jun 12, 2026

test(e2e): revert failing #2701 guard assertion #5320

Closed

12 tasks

coderabbitai Bot mentioned this pull request Jun 12, 2026

fix(agent): restore gateway guard chain during recovery #5321

Merged

12 tasks

jyaunches mentioned this pull request Jun 12, 2026

fix(recovery): restore OpenClaw guard chain #5340

Closed

coderabbitai Bot mentioned this pull request Jun 12, 2026

fix(agent): harden recovery proxy env sourcing #5342

Merged

12 tasks

Conversation

jyaunches commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Suite separation

Related Issue

Sequencing

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Jun 9, 2026

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 9, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 9, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 9, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

wscurran commented Jun 9, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jun 10, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 10, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

cv commented Jun 10, 2026

Hybrid framing note

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jyaunches commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading