Skip to content

test(e2e): failing-test-first guard for #2701 + recovery framework helpers#5049

Merged
jyaunches merged 1 commit into
NVIDIA:mainfrom
jyaunches:fix/2701-failing-e2e-guard-pr-a
Jun 11, 2026
Merged

test(e2e): failing-test-first guard for #2701 + recovery framework helpers#5049
jyaunches merged 1 commit into
NVIDIA:mainfrom
jyaunches:fix/2701-failing-e2e-guard-pr-a

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Failing-test-first regression guard for #2701: gateway recovery currently warns about a missing /tmp guard chain after a pod recreate but launches the gateway naked anyway, triggering the @homebridge/ciao crash loop documented in the issue (5-minute rebuild --yes is the only manual recovery on DGX Spark).

This PR captures the #2701 contract in two formats so the bug shows up in CI before the fix lands. The bash and Vitest suites stay completely separate and do not call each other.

  1. Legacy bash fliptest/e2e/test-issue-2478-crash-loop-recovery.sh Phase 4 keeps the existing [DGX Spark] Gateway crash loop on startup: @homebridge/ciao networkInterfaces() returns EPERM in OpenShell sandbox #2478 WARNING assertion and now also asserts the guard chain is restored after recovery. Will be red on main; flips green when the [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 fix lands. The new fail() is deferred so Phase 4 restore + Phase 5 soak still execute and the artifact bundle stays comparable to historical runs. Runs in the existing issue-2478-crash-loop-recovery-e2e job in nightly-e2e.yaml — no workflow change.

  2. New Vitest live E2Etest/e2e-scenario/live/gateway-guard-recovery.test.ts. First slice migrated from the legacy 2478 harness. Asserts the same [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 contract using typed framework fixtures and runs in the Vitest workflow only (e2e-vitest-scenarios.yaml) as a new free-standing gateway-guard-recovery job. Not part of the registry-driven matrix because recovery scenarios don't fit the steady-state expected-state probe model.

  3. Framework helpers — sandbox-lifecycle disruption-and-recovery primitives, reusable for any future recovery scenario:

    • GatewayClient.expectGuardChainActive — reads /tmp/nemoclaw-proxy-env.sh (not /proc/<pid>/environ; kernel.yama.ptrace_scope=1 blocks cross-tree environ reads — same approach as the legacy gateway_guards_active shell helper) and asserts the expected --require markers are present.
    • GatewayClient.expectLogContains / expectLogDoesNotContain — tail and grep /tmp/gateway.log inside the sandbox.
    • GatewayClient.expectPidStable — sample gateway PID over a window, fail on change (crash-loop) or disappearance.
    • GatewayClient.resolveGatewayPid — typed wrapper around the legacy two-pass ps detection.
    • SandboxClient.wipeGuardChain — remove /tmp/nemoclaw-proxy-env.sh plus the seven guard preload files; equivalent shape to a fresh container's /tmp after pod recreate.
    • SandboxClient.killGatewayTreepkill -9 -f '[o]penclaw' + verify nothing came back.
    • 21 unit tests in test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts covering the new helpers (mocked runner, no live deps).

Phases 1, 2, 3, and 5 of the legacy 2478 script (initial-state, normal crash-recovery loop, idle soak) remain in bash pending a follow-up migration PR per the migration doc's "preserve the requirement before removing legacy runner pieces" policy. Migration inventory updated to record the partial Vitest target.

Suite separation

  • nightly-e2e.yaml — bash E2E only. Untouched by this PR. The issue-2478-crash-loop-recovery-e2e job already runs the legacy script; the new Phase 4 assertion lights up automatically.
  • e2e-vitest-scenarios.yaml — Vitest E2E only. Gains the new gateway-guard-recovery job. workflow_dispatch only today, matching the existing Vitest workflow's trigger model.

The two suites do not call each other and do not share workflow definitions. PR-C may revisit the Vitest workflow's trigger model (e.g. add a schedule) but that's out of scope for this PR.

Related Issue

Refs #2701 #2478 #4356 #4941

Sequencing

Changes

  • .github/workflows/e2e-vitest-scenarios.yaml — new gateway-guard-recovery free-standing job (not in the matrix).
  • test/e2e-scenario/framework/clients/gateway.ts — added SandboxClient constructor dep + 5 new helpers.
  • test/e2e-scenario/framework/clients/sandbox.ts — added wipeGuardChain and killGatewayTree disruption helpers.
  • test/e2e-scenario/framework/e2e-test.ts — wire sandbox fixture into gateway fixture (chain stays acyclic).
  • test/e2e-scenario/framework-tests/e2e-clients.test.ts and e2e-phase-state-validation.test.ts — updated existing test constructors to the new GatewayClient(host, sandbox) signature.
  • test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts (new) — 21 unit tests for the new helpers.
  • test/e2e-scenario/live/gateway-guard-recovery.test.ts (new) — failing live E2E.
  • test/e2e-scenario/migration/legacy-inventory.json — record the Vitest target for test/e2e/test-issue-2478-crash-loop-recovery.sh.
  • test/e2e/test-issue-2478-crash-loop-recovery.sh — Phase 4 [DGX Spark] Host reboot bricks sandbox until 5-min rebuild --yes: connect recovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 contract assertion (deferred fail).
  • vitest.config.ts — exclude test/e2e-scenario/live/** from the cli project so pre-commit Test (cli) doesn't accidentally pick up live tests that need Docker. Pre-existing issue surfaced by this PR.

nightly-e2e.yaml is intentionally untouched.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx vitest run --project e2e-scenario-framework — 320 passed (baseline 299 + 21 new).
  • npx prek run --all-files — pre-existing flake in runtime-hermes-secret-boundary-behavioural.test.ts and slow tests in nemoclaw-start.test.ts time out under prek's parallel load (unrelated to this PR; same behavior on main).
  • Tests added for new behavior (21 unit tests for helpers; 1 live E2E test as the failing-test-first regression guard)
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes (none — internal test infra only)

The new Vitest live E2E and the bash Phase 4 assertion are expected to fail on main-equivalent CI on this PR — that is the regression-guard signal. They will flip to green when the #2701 fix lands in PR-B.


Signed-off-by: J. Yaunches jyaunches@nvidia.com

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests

    • Added comprehensive end-to-end tests for gateway recovery after disruptions.
    • Enhanced test fixtures with recovery and health assertion capabilities.
  • Chores

    • Improved CI/CD workflow with manual dispatch support and enhanced PR reporting.
    • Updated test configuration and helpers for recovery scenario validation.

@copy-pr-bot

copy-pr-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR implements gateway guard-chain recovery verification by adding sandbox-aware test client APIs, sandbox disruption helpers, comprehensive unit tests, a live E2E scenario, and CI workflow wiring to validate that gateway recovery restores the /tmp guard chain after pod-recreate wipes.

Changes

Gateway Guard Recovery Feature

Layer / File(s) Summary
Gateway & Sandbox client recovery APIs
test/e2e-scenario/fixtures/clients/gateway.ts, test/e2e-scenario/fixtures/clients/sandbox.ts
Adds GatewayClient recovery methods (resolveGatewayPid, expectGuardChainActive, expectLogContains, expectPidStable) and SandboxClient disruption helpers (wipeGuardChain, killGatewayTree). Both clients now use shared probe environments with explicit OPENSHELL_GATEWAY injection, and SandboxClient.status() is updated to use --name semantics.
Unit tests for recovery and disruption
test/e2e-scenario/support-tests/e2e-recovery-helpers.test.ts
Adds comprehensive test coverage via a ScriptedRunner harness, validating GatewayClient probe helpers (guard markers, log tails, PID parsing, PID stability polling) and SandboxClient disruption helpers (guard-chain wiping, gateway tree killing) across success and failure cases.
Fixture wiring & client test updates
test/e2e-scenario/fixtures/e2e-test.ts, test/e2e-scenario/support-tests/e2e-clients.test.ts, test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts
Reorders fixtures to instantiate sandbox before gateway, updates gateway fixture and framework tests to construct GatewayClient(host, sandbox), and adjusts OpenShell command expectations to use --name placement.
Live gateway-guard-recovery scenario
test/e2e-scenario/live/gateway-guard-recovery.test.ts
Provisions a Docker sandbox, asserts guard chain is active after onboarding, simulates pod-recreate disruption (wipe + kill), triggers recovery via nemoclaw connect --probe-only with probe environment, and validates guard restoration, absence of recovery warnings, and PID stability.
CI workflow: gateway-guard-recovery job and reporting
.github/workflows/e2e-vitest-scenarios.yaml
Adds optional workflow_dispatch pr_number input, introduces gateway-guard-recovery job that installs OpenShell and runs the live test, and updates report-to-pr to depend on all jobs and post markdown results to a targeted or inferred PR.
Legacy integration & vitest config
test/e2e/test-issue-2478-crash-loop-recovery.sh, vitest.config.ts
Updates legacy crash-loop script to assert guard-chain activation post-respawn; adds vitest config comments explaining live scenario exclusion from CLI tests.
Test input construction tweaks
src/lib/security/credential-filter.test.ts, test/snapshot.test.ts
Minor test data changes: X-API-Key and Slack botToken test inputs expressed via string concatenation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • NVIDIA/NemoClaw#2701: PR directly implements E2E tests and sandbox/gateway helpers that simulate the guard-chain recovery scenario described in the issue.
  • NVIDIA/NemoClaw#4999: PR extends the same Vitest fixture/client framework and CI workflow orchestration described by the epic.

Possibly related PRs

  • NVIDIA/NemoClaw#5101: Shares a credential-filter.test.ts edit pattern for secret-value construction.
  • NVIDIA/NemoClaw#5107: Introduces the openshell-version-pin-vitest job that this PR's workflow now depends on.

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 A gateway guard stands firm and true,
Through pod-recreate and wipes anew,
With probes and helpers all in place,
The recovery dance finds its grace,
CI logs the victory's trace. 🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(e2e): failing-test-first guard for #2701 + recovery framework helpers' accurately reflects the main changes: introducing E2E test guards and framework helpers for issue #2701.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@jyaunches jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch 3 times, most recently from ff460c6 to 4b59d59 Compare June 9, 2026 18:32
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 27227063905
Target ref: ff460c66fcb07a2584882bad910bf54ef1f54b35
Workflow ref: main
Requested jobs: issue-2478-crash-loop-recovery-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
issue-2478-crash-loop-recovery-e2e ❌ failure

Failed jobs: issue-2478-crash-loop-recovery-e2e. Check run artifacts for logs.

@jyaunches jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch 7 times, most recently from bbe042d to 1c95cbd Compare June 9, 2026 20:55
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27235281178
Workflow ref: fix/2701-failing-e2e-guard-pr-a
Requested scenarios: (default — all supported)
Summary: 2 passed, 1 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

@jyaunches jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch from 1c95cbd to 9bcdab8 Compare June 9, 2026 21:19
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27236542721
Workflow ref: fix/2701-failing-e2e-guard-pr-a
Requested scenarios: (default — all supported)
Summary: 2 passed, 1 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery feature PR adds or expands user-visible functionality platform: dgx-spark Affects DGX Spark hardware or workflows labels Jun 9, 2026
@jyaunches jyaunches added the v0.0.62 Release target label Jun 9, 2026
@jyaunches jyaunches marked this pull request as ready for review June 9, 2026 22:27

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

240-244: ⚡ Quick win

Consider explicit input validation as defense in depth.

Static analysis flagged potential template injection at lines 243-244 where inputs.pr_number and inputs.scenarios are expanded via ${{ toJSON(...) }}. While toJSON() should properly escape strings and inputs.scenarios is already validated via regex in the generate-matrix job (lines 54-57), adding explicit validation in this script provides defense in depth and makes the security posture more obvious to future maintainers.

🛡️ Optional: Add explicit validation
         with:
           script: |
             const needs = ${{ toJSON(needs) }};
             const runUrl = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`;
             const workflowBranch = context.ref.replace('refs/heads/', '');
             const prNumberInput = ${{ toJSON(inputs.pr_number) }} || '';
             const requestedScenarios = ${{ toJSON(inputs.scenarios) }} || '';
+            
+            // Validate inputs (defense in depth)
+            if (prNumberInput && !/^\d+$/.test(prNumberInput)) {
+              core.setFailed(`Invalid pr_number input: ${prNumberInput}`);
+              return;
+            }
+            if (requestedScenarios && !/^[A-Za-z0-9_-]+(,[A-Za-z0-9_-]+)*$/.test(requestedScenarios)) {
+              core.setFailed(`Invalid scenarios input: ${requestedScenarios}`);
+              return;
+            }

             let prNumber = prNumberInput ? Number.parseInt(prNumberInput, 10) : undefined;
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 240 - 244, The
template expands user inputs into prNumberInput and requestedScenarios without
runtime checks; add explicit validation/sanitization immediately after those
assignments: ensure prNumberInput contains only digits (or is empty) and
coerce/clear it otherwise, and ensure requestedScenarios matches the allowed
pattern (reuse the regex from generate-matrix or a conservative whitelist for
scenario names, splitting and validating each item) falling back to an empty
string or failing fast; update the block where prNumberInput and
requestedScenarios are defined to perform these checks and log or handle invalid
input accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 240-244: The template expands user inputs into prNumberInput and
requestedScenarios without runtime checks; add explicit validation/sanitization
immediately after those assignments: ensure prNumberInput contains only digits
(or is empty) and coerce/clear it otherwise, and ensure requestedScenarios
matches the allowed pattern (reuse the regex from generate-matrix or a
conservative whitelist for scenario names, splitting and validating each item)
falling back to an empty string or failing fast; update the block where
prNumberInput and requestedScenarios are defined to perform these checks and log
or handle invalid input accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 103670ef-451b-45cf-a084-34a82f83eb20

📥 Commits

Reviewing files that changed from the base of the PR and between 2f1c2da and 9bcdab8.

📒 Files selected for processing (12)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • src/commands/sandbox/agents/list.ts
  • test/e2e-scenario/framework-tests/e2e-clients.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
  • test/e2e-scenario/framework/clients/gateway.ts
  • test/e2e-scenario/framework/clients/sandbox.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/live/gateway-guard-recovery.test.ts
  • test/e2e-scenario/migration/legacy-inventory.json
  • test/e2e/test-issue-2478-crash-loop-recovery.sh
  • vitest.config.ts

@jyaunches jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch from 9bcdab8 to 671aa71 Compare June 10, 2026 00:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 243-259: prNumber currently uses Number.parseInt on prNumberInput
and silently falls back to finding the first open PR when parsing fails; change
the logic so a non-empty prNumberInput that is malformed causes the workflow to
fail instead of retrying: when prNumberInput is non-empty validate it with a
strict check (e.g., /^\d+$/ or compare parsed number to the trimmed input)
before assigning prNumber, and if validation fails call core.setFailed or throw
an error (do not call github.rest.pulls.list or use the workflowBranch
fallback); keep the existing branch-only lookup behavior only when prNumberInput
is empty.

In `@test/e2e-scenario/framework/clients/gateway.ts`:
- Around line 140-147: The probe options are built incorrectly: helper-specific
env from probeEnv() is overwritten by caller options because options is spread
after env; update the probe options construction in GatewayClient and
SandboxClient so non-env fields are spread first and env is merged explicitly
(env: { ...probeEnv(), ...options?.env }) before passing to this.sandbox.exec
(references: GatewayClient, SandboxClient, probeEnv, and the this.sandbox.exec
call building artifactName/gateway-guard-chain-...). Ensure you apply this same
merge pattern wherever helper probe options are assembled so OPENSHELL_GATEWAY
and other helper vars are preserved.
- Around line 221-223: The loop undercounts total wait time when durationSeconds
isn't a multiple of pollIntervalSeconds: change the sample calculation to use
Math.ceil instead of Math.floor (i.e., compute samples = Math.max(1,
Math.ceil(options.durationSeconds / pollIntervalSeconds))) so the for-loop that
calls sleepSeconds(pollIntervalSeconds) runs long enough to cover at least
durationSeconds; update any related comments/tests referencing samples if
present.
- Around line 241-255: tailLog currently ignores the sandbox.exec exit code so a
missing/unreadable GATEWAY_LOG_PATH yields an empty string and lets
expectLogDoesNotContain false-pass; update tailLog (the private async tailLog in
gateway.ts) to check the result from this.sandbox.exec (the call that runs
["sh","-c", `tail -n ${lines} ${GATEWAY_LOG_PATH} 2>/dev/null`]) and throw an
Error when result.exitCode is non-zero (include result.stderr or a descriptive
message) so callers like expectLogDoesNotContain fail when tail could not read
the file.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 36971dbc-e347-4f83-80c9-e1b2d0ca1b34

📥 Commits

Reviewing files that changed from the base of the PR and between 9bcdab8 and 671aa71.

📒 Files selected for processing (11)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/framework-tests/e2e-clients.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
  • test/e2e-scenario/framework/clients/gateway.ts
  • test/e2e-scenario/framework/clients/sandbox.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/live/gateway-guard-recovery.test.ts
  • test/e2e-scenario/migration/legacy-inventory.json
  • test/e2e/test-issue-2478-crash-loop-recovery.sh
  • vitest.config.ts
🚧 Files skipped from review as they are similar to previous changes (7)
  • test/e2e-scenario/live/gateway-guard-recovery.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/migration/legacy-inventory.json
  • test/e2e/test-issue-2478-crash-loop-recovery.sh
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
  • test/e2e-scenario/framework-tests/e2e-clients.test.ts

Comment thread .github/workflows/e2e-vitest-scenarios.yaml
Comment thread test/e2e-scenario/fixtures/clients/gateway.ts
Comment thread test/e2e-scenario/framework/clients/gateway.ts
Comment thread test/e2e-scenario/fixtures/clients/gateway.ts
@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27244789650
Workflow ref: fix/2701-failing-e2e-guard-pr-a
Requested scenarios: (default — all supported)
Summary: 2 passed, 1 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 27244790666
Target ref: 671aa71ac9cc53b92e3becb0d438703489e38a56
Workflow ref: main
Requested jobs: issue-2478-crash-loop-recovery-e2e
Summary: 0 passed, 1 failed, 0 skipped

Job Result
issue-2478-crash-loop-recovery-e2e ❌ failure

Failed jobs: issue-2478-crash-loop-recovery-e2e. Check run artifacts for logs.

@cv cv removed the v0.0.62 Release target label Jun 10, 2026
@cv cv added the v0.0.63 Release target label Jun 10, 2026
jyaunches added a commit that referenced this pull request Jun 10, 2026
…-pin

Migrates test/e2e/test-openshell-version-pin.sh (regression guard for #3474)
to a free-standing Vitest live test under test/e2e-scenario/live/.

The legacy script is a hermetic installer-script behavioral test: it runs
scripts/install-openshell.sh under a stubbed PATH where the already-installed
openshell reports a too-new version (0.0.45) and the downloaded archives
produce a binary that reports the pinned 0.0.44. It does not exercise the
registry-driven steady-state probe model, so it lives outside baseline.ts as
a free-standing live test (per #5049).

Four [PASS] assertions translated faithfully:
- installer-exits-zero (result.status === 0)
- download-log-contains-v0.0.44
- download-log-excludes-v0.0.45
- replaced-openshell-reports-0.0.44

CI: discrete openshell-version-pin-vitest job in e2e-vitest-scenarios.yaml
runs on workflow_dispatch alongside the registry-driven matrix.

Bash guard retained in regression-e2e.yaml; deletion is a follow-up PR with
#4357 approval per the migration freeze policy.

Refs #4355, #3474
@cv

cv commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Hybrid framing note

Since #5098 cites this PR as the HYBRID precedent, I want to make the intended meaning explicit.

“Hybrid” should be a migration/convergence state, not a permanent architecture with two E2E systems. The useful lesson from this PR is not “some E2Es live forever in bash and some in Vitest.” The useful lesson is:

  • preserve the high-fidelity contract first;
  • add the typed/Vitest harness without losing the real system boundary;
  • keep existing bash coverage only while the Vitest representation is incomplete or while it is still proving equivalence.

If the contract requires shell behavior, the Vitest E2E can and should execute that real shell behavior: process signals, /proc reads, login shell state, bash install.sh, etc. The thing we should avoid preserving is a second bash test runner/harness.

So for follow-ups from this PR, the durable target should still be a Vitest E2E harness with boundary fixtures/helpers, not a long-term split suite. Related clarification is now on #4941, #5098, and #4999.

cv added a commit that referenced this pull request Jun 10, 2026
…est live test (#5107)

# Migration: `test-openshell-version-pin.sh` → free-standing Vitest live
test

**Refs**: #4355 (owner issue), #3474 (regression target), #4990 (E2E
architecture contract)

## What landed in this PR

A free-standing Vitest live test at
`test/e2e-scenario/live/openshell-version-pin.test.ts` plus a discrete
`openshell-version-pin-vitest` job in
`.github/workflows/e2e-vitest-scenarios.yaml`, modeled on #5049's
free-standing pattern. The legacy bash guard
`test/e2e/test-openshell-version-pin.sh` remains in place; deletion is a
follow-up PR with #4357 approval per the migration freeze policy.

## Architecture notes (course-correction during the loop)

The first pass of this migration tried to land the test in
`test/install-openshell-version-check.test.ts` under the existing
`installer-integration` Vitest project. It even passed CI twice there
(commit `a95edc02e`, runs 27251976992 and 27252048411). The argument was
that this is a hermetic installer-script unit test, not a scenario, and
sister tests already live in that file.

That argument **lost** to an explicit project contract:
`test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts`
enforces that every entry in `targetVitestScenarios` for a `covered`
legacy script must match `^test/e2e-scenario/live/.+\.test\.ts$`. The
schema is the binding architectural decision; the project has already
chosen `test/e2e-scenario/live/` as the canonical home for migrated
tests, regardless of whether they exercise the scenario framework's
primitives. The pivot adopts that decision.

Commits in this PR walk that history:
1. `873aaa2ee` — mark inventory `bridge-probe` (initial in-flight state)
2. `a95edc02e` — first sketch under installer-integration (rejected by
schema)
3. `4670cb556` — empty commit (verification dispatch trigger from the
rejected approach)
4. `9215a4f1f` — revert (2)
5. `920876546` — final sketch: free-standing live test + discrete CI job
6. `3d9c01931` — inventory mark `covered` pointing at the new live test

## Shape

- **Shape**: free-standing live test (not registry-driven)
- **Family**: hermetic installer-script behavioral test
- **Runner**: `ubuntu-latest`
- **Registry-driven**: no — does not fit `from(scenario.environment,
env) → from(scenario.expectedStateId, instance)`
- **Free-standing job**: `openshell-version-pin-vitest` in
`e2e-vitest-scenarios.yaml`

## Assertion chain (mapped from the bash script's four `[PASS]` checks)

1. `installer-exits-zero` — `result.status === 0` (happy path completes;
no "above the maximum" hard-fail).
2. `download-log-contains-v0.0.44` — pinned release tag was requested.
3. `download-log-excludes-v0.0.45` — too-new sticky version is never
re-fetched.
4. `replaced-openshell-reports-0.0.44` — binary on disk in active
install dir overwritten with pinned build.

## Failing-test-first contract

- **First-dispatch outcome predicted**: GREEN. The framework already
covers the regression target (`scripts/install-openshell.sh:272-275`
implements the v0.0.44 reinstall path; sister tests in
`test/install-openshell-version-check.test.ts` already cover the
entry-point invariants). This migration's residual coverage is the
happy-path completion, exercised via stubbed PATH binaries.
- **Skill 5.8 false-pass guard**: two consecutive GREEN dispatches
required of the new `openshell-version-pin-vitest` job before
convergence is recorded in the inventory entry's `frozenAtSha` /
`convergenceEvidence` fields.
- Local `vitest run --project e2e-scenarios-live
test/e2e-scenario/live/openshell-version-pin.test.ts` passes; inventory
schema-lock test passes.

## Suite separation honored

- `regression-e2e.yaml` (bash) untouched. Legacy script remains in
place; deletion is a follow-up PR with #4357 approval.
- `e2e-vitest-scenarios.yaml` gains a discrete free-standing job; matrix
path unchanged.
- `vitest.config.ts:e2e-scenarios-live` project picks up the new test
file via the `test/e2e-scenario/live/**/*.test.ts` glob with no
project-list edit.

## Inventory delta

```
status: not-migrated → covered
targetVitestScenarios: [] → ["test/e2e-scenario/live/openshell-version-pin.test.ts"]
deletionReady: false (deletion is a follow-up PR; #4357 approval required)
```

## Next steps after this PR merges

- Drive two consecutive GREEN runs of `openshell-version-pin-vitest`
(one is happening on this PR; the second is enforced post-merge by
re-dispatch on `main`).
- Follow-up PR with `#4357` approval to delete
`test/e2e/test-openshell-version-pin.sh` from `regression-e2e.yaml` and
the bash file itself, transitioning the inventory entry from `covered` →
`retired`.


## Convergence

- ✅ **First dispatch GREEN** (post-pivot): [openshell-version-pin-vitest
job
80480862915](https://github.com/NVIDIA/NemoClaw/actions/runs/27252871283/job/80480862915)
on commit `3d9c01931`.
- ✅ **Verification dispatch GREEN**: [openshell-version-pin-vitest job
80481579156](https://github.com/NVIDIA/NemoClaw/actions/runs/27253022534/job/80481579156)
on the same commit. Two consecutive GREENs rule out false-pass.
- ✅ **Legacy regression-e2e baseline GREEN**: last 2 dispatches that ran
`openshell-version-pin-e2e` succeeded (runs 26720769662, 26718460056).
Bash guard untouched.
- 🔒 **Frozen at**: `3d9c01931f598cf2450bd88028cb3b44bcf367b0`. Inventory
entry now records `frozenAtSha` and `convergenceEvidence`.

### Assertions resolved (in order)

1. `installer-exits-zero` — `result.status === 0` (commit `920876546`)
2. `download-log-contains-v0.0.44` —
`expect(downloads).toContain("v0.0.44")` (commit `920876546`)
3. `download-log-excludes-v0.0.45` —
`expect(downloads).not.toContain("v0.0.45")` (commit `920876546`)
4. `replaced-openshell-reports-0.0.44` — re-run `$fakeBin/openshell
--version`; assert `0.0.44` and not `0.0.45` (commit `920876546`)

All four legacy `[PASS]` assertions translated faithfully. No deferred
assertions.

### Note for reviewers

First-dispatch passed (in both pre- and post-pivot homes). This is the
duplicative case the migration skill anticipates:
`install-openshell.sh:272-275` already lands the v0.0.44 reinstall path,
and sister tests already covered the entry point. The legacy bash
guard's value-add was the happy-path completion through download +
extraction + binary replacement, which is now expressed as a Vitest live
test. **Please sanity-check the assertion list above against the four
`[PASS]` lines in `test/e2e/test-openshell-version-pin.sh`** — if
equivalence is confirmed, the migration is complete and the bash guard
can be retired in a follow-up PR with `#4357` approval.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Added a live end-to-end Vitest to validate OpenShell version-pin
behavior and prevent regressions, capturing installer output and
download activity as artifacts.
* **CI**
* Added a dedicated CI job to run the new live test and upload related
artifacts for troubleshooting.
* **Chores**
* Marked the legacy test as covered in the migration inventory and
recorded convergence evidence and frozen revision metadata.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
jyaunches added a commit that referenced this pull request Jun 10, 2026
…+ workflow job

Ports the live block of test/openclaw-tui-chat-correlation.test.ts into the
typed scenario framework as a free-standing live test (#5049 pattern), plus
a discrete openclaw-tui-chat-correlation-vitest job in
e2e-vitest-scenarios.yaml modeled on #5107's openshell-version-pin-vitest.

Coverage:
- Onboards a fresh cloud OpenClaw sandbox via OnboardingPhaseFixture.from
  with the cloud-openclaw profile (already in SUPPORTED_ONBOARDING).
- Asserts the pinned 2026.5.27 OpenClaw version (host-side probe via
  openshell sandbox exec ... openclaw --version). Override via
  E2E_OPENCLAW_TUI_CORRELATION_PINNED_VERSION when bumping.
- Idempotent in-sandbox gateway start on port 18789 via SandboxClient.execShell.
- Uploads the websocket repro driver via SandboxClient.upload, runs it via
  execShell with the gateway auth token read from /sandbox/.openclaw/openclaw.json.
- Asserts the #2603 + #3145 contract: no empty finals for submitted runs,
  no uncorrelated replies, no missing/duplicate replies, no missing/duplicate
  user turns. Preserves the looksLikeEventCaptureFailure retry-once guard
  (OpenClaw-side observability race; remove when the runtime exposes a
  deterministic chat readiness ack).

Inventory: bridge-probe entry for test/e2e/test-openclaw-tui-chat-correlation.sh
now references this file as its bridge probe (satisfies the inventory invariant
that bridge-probe entries have non-empty bridgeProbes pointing at real paths).

Bash workflow (nightly-e2e.yaml openclaw-tui-chat-correlation-e2e job)
remains untouched per the migration freeze policy; deletion is a follow-up
PR with #4357 approval after typed scenario soaks.

Refs: #4347 #2603 #3145 #5098 #5049 #5107
jyaunches added a commit that referenced this pull request Jun 10, 2026
Pass 2 of phase-5 convergence on dispatch 27284469157. Line moved
forward (75ms → 127s) clearing all Phase 1 prereqs and the Phase 2
first onboard. New blocking assertion: sandbox-exists-after-interrupt
(#9 in chain), failing again with `spawn openshell ENOENT` \u2014 same root
cause class as pass 1, missed during the previous fix because the
sandbox.exists() call passed no options so options.env was empty.

Fix: thread `env: buildAvailabilityProbeEnv()` into the exists() call.
The SandboxClient correctly does not auto-supply env (mirrors
HostCliClient.expectNemoclawAvailable convention \u2014 callers stay explicit
about the env boundary, helpers-not-bridges per #5049).

Refs: #4348, #5098
@ahunnargikar-nvidia ahunnargikar-nvidia requested a review from cv June 10, 2026 16:42
@jyaunches jyaunches added v0.0.64 Release target and removed v0.0.63 Release target labels Jun 11, 2026
…ork helpers

Captures the NVIDIA#2701 contract — gateway recovery must restore the /tmp guard
chain before launching, not just warn about it — in two formats so the bug
shows up in CI immediately:

1. Legacy bash flip — test/e2e/test-issue-2478-crash-loop-recovery.sh
   Phase 4 keeps the existing NVIDIA#2478 WARNING assertion and now also
   asserts the guard chain is restored after recovery. Will fail on
   main; flips green when the NVIDIA#2701 fix lands. The fail() is deferred
   so Phase 4 restore + Phase 5 soak still execute and the artifact
   bundle stays comparable to historical runs. Runs in the existing
   `issue-2478-crash-loop-recovery-e2e` job in nightly-e2e.yaml — no
   workflow change needed.

2. New Vitest live E2E — test/e2e-scenario/live/gateway-guard-recovery.test.ts
   First slice migrated from the legacy 2478 harness. Asserts the same
   NVIDIA#2701 contract using typed framework fixtures and runs in the Vitest
   workflow only — the bash and Vitest suites stay completely separate
   and do not call each other. Wired into e2e-vitest-scenarios.yaml as
   a new `gateway-guard-recovery` job (free-standing, not part of the
   registry-driven matrix).

3. Framework helpers — sandbox-lifecycle disruption-and-recovery primitives:
   - GatewayClient.expectGuardChainActive — reads /tmp/nemoclaw-proxy-env.sh
     (not /proc/<pid>/environ — kernel.yama.ptrace_scope=1 blocks cross-tree
     environ reads) and asserts the expected --require markers are present.
   - GatewayClient.expectLogContains / expectLogDoesNotContain — tail and
     grep /tmp/gateway.log inside the sandbox.
   - GatewayClient.expectPidStable — sample gateway PID over a window,
     fail on change (crash-loop) or disappearance.
   - GatewayClient.resolveGatewayPid — typed wrapper around the legacy
     two-pass ps detection.
   - SandboxClient.wipeGuardChain — remove /tmp/nemoclaw-proxy-env.sh plus
     the seven guard preload files; equivalent to a fresh container's /tmp
     after pod recreate.
   - SandboxClient.killGatewayTree — pkill -9 -f '[o]penclaw' + verify.
   - 21 unit tests covering the new helpers in
     test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts

Phases 1, 2, 3, and 5 of the legacy 2478 script (initial-state, normal
crash-recovery loop, idle soak) remain in bash pending a follow-up
migration PR per the migration doc's "preserve the requirement before
removing legacy runner pieces" policy. Migration inventory updated to
record the partial Vitest target.

Refs NVIDIA#2701 NVIDIA#2478 NVIDIA#4356 NVIDIA#4941

Signed-off-by: J. Yaunches <jyaunches@nvidia.com>
@jyaunches jyaunches force-pushed the fix/2701-failing-e2e-guard-pr-a branch from b5a86da to fd608fb Compare June 11, 2026 13:25
@jyaunches jyaunches merged commit d338508 into NVIDIA:main Jun 11, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery feature PR adds or expands user-visible functionality platform: dgx-spark Affects DGX Spark hardware or workflows v0.0.64 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants