test(e2e): failing-test-first guard for #2701 + recovery framework helpers#5049
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR implements gateway guard-chain recovery verification by adding sandbox-aware test client APIs, sandbox disruption helpers, comprehensive unit tests, a live E2E scenario, and CI workflow wiring to validate that gateway recovery restores the ChangesGateway Guard Recovery Feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
ff460c6 to
4b59d59
Compare
Selective E2E Results — ❌ Some jobs failedRun: 27227063905
|
bbe042d to
1c95cbd
Compare
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27235281178
|
1c95cbd to
9bcdab8
Compare
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27236542721
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)
240-244: ⚡ Quick winConsider explicit input validation as defense in depth.
Static analysis flagged potential template injection at lines 243-244 where
inputs.pr_numberandinputs.scenariosare expanded via${{ toJSON(...) }}. WhiletoJSON()should properly escape strings andinputs.scenariosis already validated via regex in thegenerate-matrixjob (lines 54-57), adding explicit validation in this script provides defense in depth and makes the security posture more obvious to future maintainers.🛡️ Optional: Add explicit validation
with: script: | const needs = ${{ toJSON(needs) }}; const runUrl = `${context.serverUrl}/${context.repo.owner}/${context.repo.repo}/actions/runs/${context.runId}`; const workflowBranch = context.ref.replace('refs/heads/', ''); const prNumberInput = ${{ toJSON(inputs.pr_number) }} || ''; const requestedScenarios = ${{ toJSON(inputs.scenarios) }} || ''; + + // Validate inputs (defense in depth) + if (prNumberInput && !/^\d+$/.test(prNumberInput)) { + core.setFailed(`Invalid pr_number input: ${prNumberInput}`); + return; + } + if (requestedScenarios && !/^[A-Za-z0-9_-]+(,[A-Za-z0-9_-]+)*$/.test(requestedScenarios)) { + core.setFailed(`Invalid scenarios input: ${requestedScenarios}`); + return; + } let prNumber = prNumberInput ? Number.parseInt(prNumberInput, 10) : undefined;🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-vitest-scenarios.yaml around lines 240 - 244, The template expands user inputs into prNumberInput and requestedScenarios without runtime checks; add explicit validation/sanitization immediately after those assignments: ensure prNumberInput contains only digits (or is empty) and coerce/clear it otherwise, and ensure requestedScenarios matches the allowed pattern (reuse the regex from generate-matrix or a conservative whitelist for scenario names, splitting and validating each item) falling back to an empty string or failing fast; update the block where prNumberInput and requestedScenarios are defined to perform these checks and log or handle invalid input accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 240-244: The template expands user inputs into prNumberInput and
requestedScenarios without runtime checks; add explicit validation/sanitization
immediately after those assignments: ensure prNumberInput contains only digits
(or is empty) and coerce/clear it otherwise, and ensure requestedScenarios
matches the allowed pattern (reuse the regex from generate-matrix or a
conservative whitelist for scenario names, splitting and validating each item)
falling back to an empty string or failing fast; update the block where
prNumberInput and requestedScenarios are defined to perform these checks and log
or handle invalid input accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 103670ef-451b-45cf-a084-34a82f83eb20
📒 Files selected for processing (12)
.github/workflows/e2e-vitest-scenarios.yamlsrc/commands/sandbox/agents/list.tstest/e2e-scenario/framework-tests/e2e-clients.test.tstest/e2e-scenario/framework-tests/e2e-phase-state-validation.test.tstest/e2e-scenario/framework-tests/e2e-recovery-helpers.test.tstest/e2e-scenario/framework/clients/gateway.tstest/e2e-scenario/framework/clients/sandbox.tstest/e2e-scenario/framework/e2e-test.tstest/e2e-scenario/live/gateway-guard-recovery.test.tstest/e2e-scenario/migration/legacy-inventory.jsontest/e2e/test-issue-2478-crash-loop-recovery.shvitest.config.ts
9bcdab8 to
671aa71
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 243-259: prNumber currently uses Number.parseInt on prNumberInput
and silently falls back to finding the first open PR when parsing fails; change
the logic so a non-empty prNumberInput that is malformed causes the workflow to
fail instead of retrying: when prNumberInput is non-empty validate it with a
strict check (e.g., /^\d+$/ or compare parsed number to the trimmed input)
before assigning prNumber, and if validation fails call core.setFailed or throw
an error (do not call github.rest.pulls.list or use the workflowBranch
fallback); keep the existing branch-only lookup behavior only when prNumberInput
is empty.
In `@test/e2e-scenario/framework/clients/gateway.ts`:
- Around line 140-147: The probe options are built incorrectly: helper-specific
env from probeEnv() is overwritten by caller options because options is spread
after env; update the probe options construction in GatewayClient and
SandboxClient so non-env fields are spread first and env is merged explicitly
(env: { ...probeEnv(), ...options?.env }) before passing to this.sandbox.exec
(references: GatewayClient, SandboxClient, probeEnv, and the this.sandbox.exec
call building artifactName/gateway-guard-chain-...). Ensure you apply this same
merge pattern wherever helper probe options are assembled so OPENSHELL_GATEWAY
and other helper vars are preserved.
- Around line 221-223: The loop undercounts total wait time when durationSeconds
isn't a multiple of pollIntervalSeconds: change the sample calculation to use
Math.ceil instead of Math.floor (i.e., compute samples = Math.max(1,
Math.ceil(options.durationSeconds / pollIntervalSeconds))) so the for-loop that
calls sleepSeconds(pollIntervalSeconds) runs long enough to cover at least
durationSeconds; update any related comments/tests referencing samples if
present.
- Around line 241-255: tailLog currently ignores the sandbox.exec exit code so a
missing/unreadable GATEWAY_LOG_PATH yields an empty string and lets
expectLogDoesNotContain false-pass; update tailLog (the private async tailLog in
gateway.ts) to check the result from this.sandbox.exec (the call that runs
["sh","-c", `tail -n ${lines} ${GATEWAY_LOG_PATH} 2>/dev/null`]) and throw an
Error when result.exitCode is non-zero (include result.stderr or a descriptive
message) so callers like expectLogDoesNotContain fail when tail could not read
the file.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 36971dbc-e347-4f83-80c9-e1b2d0ca1b34
📒 Files selected for processing (11)
.github/workflows/e2e-vitest-scenarios.yamltest/e2e-scenario/framework-tests/e2e-clients.test.tstest/e2e-scenario/framework-tests/e2e-phase-state-validation.test.tstest/e2e-scenario/framework-tests/e2e-recovery-helpers.test.tstest/e2e-scenario/framework/clients/gateway.tstest/e2e-scenario/framework/clients/sandbox.tstest/e2e-scenario/framework/e2e-test.tstest/e2e-scenario/live/gateway-guard-recovery.test.tstest/e2e-scenario/migration/legacy-inventory.jsontest/e2e/test-issue-2478-crash-loop-recovery.shvitest.config.ts
🚧 Files skipped from review as they are similar to previous changes (7)
- test/e2e-scenario/live/gateway-guard-recovery.test.ts
- test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
- test/e2e-scenario/migration/legacy-inventory.json
- test/e2e/test-issue-2478-crash-loop-recovery.sh
- test/e2e-scenario/framework/e2e-test.ts
- test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts
- test/e2e-scenario/framework-tests/e2e-clients.test.ts
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27244789650
|
Selective E2E Results — ❌ Some jobs failedRun: 27244790666
|
…-pin Migrates test/e2e/test-openshell-version-pin.sh (regression guard for #3474) to a free-standing Vitest live test under test/e2e-scenario/live/. The legacy script is a hermetic installer-script behavioral test: it runs scripts/install-openshell.sh under a stubbed PATH where the already-installed openshell reports a too-new version (0.0.45) and the downloaded archives produce a binary that reports the pinned 0.0.44. It does not exercise the registry-driven steady-state probe model, so it lives outside baseline.ts as a free-standing live test (per #5049). Four [PASS] assertions translated faithfully: - installer-exits-zero (result.status === 0) - download-log-contains-v0.0.44 - download-log-excludes-v0.0.45 - replaced-openshell-reports-0.0.44 CI: discrete openshell-version-pin-vitest job in e2e-vitest-scenarios.yaml runs on workflow_dispatch alongside the registry-driven matrix. Bash guard retained in regression-e2e.yaml; deletion is a follow-up PR with #4357 approval per the migration freeze policy. Refs #4355, #3474
Hybrid framing noteSince #5098 cites this PR as the HYBRID precedent, I want to make the intended meaning explicit. “Hybrid” should be a migration/convergence state, not a permanent architecture with two E2E systems. The useful lesson from this PR is not “some E2Es live forever in bash and some in Vitest.” The useful lesson is:
If the contract requires shell behavior, the Vitest E2E can and should execute that real shell behavior: process signals, So for follow-ups from this PR, the durable target should still be a Vitest E2E harness with boundary fixtures/helpers, not a long-term split suite. Related clarification is now on #4941, #5098, and #4999. |
…est live test (#5107) # Migration: `test-openshell-version-pin.sh` → free-standing Vitest live test **Refs**: #4355 (owner issue), #3474 (regression target), #4990 (E2E architecture contract) ## What landed in this PR A free-standing Vitest live test at `test/e2e-scenario/live/openshell-version-pin.test.ts` plus a discrete `openshell-version-pin-vitest` job in `.github/workflows/e2e-vitest-scenarios.yaml`, modeled on #5049's free-standing pattern. The legacy bash guard `test/e2e/test-openshell-version-pin.sh` remains in place; deletion is a follow-up PR with #4357 approval per the migration freeze policy. ## Architecture notes (course-correction during the loop) The first pass of this migration tried to land the test in `test/install-openshell-version-check.test.ts` under the existing `installer-integration` Vitest project. It even passed CI twice there (commit `a95edc02e`, runs 27251976992 and 27252048411). The argument was that this is a hermetic installer-script unit test, not a scenario, and sister tests already live in that file. That argument **lost** to an explicit project contract: `test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts` enforces that every entry in `targetVitestScenarios` for a `covered` legacy script must match `^test/e2e-scenario/live/.+\.test\.ts$`. The schema is the binding architectural decision; the project has already chosen `test/e2e-scenario/live/` as the canonical home for migrated tests, regardless of whether they exercise the scenario framework's primitives. The pivot adopts that decision. Commits in this PR walk that history: 1. `873aaa2ee` — mark inventory `bridge-probe` (initial in-flight state) 2. `a95edc02e` — first sketch under installer-integration (rejected by schema) 3. `4670cb556` — empty commit (verification dispatch trigger from the rejected approach) 4. `9215a4f1f` — revert (2) 5. `920876546` — final sketch: free-standing live test + discrete CI job 6. `3d9c01931` — inventory mark `covered` pointing at the new live test ## Shape - **Shape**: free-standing live test (not registry-driven) - **Family**: hermetic installer-script behavioral test - **Runner**: `ubuntu-latest` - **Registry-driven**: no — does not fit `from(scenario.environment, env) → from(scenario.expectedStateId, instance)` - **Free-standing job**: `openshell-version-pin-vitest` in `e2e-vitest-scenarios.yaml` ## Assertion chain (mapped from the bash script's four `[PASS]` checks) 1. `installer-exits-zero` — `result.status === 0` (happy path completes; no "above the maximum" hard-fail). 2. `download-log-contains-v0.0.44` — pinned release tag was requested. 3. `download-log-excludes-v0.0.45` — too-new sticky version is never re-fetched. 4. `replaced-openshell-reports-0.0.44` — binary on disk in active install dir overwritten with pinned build. ## Failing-test-first contract - **First-dispatch outcome predicted**: GREEN. The framework already covers the regression target (`scripts/install-openshell.sh:272-275` implements the v0.0.44 reinstall path; sister tests in `test/install-openshell-version-check.test.ts` already cover the entry-point invariants). This migration's residual coverage is the happy-path completion, exercised via stubbed PATH binaries. - **Skill 5.8 false-pass guard**: two consecutive GREEN dispatches required of the new `openshell-version-pin-vitest` job before convergence is recorded in the inventory entry's `frozenAtSha` / `convergenceEvidence` fields. - Local `vitest run --project e2e-scenarios-live test/e2e-scenario/live/openshell-version-pin.test.ts` passes; inventory schema-lock test passes. ## Suite separation honored - `regression-e2e.yaml` (bash) untouched. Legacy script remains in place; deletion is a follow-up PR with #4357 approval. - `e2e-vitest-scenarios.yaml` gains a discrete free-standing job; matrix path unchanged. - `vitest.config.ts:e2e-scenarios-live` project picks up the new test file via the `test/e2e-scenario/live/**/*.test.ts` glob with no project-list edit. ## Inventory delta ``` status: not-migrated → covered targetVitestScenarios: [] → ["test/e2e-scenario/live/openshell-version-pin.test.ts"] deletionReady: false (deletion is a follow-up PR; #4357 approval required) ``` ## Next steps after this PR merges - Drive two consecutive GREEN runs of `openshell-version-pin-vitest` (one is happening on this PR; the second is enforced post-merge by re-dispatch on `main`). - Follow-up PR with `#4357` approval to delete `test/e2e/test-openshell-version-pin.sh` from `regression-e2e.yaml` and the bash file itself, transitioning the inventory entry from `covered` → `retired`. ## Convergence - ✅ **First dispatch GREEN** (post-pivot): [openshell-version-pin-vitest job 80480862915](https://github.com/NVIDIA/NemoClaw/actions/runs/27252871283/job/80480862915) on commit `3d9c01931`. - ✅ **Verification dispatch GREEN**: [openshell-version-pin-vitest job 80481579156](https://github.com/NVIDIA/NemoClaw/actions/runs/27253022534/job/80481579156) on the same commit. Two consecutive GREENs rule out false-pass. - ✅ **Legacy regression-e2e baseline GREEN**: last 2 dispatches that ran `openshell-version-pin-e2e` succeeded (runs 26720769662, 26718460056). Bash guard untouched. - 🔒 **Frozen at**: `3d9c01931f598cf2450bd88028cb3b44bcf367b0`. Inventory entry now records `frozenAtSha` and `convergenceEvidence`. ### Assertions resolved (in order) 1. `installer-exits-zero` — `result.status === 0` (commit `920876546`) 2. `download-log-contains-v0.0.44` — `expect(downloads).toContain("v0.0.44")` (commit `920876546`) 3. `download-log-excludes-v0.0.45` — `expect(downloads).not.toContain("v0.0.45")` (commit `920876546`) 4. `replaced-openshell-reports-0.0.44` — re-run `$fakeBin/openshell --version`; assert `0.0.44` and not `0.0.45` (commit `920876546`) All four legacy `[PASS]` assertions translated faithfully. No deferred assertions. ### Note for reviewers First-dispatch passed (in both pre- and post-pivot homes). This is the duplicative case the migration skill anticipates: `install-openshell.sh:272-275` already lands the v0.0.44 reinstall path, and sister tests already covered the entry point. The legacy bash guard's value-add was the happy-path completion through download + extraction + binary replacement, which is now expressed as a Vitest live test. **Please sanity-check the assertion list above against the four `[PASS]` lines in `test/e2e/test-openshell-version-pin.sh`** — if equivalence is confirmed, the migration is complete and the bash guard can be retired in a follow-up PR with `#4357` approval. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added a live end-to-end Vitest to validate OpenShell version-pin behavior and prevent regressions, capturing installer output and download activity as artifacts. * **CI** * Added a dedicated CI job to run the new live test and upload related artifacts for troubleshooting. * **Chores** * Marked the legacy test as covered in the migration inventory and recorded convergence evidence and frozen revision metadata. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
…+ workflow job Ports the live block of test/openclaw-tui-chat-correlation.test.ts into the typed scenario framework as a free-standing live test (#5049 pattern), plus a discrete openclaw-tui-chat-correlation-vitest job in e2e-vitest-scenarios.yaml modeled on #5107's openshell-version-pin-vitest. Coverage: - Onboards a fresh cloud OpenClaw sandbox via OnboardingPhaseFixture.from with the cloud-openclaw profile (already in SUPPORTED_ONBOARDING). - Asserts the pinned 2026.5.27 OpenClaw version (host-side probe via openshell sandbox exec ... openclaw --version). Override via E2E_OPENCLAW_TUI_CORRELATION_PINNED_VERSION when bumping. - Idempotent in-sandbox gateway start on port 18789 via SandboxClient.execShell. - Uploads the websocket repro driver via SandboxClient.upload, runs it via execShell with the gateway auth token read from /sandbox/.openclaw/openclaw.json. - Asserts the #2603 + #3145 contract: no empty finals for submitted runs, no uncorrelated replies, no missing/duplicate replies, no missing/duplicate user turns. Preserves the looksLikeEventCaptureFailure retry-once guard (OpenClaw-side observability race; remove when the runtime exposes a deterministic chat readiness ack). Inventory: bridge-probe entry for test/e2e/test-openclaw-tui-chat-correlation.sh now references this file as its bridge probe (satisfies the inventory invariant that bridge-probe entries have non-empty bridgeProbes pointing at real paths). Bash workflow (nightly-e2e.yaml openclaw-tui-chat-correlation-e2e job) remains untouched per the migration freeze policy; deletion is a follow-up PR with #4357 approval after typed scenario soaks. Refs: #4347 #2603 #3145 #5098 #5049 #5107
Pass 2 of phase-5 convergence on dispatch 27284469157. Line moved forward (75ms → 127s) clearing all Phase 1 prereqs and the Phase 2 first onboard. New blocking assertion: sandbox-exists-after-interrupt (#9 in chain), failing again with `spawn openshell ENOENT` \u2014 same root cause class as pass 1, missed during the previous fix because the sandbox.exists() call passed no options so options.env was empty. Fix: thread `env: buildAvailabilityProbeEnv()` into the exists() call. The SandboxClient correctly does not auto-supply env (mirrors HostCliClient.expectNemoclawAvailable convention \u2014 callers stay explicit about the env boundary, helpers-not-bridges per #5049). Refs: #4348, #5098
…ork helpers Captures the NVIDIA#2701 contract — gateway recovery must restore the /tmp guard chain before launching, not just warn about it — in two formats so the bug shows up in CI immediately: 1. Legacy bash flip — test/e2e/test-issue-2478-crash-loop-recovery.sh Phase 4 keeps the existing NVIDIA#2478 WARNING assertion and now also asserts the guard chain is restored after recovery. Will fail on main; flips green when the NVIDIA#2701 fix lands. The fail() is deferred so Phase 4 restore + Phase 5 soak still execute and the artifact bundle stays comparable to historical runs. Runs in the existing `issue-2478-crash-loop-recovery-e2e` job in nightly-e2e.yaml — no workflow change needed. 2. New Vitest live E2E — test/e2e-scenario/live/gateway-guard-recovery.test.ts First slice migrated from the legacy 2478 harness. Asserts the same NVIDIA#2701 contract using typed framework fixtures and runs in the Vitest workflow only — the bash and Vitest suites stay completely separate and do not call each other. Wired into e2e-vitest-scenarios.yaml as a new `gateway-guard-recovery` job (free-standing, not part of the registry-driven matrix). 3. Framework helpers — sandbox-lifecycle disruption-and-recovery primitives: - GatewayClient.expectGuardChainActive — reads /tmp/nemoclaw-proxy-env.sh (not /proc/<pid>/environ — kernel.yama.ptrace_scope=1 blocks cross-tree environ reads) and asserts the expected --require markers are present. - GatewayClient.expectLogContains / expectLogDoesNotContain — tail and grep /tmp/gateway.log inside the sandbox. - GatewayClient.expectPidStable — sample gateway PID over a window, fail on change (crash-loop) or disappearance. - GatewayClient.resolveGatewayPid — typed wrapper around the legacy two-pass ps detection. - SandboxClient.wipeGuardChain — remove /tmp/nemoclaw-proxy-env.sh plus the seven guard preload files; equivalent to a fresh container's /tmp after pod recreate. - SandboxClient.killGatewayTree — pkill -9 -f '[o]penclaw' + verify. - 21 unit tests covering the new helpers in test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts Phases 1, 2, 3, and 5 of the legacy 2478 script (initial-state, normal crash-recovery loop, idle soak) remain in bash pending a follow-up migration PR per the migration doc's "preserve the requirement before removing legacy runner pieces" policy. Migration inventory updated to record the partial Vitest target. Refs NVIDIA#2701 NVIDIA#2478 NVIDIA#4356 NVIDIA#4941 Signed-off-by: J. Yaunches <jyaunches@nvidia.com>
b5a86da to
fd608fb
Compare
Summary
Failing-test-first regression guard for #2701: gateway recovery currently warns about a missing
/tmpguard chain after a pod recreate but launches the gateway naked anyway, triggering the @homebridge/ciao crash loop documented in the issue (5-minuterebuild --yesis the only manual recovery on DGX Spark).This PR captures the #2701 contract in two formats so the bug shows up in CI before the fix lands. The bash and Vitest suites stay completely separate and do not call each other.
Legacy bash flip —
test/e2e/test-issue-2478-crash-loop-recovery.shPhase 4 keeps the existing [DGX Spark] Gateway crash loop on startup: @homebridge/ciao networkInterfaces() returns EPERM in OpenShell sandbox #2478 WARNING assertion and now also asserts the guard chain is restored after recovery. Will be red onmain; flips green when the [DGX Spark] Host reboot bricks sandbox until 5-minrebuild --yes:connectrecovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 fix lands. The newfail()is deferred so Phase 4 restore + Phase 5 soak still execute and the artifact bundle stays comparable to historical runs. Runs in the existingissue-2478-crash-loop-recovery-e2ejob innightly-e2e.yaml— no workflow change.New Vitest live E2E —
test/e2e-scenario/live/gateway-guard-recovery.test.ts. First slice migrated from the legacy 2478 harness. Asserts the same [DGX Spark] Host reboot bricks sandbox until 5-minrebuild --yes:connectrecovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 contract using typed framework fixtures and runs in the Vitest workflow only (e2e-vitest-scenarios.yaml) as a new free-standinggateway-guard-recoveryjob. Not part of the registry-driven matrix because recovery scenarios don't fit the steady-state expected-state probe model.Framework helpers — sandbox-lifecycle disruption-and-recovery primitives, reusable for any future recovery scenario:
GatewayClient.expectGuardChainActive— reads/tmp/nemoclaw-proxy-env.sh(not/proc/<pid>/environ;kernel.yama.ptrace_scope=1blocks cross-tree environ reads — same approach as the legacygateway_guards_activeshell helper) and asserts the expected--requiremarkers are present.GatewayClient.expectLogContains/expectLogDoesNotContain— tail and grep/tmp/gateway.loginside the sandbox.GatewayClient.expectPidStable— sample gateway PID over a window, fail on change (crash-loop) or disappearance.GatewayClient.resolveGatewayPid— typed wrapper around the legacy two-passpsdetection.SandboxClient.wipeGuardChain— remove/tmp/nemoclaw-proxy-env.shplus the seven guard preload files; equivalent shape to a fresh container's/tmpafter pod recreate.SandboxClient.killGatewayTree—pkill -9 -f '[o]penclaw'+ verify nothing came back.test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.tscovering the new helpers (mocked runner, no live deps).Phases 1, 2, 3, and 5 of the legacy 2478 script (initial-state, normal crash-recovery loop, idle soak) remain in bash pending a follow-up migration PR per the migration doc's "preserve the requirement before removing legacy runner pieces" policy. Migration inventory updated to record the partial Vitest target.
Suite separation
nightly-e2e.yaml— bash E2E only. Untouched by this PR. Theissue-2478-crash-loop-recovery-e2ejob already runs the legacy script; the new Phase 4 assertion lights up automatically.e2e-vitest-scenarios.yaml— Vitest E2E only. Gains the newgateway-guard-recoveryjob.workflow_dispatchonly today, matching the existing Vitest workflow's trigger model.The two suites do not call each other and do not share workflow definitions. PR-C may revisit the Vitest workflow's trigger model (e.g. add a schedule) but that's out of scope for this PR.
Related Issue
Refs #2701 #2478 #4356 #4941
Sequencing
issue-2478-crash-loop-recovery-e2e(Phase 4 new assertion, runs in nightly) and the new Vitestgateway-guard-recoveryjob (runs onworkflow_dispatchofe2e-vitest-scenarios.yaml).rebuild --yes:connectrecovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 fix. Implementsinstall-preloads.sh+ aguard-recovery.tsmodule + wires re-emit intobuildOpenClawRecoveryScript. Both assertions flip green.Changes
.github/workflows/e2e-vitest-scenarios.yaml— newgateway-guard-recoveryfree-standing job (not in the matrix).test/e2e-scenario/framework/clients/gateway.ts— addedSandboxClientconstructor dep + 5 new helpers.test/e2e-scenario/framework/clients/sandbox.ts— addedwipeGuardChainandkillGatewayTreedisruption helpers.test/e2e-scenario/framework/e2e-test.ts— wiresandboxfixture intogatewayfixture (chain stays acyclic).test/e2e-scenario/framework-tests/e2e-clients.test.tsande2e-phase-state-validation.test.ts— updated existing test constructors to the newGatewayClient(host, sandbox)signature.test/e2e-scenario/framework-tests/e2e-recovery-helpers.test.ts(new) — 21 unit tests for the new helpers.test/e2e-scenario/live/gateway-guard-recovery.test.ts(new) — failing live E2E.test/e2e-scenario/migration/legacy-inventory.json— record the Vitest target fortest/e2e/test-issue-2478-crash-loop-recovery.sh.test/e2e/test-issue-2478-crash-loop-recovery.sh— Phase 4 [DGX Spark] Host reboot bricks sandbox until 5-minrebuild --yes:connectrecovery path warns about missing /tmp guards but launches gateway naked → @homebridge/ciao crash loop #2701 contract assertion (deferred fail).vitest.config.ts— excludetest/e2e-scenario/live/**from thecliproject so pre-commitTest (cli)doesn't accidentally pick up live tests that need Docker. Pre-existing issue surfaced by this PR.nightly-e2e.yamlis intentionally untouched.Type of Change
Verification
npx vitest run --project e2e-scenario-framework— 320 passed (baseline 299 + 21 new).npx prek run --all-files— pre-existing flake inruntime-hermes-secret-boundary-behavioural.test.tsand slow tests innemoclaw-start.test.tstime out under prek's parallel load (unrelated to this PR; same behavior onmain).The new Vitest live E2E and the bash Phase 4 assertion are expected to fail on
main-equivalent CI on this PR — that is the regression-guard signal. They will flip to green when the #2701 fix lands in PR-B.Signed-off-by: J. Yaunches jyaunches@nvidia.com
🤖 Generated with Claude Code
Summary by CodeRabbit
Tests
Chores