26 changes: 18 additions & 8 deletions .github/workflows/regression-e2e.yaml
@@ -284,21 +284,31 @@ jobs:
uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e # v6
with:
node-version: "22"
cache: npm

- name: Run strict tool-call probe E2E test
- name: Install root dependencies
run: npm ci --ignore-scripts

- name: Run strict tool-call probe Vitest E2E test
env:
NEMOCLAW_TEST_NO_SLEEP: "1"
run: bash test/e2e/test-strict-tool-call-probe.sh
NEMOCLAW_RUN_E2E_SCENARIOS: "1"
E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts/vitest/strict-tool-call-probe
run: |
set -euo pipefail
npx vitest run --project e2e-scenarios-live \
test/e2e-scenario/live/strict-tool-call-probe.test.ts \
--silent=false --reporter=default

- name: Upload strict tool-call probe logs on failure
if: failure()
- name: Upload strict tool-call probe artifacts
if: always()
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: strict-tool-call-probe-logs
path: |
/tmp/nemoclaw-e2e-strict-tool-call-probe.log
/tmp/nemoclaw-e2e-strict-tool-call-probe-node.log
name: strict-tool-call-probe-artifacts
path: e2e-artifacts/vitest/strict-tool-call-probe/
include-hidden-files: false
if-no-files-found: ignore
retention-days: 14

# ── Gateway drift preflight E2E ─────────────────────────────
# Coverage guard for #3399 / #3423. A stale OpenShell gateway image can
4 changes: 2 additions & 2 deletions docs/about/release-notes.mdx
@@ -22,7 +22,7 @@ NemoClaw v0.0.61 improves sandbox network visibility, onboarding recovery, Herme
- Onboarding and rebuild paths recover more reliably across host and provider drift. ARM64 image-tar upload failures receive a clear classification with an image-reference workaround, rebuild detaches sandbox providers before delete, rebuilt resume snapshots keep session state, and messaging selector key sequences work during onboarding. For more information, refer to [NemoClaw CLI Commands Reference](../reference/commands).
- Local inference and Hermes setup cover more restart and configuration edge cases. Managed inference hostnames bypass host proxies, managed vLLM restarts after host reboot, DGX Station managed vLLM defaults to `Qwen/Qwen3.6-27B-FP8`, Hermes rejects dashboard port collisions during configuration, and Hermes recovery enforces the environment-secret boundary. For more information, refer to [Use a Local Inference Server](../inference/use-local-inference).
- Messaging setup gives clearer feedback and stores more deterministic state. Slack now notifies the sender when a channel `@mention` is denied, operator-supplied placeholder keys can be registered during onboarding, `messagingPlan` persists into resume state, and channel conflict detection now uses the manifest-plan architecture. For more information, refer to [Messaging Channels](../manage-sandboxes/messaging-channels).
- Release validation now uses real shell assertions in the e2e scenario runner, includes an opt-in live scenario project, shards CLI coverage, adds a docs-only PR fast path, and trims slow CLI subprocess coverage.
- Release validation now runs real shell-boundary assertions through Vitest E2E support, includes an opt-in live scenario project, shards CLI coverage, adds a docs-only PR fast path, and trims slow CLI subprocess coverage.

## v0.0.60

@@ -185,7 +185,7 @@ NemoClaw v0.0.48 improves onboarding, sandbox builds, local inference, messaging

NemoClaw v0.0.47 focused on release hardening and validation coverage:

- The scenario E2E framework gained baseline onboarding coverage for CLI setup, OpenShell gateway creation, sandbox state, inference routing, and smoke tests.
- The Vitest E2E fixture layer gained baseline onboarding coverage for CLI setup, OpenShell gateway creation, sandbox state, inference routing, and smoke tests.
- Messaging provider scenarios now validate provider attachment, placeholder configuration, secret-leak prevention, bridge reachability, Discord gateway routing, Slack provider state, Telegram injection safety, and token-rotation isolation.
- CLI command registration was refactored so public display defaults stay consistent across sandbox channel, host, log, policy, skill, and snapshot commands.
- PR review advisor automation was added for maintainers, with deterministic GitHub context gathering and structured review comments.
18 changes: 12 additions & 6 deletions test/e2e-scenario-advisor.test.ts
@@ -33,18 +33,18 @@ function metadata(
};
}

describe("E2E scenario advisor — prompt construction", () => {
describe("Vitest E2E scenario advisor — prompt construction", () => {
it("user prompt embeds the metadata fields the advisor must echo back", () => {
const prompt = buildPrompt({
baseRef: "origin/main",
headRef: "HEAD",
changedFiles: ["test/e2e-scenario/framework/phases/onboarding.ts"],
changedFiles: ["test/e2e-scenario/fixtures/phases/onboarding.ts"],
diff: "+ echo ok",
});
// Caller of normalizeScenarioAdvisorResult re-injects metadata, but the
// prompt must still surface enough context for the model to reason.
expect(prompt).toContain("origin/main");
expect(prompt).toContain("test/e2e-scenario/framework/phases/onboarding.ts");
expect(prompt).toContain("test/e2e-scenario/fixtures/phases/onboarding.ts");
expect(prompt).toContain("+ echo ok");
});

@@ -59,6 +59,8 @@ describe("E2E scenario advisor — prompt construction", () => {
expect(systemPrompt).toContain(VITEST_SCENARIO_WORKFLOW);
expect(systemPrompt).toContain("trusted advisor checkout");
expect(systemPrompt).toContain("recommend the `e2e-scenarios-all` fan-out");
expect(systemPrompt).toContain("single NemoClaw E2E system");
expect(systemPrompt).not.toContain("non-scenario E2E");
expect(systemPrompt).not.toContain("e2e-scenarios-all.yaml");
expect(systemPrompt).not.toContain("e2e-scenarios.yaml");
});
@@ -71,7 +73,7 @@ describe("E2E scenario advisor — prompt construction", () => {
});
});

describe("E2E scenario advisor — normalization contract", () => {
describe("Vitest E2E scenario advisor — normalization contract", () => {
it("preserves valid recommendations and canonicalizes the dispatch command", () => {
const raw = {
version: 1,
@@ -364,7 +366,7 @@ describe("E2E scenario advisor — normalization contract", () => {
{ required: [], optional: [], confidence: "low" },
metadata({ changedFiles: ["docs/foo.md"] }),
);
expect(normalized.noScenarioE2eReason).toMatch(/no scenario E2E impact/i);
expect(normalized.noScenarioE2eReason).toMatch(/no Vitest E2E scenario impact/i);
});

it("rejects non-object advisor output", () => {
@@ -373,7 +375,7 @@ describe("E2E scenario advisor — normalization contract", () => {
});
});

describe("E2E scenario advisor — summary and comment rendering", () => {
describe("Vitest E2E scenario advisor — summary and comment rendering", () => {
function sampleResult(): ScenarioAdvisorResult {
return {
version: 1,
@@ -398,6 +400,8 @@ describe("E2E scenario advisor — summary and comment rendering", () => {

it("renders a summary that surfaces required scenarios with their dispatch line", () => {
const summary = renderScenarioSummary(sampleResult());
expect(summary).toContain("# Vitest E2E Scenario Advisor");
expect(summary).toContain("Required Vitest E2E scenarios");
expect(summary).toContain("e2e-scenarios-all");
expect(summary).toContain(
canonicalDispatchCommand(VITEST_SCENARIO_WORKFLOW, "e2e-scenarios-all"),
@@ -413,6 +417,8 @@ describe("E2E scenario advisor — summary and comment rendering", () => {
runUrl: "https://example.invalid/run",
});
expect(comment).toContain("<!-- nemoclaw-e2e-scenario-advisor -->");
expect(comment).toContain("## Vitest E2E Scenario Recommendation");
expect(comment).toContain("Dispatch required Vitest E2E scenarios");
expect(comment).toContain("https://example.invalid/run");
});
});
92 changes: 49 additions & 43 deletions test/e2e-scenario/docs/MIGRATION.md
@@ -1,16 +1,18 @@
<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
<!-- SPDX-License-Identifier: Apache-2.0 -->

# E2E Scenario Migration Notes
# NemoClaw E2E Migration Notes

This file describes how to move coverage into the Vitest scenario framework
without confusing that work with the retired typed-shell scenario runner.
Changing status, ownership, and per-test decisions belong in GitHub issues and
PRs.
This file describes how to move coverage into the single Vitest E2E system
without confusing that work with the retired typed-shell scenario runner or a
second bash-driven harness. Vitest is the harness, GitHub Actions is the matrix,
and NemoClaw fixtures may invoke real subprocess and system boundaries when
those boundaries are the contract.

Migration state is tracked outside the repository in GitHub issues and pull
requests.
Use GitHub issues and pull requests for status changes.
requests. Use GitHub issues and pull requests as the source of truth for status
changes, ownership, deletion evidence, and contract-preserving migration
decisions.

## Current State

@@ -19,73 +21,77 @@ The scenario runner cutover is complete:
- `e2e-vitest-scenarios.yaml` is the scenario workflow.
- `test/e2e-scenario/live/registry-scenarios.test.ts` is the registry-driven
live scenario entrypoint.
- `test/e2e-scenario/framework/` owns phase fixtures, clients, artifact
- `test/e2e-scenario/fixtures/` owns phase fixtures, clients, artifact
capture, redaction, cleanup, and shell-probe bridges.
- `test/e2e-scenario/scenarios/run.ts` only lists scenarios and emits the live
Vitest matrix.
- The typed-shell scenario runner, shell validation-suite tree, and retiring
scenario workflows are removed. See `RETIREMENT.md`.

Direct legacy E2E scripts under `test/e2e/test-*.sh` remain in place. Many are
expected to stay because they test shell/install/user-flow behavior or preserve
umbrella integration smoke value. #5098 tracks family-by-family migration,
augmentation, and eventual deletion decisions for those scripts.
Direct legacy E2E scripts under `test/e2e/test-*.sh` remain in place until they
are migrated by contract. Some currently test shell, install, platform, process,
or full user-flow behavior. Preserve those real boundaries by invoking them from
Vitest tests and fixtures instead of keeping a separate durable E2E runner.
Issue #5098 tracks family-by-family migration, augmentation, and eventual
deletion decisions for those scripts.

## Target Architecture

The durable scenario framework has one execution path:
The durable E2E system has one execution path:

- Vitest owns execution, filtering, reporters, timeouts, fixture lifecycle,
skip handling, and CI integration.
- NemoClaw fixtures own setup, onboarding, lifecycle mutations,
expected-state probes, assertion helpers, expected-failure evidence,
cleanup, artifacts, and secret redaction.
- `test/e2e-scenario/fixtures/` is fixture/support code, not a test harness
or runner.
- Typed scenario definitions and matrix helpers describe stable scenario IDs
and supported combinations without becoming a second runner.
- Product-facing manifests describe desired setup/onboarding state, not test
execution logic.
- Shell scripts remain only for direct legacy E2Es or narrow system-boundary
probes where shell is the contract or lowest-risk adapter.
- Shell and system-boundary behavior should be exercised from Vitest when it is
the contract or lowest-risk adapter.

## Deletion Inventory
## Migration Governance

`test/e2e-scenario/migration/legacy-inventory.json` is a machine-readable
deletion gate.
The former `test/e2e-scenario/migration/legacy-inventory.json` ledger and
generated legacy assertion inventories are removed because they duplicated live
GitHub issues and pull requests and quickly became stale sources of truth.

It must cover:
The useful deletion invariant is smaller:

- every direct legacy shell entrypoint under `test/e2e/test-*.sh`;
- explicitly retained bridge entrypoints such as `test/e2e/brev-e2e.test.ts`;
- retired internal scenario-runner surfaces removed by the cutover.
> A PR that deletes a legacy E2E script must show the replacement Vitest
> coverage or explain the retirement rationale.

Status values:
Record that evidence in the PR body, which is the machine-checkable boundary for
the deletion. Link or summarize that PR evidence from the issue when useful. For
each deleted script, include a `Legacy E2E deletion evidence` block with:

- `not-migrated`: legacy coverage has no equivalent typed scenario yet.
- `bridge-probe`: coverage is temporarily represented by a bridge path.
- `covered`: equivalent Vitest live scenario coverage exists.
- `retired`: maintainers agreed the legacy surface is no longer required.

Do not set `deletionReady: true` on a direct legacy script unless the record is
`covered` or `retired` and the approval issue records the deletion rationale.
The retired internal scenario-runner surfaces are already marked through #5098;
that does not imply direct legacy bash scripts are deletion-ready.
- `Script:` the deleted `test/e2e/test-*.sh` path.
- `Legacy contract:` the observable behavior the shell script protected.
- `Replacement Vitest coverage:` an existing `.test.ts` path, or `Retirement
rationale:` when the behavior is intentionally retired instead of replaced.
- `Intentionally retired behavior:` any assertions, probes, or workflow hooks
that are deliberately not preserved.
- `Fidelity verification:` the command, CI check, or review evidence proving the
Vitest coverage keeps the same contract value.
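
The evidence block above can be checked mechanically. The following is a minimal sketch of such a check, assuming plain substring matching on the field labels listed above. The function name `missingEvidenceFields` and the matching rules are hypothetical and are not taken from the repository.

```typescript
// Hypothetical checker: the field labels come from the list above, but the
// function name and matching rules are assumptions, not the repository's code.
const REQUIRED_LABELS: string[] = [
  "Script:",
  "Legacy contract:",
  "Intentionally retired behavior:",
  "Fidelity verification:",
];

// Either label satisfies the "replacement or retirement" requirement.
const COVERAGE_LABELS: string[] = [
  "Replacement Vitest coverage:",
  "Retirement rationale:",
];

// Returns the labels that are absent from the PR body. An empty array means
// every expected field was found.
function missingEvidenceFields(prBody: string): string[] {
  if (!prBody.includes("Legacy E2E deletion evidence")) {
    return ["Legacy E2E deletion evidence"];
  }
  const missing = REQUIRED_LABELS.filter((label) => !prBody.includes(label));
  if (!COVERAGE_LABELS.some((label) => prBody.includes(label))) {
    missing.push(COVERAGE_LABELS.join(" or "));
  }
  return missing;
}
```

A CI step could call this on the PR body and fail when the returned array is non-empty. Substring matching is deliberately loose; a stricter check might require one block per deleted script.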

## Migration Pattern

When moving behavior from a legacy E2E script:

1. Identify the test family and policy from #5098: KEEP_BASH, HYBRID, or
MIGRATE_TYPED.
1. Identify the actual contract: CLI behavior, installer behavior, full user
journey, process boundary, platform boundary, or another observable behavior.
2. Add or update manifests only when product setup/onboarding state changes.
3. Add typed scenario registry coverage when the live matrix needs a stable
scenario ID.
4. Add fixture helpers before copying shell logic.
5. For HYBRID tests, keep the bash test and add a focused typed peer for the
contract being strengthened.
6. For MIGRATE_TYPED tests, prove parity first, then mark the inventory row
covered before any deletion PR.
7. Leave umbrella KEEP_BASH tests in place unless the tracking issue explicitly
revises their classification.
4. Add only the fixture or helper needed for the migration.
5. Preserve real boundaries. Use `bash`, login shells, `/proc`, process
signals, `sudo`, Docker host state, installer scripts, or full journey flows
from Vitest when they are the behavior being tested.
6. Prove equivalence in the PR, then delete the bash harness when the Vitest
test preserves the same value.
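
Step 5 can be illustrated with a small helper that runs a script through a real `bash` process, so a Vitest test can assert on the shell boundary itself. This is a sketch under stated assumptions: `runBash` and `ShellResult` are hypothetical names, not fixtures from `test/e2e-scenario/fixtures/`, and the only real API used is Node's `spawnSync`.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape for illustration only.
interface ShellResult {
  status: number | null; // exit code, or null if the process was killed
  stdout: string;
  stderr: string;
}

// Hypothetical helper: executes the script in a real bash process with strict
// mode enabled, and returns the exit status and captured output.
function runBash(script: string, timeoutMs = 30_000): ShellResult {
  const result = spawnSync("bash", ["-c", `set -euo pipefail\n${script}`], {
    encoding: "utf8",
    timeout: timeoutMs,
  });
  return {
    status: result.status,
    // stdout/stderr can be null when bash itself fails to start.
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
  };
}
```

Inside a Vitest test, the returned `status` and `stdout` could be passed to `expect(...)`, which keeps the shell as the behavior under test while Vitest owns execution, timeouts, and reporting.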

## Useful Commands

@@ -95,8 +101,8 @@ npx tsx test/e2e-scenario/scenarios/run.ts --list
npx tsx test/e2e-scenario/scenarios/run.ts --emit-live-matrix
npx tsx test/e2e-scenario/scenarios/run.ts --emit-live-matrix --scenarios ubuntu-repo-cloud-openclaw

# Framework tests
npx vitest run --project e2e-scenario-framework --silent=false --reporter=default
# Fixture/support tests
npx vitest run --project e2e-vitest-support --silent=false --reporter=default

# Opt-in live Vitest scenarios
npm run build:cli