test(e2e): migrate provider-switch and compatible endpoint scenarios by cv · Pull Request #5058 · NVIDIA/NemoClaw

cv · 2026-06-09T18:08:53Z

Summary

Migrates the compatible-endpoint half of the provider/endpoint inference slice from placeholder planning into the typed E2E scenario framework.

Related Issue

Refs #4941
Refs #4990
Refs #4349
Stacked on #5057 via base branch codex/e2e-fanout-06-cloud-inference-routing-scenarios.

Changes

Adds fixture-owned OpenAI-compatible onboarding for OpenClaw using a local mock endpoint reachable from the sandbox via host.openshell.internal.
Requires bearer auth on the compatible mock's /v1/models and /v1/chat/completions routes, uses a per-run mock bearer token, and caps request bodies so the live scenario proves credential propagation instead of accepting unauthenticated traffic.
Marks the compatible-endpoint manifest as an explicit open policy scenario, matching the host-reachable mock endpoint boundary.
Adds a compatible-specific expected-state ID so the scenario metadata records the compatible provider, mock model, open policy tier, and COMPATIBLE_API_KEY credential ref instead of reusing generic NVIDIA cloud state.
Wires openai-compatible-inference into RuntimePhaseFixture, the assertion registry, live scenario support, and the live matrix so ubuntu-repo-openai-compatible-openclaw can run without external compatible-endpoint secrets.
Declares the double-provider-switch scenario against the inference-switch runtime suite, while keeping it unsupported and not-migrated in legacy inventory until a real lifecycle fixture performs provider registration and inference set.
Updates legacy inventory notes so the remaining shell-script coverage is explicit instead of pretending this slice is deletion-ready.
Updates typed sandbox exec calls to the current OpenShell sandbox exec --name <sandbox> -- <cmd> form and keeps live scenario tests out of the default CLI project.
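For reviewers who want a concrete picture of the auth and body-cap behavior described in the Changes list, here is a minimal sketch of the decision logic. It is not the fixture's actual code: the helper names (`newRunToken`, `bearerMatches`, `decideStatus`), the 64 KiB cap, and the specific status codes (401, 413, 404) are assumptions for illustration. Only the two protected routes and the per-run bearer token come from this PR; the unauthenticated /health route is taken from the review notes further down.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical cap; the PR does not state the real limit.
const MAX_BODY_BYTES = 64 * 1024;

// Routes the PR describes as bearer-protected.
const PROTECTED_ROUTES = new Set(["/v1/models", "/v1/chat/completions"]);

interface MockRequest {
  path: string;
  authorization?: string;
  bodyBytes: number;
}

// Fresh token per test run, so a leaked value is useless afterwards.
function newRunToken(): string {
  return randomBytes(24).toString("hex");
}

// Constant-time comparison of the full Authorization header value.
function bearerMatches(header: string | undefined, token: string): boolean {
  const expected = Buffer.from(`Bearer ${token}`);
  const actual = Buffer.from(header ?? "");
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Returns the HTTP status the mock would answer with (status codes are assumptions).
function decideStatus(req: MockRequest, token: string): number {
  if (req.path === "/health") return 200; // unauthenticated health probe
  if (!PROTECTED_ROUTES.has(req.path)) return 404;
  if (!bearerMatches(req.authorization, token)) return 401;
  if (req.bodyBytes > MAX_BODY_BYTES) return 413;
  return 200;
}
```

Checking auth before the body cap means an unauthenticated caller learns nothing about the size limit; the real fixture may order these checks differently.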

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

copy-pr-bot · 2026-06-09T18:08:57Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-09T18:09:01Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 545c4d2a-3561-4cd2-93b2-f33477de5b4a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


github-actions · 2026-06-09T18:09:41Z

E2E Advisor Recommendation

Required E2E: ubuntu-repo-openai-compatible-openclaw, ubuntu-repo-cloud-openclaw
Optional E2E: ubuntu-repo-docker-post-reboot-recovery, test/e2e/test-inference-routing.sh

Dispatch hint: ubuntu-repo-openai-compatible-openclaw,ubuntu-repo-cloud-openclaw

E2E Recommendation Advisor

Base: origin/codex/e2e-fanout-06-cloud-inference-routing-scenarios
Head: HEAD
Confidence: high

Required E2E

ubuntu-repo-openai-compatible-openclaw (medium): This PR newly wires the OpenAI-compatible live Vitest scenario, onboarding fixture mock endpoint, expected state, manifest credential refs, and openai-compatible-inference runtime suite. Run the scenario to prove the real CLI can onboard with the custom provider and route inference.local calls through the sandbox.
ubuntu-repo-cloud-openclaw (high): The canonical cloud OpenClaw scenario exercises the changed SandboxClient openshell sandbox exec --name command path and the existing inference/inference-routing suites that this PR modifies adjacent to the new compatible-provider suite. It guards against regressions to the existing NVIDIA cloud route while adding the new route.
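The changed command path mentioned above follows the `openshell sandbox exec --name <sandbox> -- <cmd>` form from the PR description. A rough sketch of how a typed client might assemble that argv is shown below; `buildSandboxExecArgs` is a hypothetical helper name, not the actual SandboxClient method, and the validation rules are assumptions.

```typescript
// Builds argv for: openshell sandbox exec --name <sandbox> -- <cmd>
// (helper name and validation are illustrative assumptions)
function buildSandboxExecArgs(sandbox: string, cmd: readonly string[]): string[] {
  if (sandbox.length === 0) throw new Error("sandbox name is required");
  if (cmd.length === 0) throw new Error("command is required");
  // The "--" separator keeps flags inside cmd from being parsed by the CLI itself.
  return ["sandbox", "exec", "--name", sandbox, "--", ...cmd];
}
```

Passing the result to a subprocess API as an argument array (rather than a joined shell string) avoids quoting problems when the sandboxed command contains spaces.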

Optional E2E

ubuntu-repo-docker-post-reboot-recovery (high): Useful confidence check because the default live Vitest matrix expectations were updated and this supported lifecycle scenario remains in that matrix. It is adjacent to state-validation/sandbox lifecycle behavior but not directly changed by the OpenAI-compatible onboarding path.
test/e2e/test-inference-routing.sh (high): Legacy inference-routing script remains broader than the migrated Vitest coverage per the updated legacy inventory. Run only if maintainers want extra migration confidence around credential isolation, negative classification, cleanup, provider-route, and compatible-endpoint cases not yet represented by live Vitest.

New E2E recommendations

provider-switch-lifecycle (high): The PR wires an inference-switch runtime suite and declares ubuntu-repo-cloud-openclaw-double-provider-switch, but live Vitest still skips it because lifecycle double-provider-switch is not wired. Existing coverage remains in test/e2e/test-openclaw-inference-switch.sh.
- Suggested test: Add a live Vitest lifecycle fixture for double-provider-switch so ubuntu-repo-cloud-openclaw-double-provider-switch can execute the inference-switch suite instead of being skipped.
compatible-endpoint-negative-and-security-coverage (medium): The new compatible endpoint path validates happy-path onboarding and inference, but the legacy inventory still calls out missing credential isolation, negative classification, cleanup, and all compatible-endpoint cases.
- Suggested test: Add live Vitest scenarios for invalid compatible endpoint credential/model and credential-redaction assertions around the fixture-owned compatible provider.

Dispatch hint

Workflow: .github/workflows/e2e-vitest-scenarios.yaml
jobs input: ubuntu-repo-openai-compatible-openclaw,ubuntu-repo-cloud-openclaw

github-actions · 2026-06-09T18:09:42Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

E2E Scenario Advisor

Base: origin/codex/e2e-fanout-06-cloud-inference-routing-scenarios
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: Changes touch scenario catalog/typed registry metadata, expected-state definitions, runtime support classification, manifests, and shared onboarding/runtime fixture code; these can affect scenario selection and behavior across the matrix, so the full scenario fan-out is required.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

test/e2e-scenario/framework-tests/e2e-clients.test.ts
test/e2e-scenario/framework-tests/e2e-expected-state.test.ts
test/e2e-scenario/framework-tests/e2e-live-project-config.test.ts
test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
test/e2e-scenario/framework-tests/e2e-phase-onboarding.test.ts
test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts
test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/framework/clients/sandbox.ts
test/e2e-scenario/framework/phases/onboarding.ts
test/e2e-scenario/framework/phases/runtime.ts
test/e2e-scenario/manifests/openclaw-openai-compatible.yaml
test/e2e-scenario/migration/legacy-inventory.json
test/e2e-scenario/scenarios/assertions/registry.ts
test/e2e-scenario/scenarios/expected-states.ts
test/e2e-scenario/scenarios/runtime-support.ts
test/e2e-scenario/scenarios/scenarios/baseline.ts
test/e2e-scenario/scenarios/types.ts
vitest.config.ts

github-actions · 2026-06-09T18:12:19Z

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: OpenAI-compatible endpoint mock binding for host.openshell.internal reachability: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Comment above server.listen(0, "0.0.0.0", ...) describes Docker-backed OpenShell reachability and the removal condition; protected /v1 routes require the per-run bearer token.
Fixture-owned compatible endpoint still binds on all host interfaces (test/e2e-scenario/framework/phases/onboarding.ts:213): The OpenAI-compatible test mock still listens on 0.0.0.0 so Docker-backed OpenShell sandboxes can reach it through host.openshell.internal. The source-boundary and removal-condition comments now address the prior source-of-truth gap, and the sensitive /v1 routes require a per-run bearer token with tests for wrong auth, wrong model, and body-size rejection. However, the TCP listener remains exposed beyond the host-to-sandbox path during the live test window, /health is unauthenticated, and there is still no regression proof that binding to 127.0.0.1 or a specific host-gateway address is impossible.
- Recommendation: Bind the mock to the narrowest sandbox-reachable address if possible, or add a host-side allowlist/firewall/auth guard around the fixture. If OpenShell requires the all-interface bind, keep the source-boundary comment and add a reachability regression test documenting why a narrower bind fails.
- Evidence: startCompatibleEndpointMock() returns http://host.openshell.internal:<port>/v1 and calls server.listen(0, "0.0.0.0", ...). Protected /v1/models and /v1/chat/completions compare Authorization against the per-run token, while GET /health returns 200 without auth.
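To make the bind-address question concrete, the sketch below parameterizes the listen host so a regression test could try a narrower bind before falling back to 0.0.0.0. This is an illustration only, not the body of startCompatibleEndpointMock(): `listenOn` and `sandboxEndpointUrl` are hypothetical names, and the trivial JSON response stands in for the real mock routes. The `server.listen(0, host, ...)` call shape and the host.openshell.internal URL come from the evidence above.

```typescript
import { createServer, Server } from "node:http";
import type { AddressInfo } from "node:net";

// URL handed to the sandbox; the hostname and /v1 suffix come from the PR.
function sandboxEndpointUrl(port: number): string {
  return `http://host.openshell.internal:${port}/v1`;
}

// Starts a throwaway server on an ephemeral port bound to `bindHost`.
// Port 0 asks the OS to pick a free port, which is read back from address().
function listenOn(bindHost: string): Promise<{ server: Server; port: number }> {
  return new Promise((resolve, reject) => {
    const server = createServer((_req, res) => {
      res.writeHead(200, { "content-type": "application/json" });
      res.end(JSON.stringify({ status: "ok" }));
    });
    server.once("error", reject);
    server.listen(0, bindHost, () => {
      const { port } = server.address() as AddressInfo;
      resolve({ server, port });
    });
  });
}
```

A reachability regression test along the lines the advisor suggests could call `listenOn("127.0.0.1")`, attempt a request from inside the sandbox against `sandboxEndpointUrl(port)`, and record whether it fails; that result would justify (or remove) the all-interface bind.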

🌱 Nice ideas

None.

Consider writing more tests for

Fixture/unit coverage is strong for command construction, bearer auth, model checks, body caps, suite dispatch, and live-matrix discovery. Additional live/runtime evidence would make the compatible scenario fail on persisted state, policy, credential propagation, and listener reachability drift rather than only on HTTP response shape. Suggested cases:

**Runtime validation**: OpenAI-compatible live runtime evidence verifies that persisted registry/session state includes provider compatible-endpoint, model mock-compatible-model, policyTier open, and the COMPATIBLE_API_KEY credential ref after onboarding.
**Runtime validation**: The OpenAI-compatible live scenario fails when NemoClaw forwards an incorrect bearer token to the fixture mock.
**Runtime validation**: A compatible endpoint fixture reachability test documents whether a mock bound to 127.0.0.1 or a specific host-gateway address is unreachable from inside OpenShell before 0.0.0.0 is retained.
**Runtime validation**: The compatible endpoint mock returns HTTP 400 for malformed JSON without crashing or leaking the request body.
**Runtime validation**: openai-compatible-inference fails fast when called for a compatible endpoint instance with no model instead of silently falling back to a default.
**OpenAI-compatible endpoint mock binding for host.openshell.internal reachability**: Partial. Unit coverage asserts the host.openshell.internal endpoint shape and verifies auth/model/body-cap behavior by rewriting to 127.0.0.1; the live scenario would exercise sandbox reachability. No test proves that a 127.0.0.1 or host-gateway-specific bind fails from inside OpenShell. Evidence: Comment above server.listen(0, "0.0.0.0", ...) describes Docker-backed OpenShell reachability and the removal condition; protected /v1 routes require the per-run bearer token.
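The persisted-state check in the first suggested case could look roughly like the sketch below. The field names (`provider`, `model`, `policyTier`, `credentialRef`) and the `diffState` helper are hypothetical; only the four expected values are taken from this PR, and the real registry/session schema may differ.

```typescript
// Field names are hypothetical; the values are the ones named in this PR.
interface CompatibleState {
  provider: string;
  model: string;
  policyTier: string;
  credentialRef: string;
}

const EXPECTED_COMPATIBLE_STATE: CompatibleState = {
  provider: "compatible-endpoint",
  model: "mock-compatible-model",
  policyTier: "open",
  credentialRef: "COMPATIBLE_API_KEY",
};

// Returns one message per mismatched field; an empty array means the state matches.
function diffState(
  actual: Partial<CompatibleState>,
  expected: CompatibleState = EXPECTED_COMPATIBLE_STATE,
): string[] {
  const keys = Object.keys(expected) as (keyof CompatibleState)[];
  return keys
    .filter((key) => actual[key] !== expected[key])
    .map((key) => `${key}: expected ${expected[key]}, got ${String(actual[key])}`);
}
```

Reporting every mismatched field at once, instead of failing on the first, makes it easier to tell a wholesale fallback to generic NVIDIA cloud state from a single drifted field.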

This is an automated advisory review. A human maintainer must make the final merge decision.

copy-pr-bot · 2026-06-10T01:19:18Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cv · 2026-06-10T07:53:00Z

Cross-linking the post-#5106 migration path from #5098 before this stacked inference PR is rebased/salvaged: #5098 (comment)

The useful inference work here should move forward only if it fits the one-E2E-system model: Vitest is the harness, GitHub Actions is the matrix, and NemoClaw fixtures/helpers wrap real subprocess/system boundaries. Shell/system behavior is fine inside tests where it is the real boundary, but we should not preserve or rebuild a parallel E2E runner.

Also: please avoid treating legacy-inventory.json as the migration roadmap. The PR and linked issue should carry replacement/retirement evidence; any repo-level check should stay lightweight and deletion-focused.

cv · 2026-06-10T08:34:39Z

Closing this stacked PR in its current form because it is based on the stale #5057 branch and predates the #5106 cutover.

The compatible endpoint mock/runtime work is still valuable. Please reopen it as a clean draft PR from current main, targeting the one Vitest E2E system, without legacy-inventory.json roadmap changes, and with the host bind boundary either narrowed or justified by a regression test.

chore(e2e): scaffold fan-out draft 07

5742ca9

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 9, 2026

cv added labels Jun 9, 2026: area: e2e (End-to-end tests, nightly failures, or validation infrastructure); area: inference (Inference routing, serving, model selection, or outputs); chore (Build, CI, dependency, or tooling maintenance)

cv added 2 commits June 9, 2026 17:28

chore(e2e): sync provider migration branch

6f93376

test(e2e): migrate compatible inference scenarios

087a16e

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv marked this pull request as ready for review June 10, 2026 01:20

cv added 5 commits June 9, 2026 18:37

test(e2e): harden compatible endpoint mock

761b8f3

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): align compatible scenario sources

ffe4d9d

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): tighten compatible endpoint scenario state

787ce5b

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): validate compatible endpoint model routing

9aee8b1

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(e2e): document compatible endpoint bind boundary

f4c21fe

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv mentioned this pull request Jun 10, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

cv closed this Jun 10, 2026
