test(e2e): migrate provider-switch and compatible endpoint scenarios#5058

Closed
cv wants to merge 8 commits into codex/e2e-fanout-06-cloud-inference-routing-scenarios from codex/e2e-fanout-07-provider-switch-compatible-endpoints

Conversation

cv commented Jun 9, 2026
Collaborator

Summary

Migrates the compatible-endpoint half of the provider/endpoint inference slice from placeholder planning into the typed E2E scenario framework.

Related Issue

Refs #4941
Refs #4990
Refs #4349
Stacked on #5057 via base branch codex/e2e-fanout-06-cloud-inference-routing-scenarios.

Changes

  • Adds fixture-owned OpenAI-compatible onboarding for OpenClaw using a local mock endpoint reachable from the sandbox via host.openshell.internal.
  • Requires bearer auth on the compatible mock's /v1/models and /v1/chat/completions routes, uses a per-run mock bearer token, and caps request bodies so the live scenario proves credential propagation instead of accepting unauthenticated traffic.
  • Marks the compatible-endpoint manifest as an explicit open policy scenario, matching the host-reachable mock endpoint boundary.
  • Adds a compatible-specific expected-state ID so the scenario metadata records the compatible provider, mock model, open policy tier, and COMPATIBLE_API_KEY credential ref instead of reusing generic NVIDIA cloud state.
  • Wires openai-compatible-inference into RuntimePhaseFixture, the assertion registry, live scenario support, and the live matrix so ubuntu-repo-openai-compatible-openclaw can run without external compatible-endpoint secrets.
  • Declares the double-provider-switch scenario against the inference-switch runtime suite, while keeping it unsupported and not-migrated in legacy inventory until a real lifecycle fixture performs provider registration and inference set.
  • Updates legacy inventory notes so the remaining shell-script coverage is explicit instead of pretending this slice is deletion-ready.
  • Updates typed sandbox exec calls to the current OpenShell sandbox exec --name <sandbox> -- <cmd> form and keeps live scenario tests out of the default CLI project.
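The auth and body-cap behaviour described in the Changes list can be sketched as a pure decision function. This is a minimal illustration, not the fixture's actual code: `gateRequest`, `GateInput`, and the 401/413/404 status codes are assumptions, and only the two protected routes, the per-run bearer token, the body cap, and the unauthenticated `/health` route come from this PR's discussion.

```typescript
// Hypothetical request gate for the compatible-endpoint mock.
// Names and status codes are assumptions made for illustration.
type GateInput = {
  method: string;
  path: string;
  authorization?: string; // raw Authorization header, if present
  bodyBytes: number; // size of the request body in bytes
};

const PROTECTED_PATHS = ["/v1/models", "/v1/chat/completions"];

// Compare two strings without returning early on the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Returns the HTTP status the mock would answer with.
function gateRequest(input: GateInput, token: string, maxBodyBytes: number): number {
  // /health stays unauthenticated, as the review comments on this PR note.
  if (input.method === "GET" && input.path === "/health") return 200;
  if (PROTECTED_PATHS.indexOf(input.path) === -1) return 404;
  // Auth is checked before body size so unauthenticated callers learn nothing else.
  if (!constantTimeEqual(input.authorization || "", "Bearer " + token)) return 401;
  if (input.bodyBytes > maxBodyBytes) return 413;
  return 200;
}
```

Keeping the decision separate from the HTTP server makes the wrong-auth and oversized-body cases checkable without opening a socket.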

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx vitest run --project e2e-scenario-framework --silent=false --reporter=default
  • npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-phase-onboarding.test.ts --silent=false --reporter=default
  • npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-phase-onboarding.test.ts test/e2e-scenario/framework-tests/e2e-expected-state.test.ts test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts --silent=false --reporter=default
  • npx vitest run --project cli --passWithNoTests test/e2e-scenario/live/registry-scenarios.test.ts --silent=false --reporter=default
  • npm run build:cli
  • npm run typecheck:cli
  • git diff --check
  • NEMOCLAW_RUN_E2E_SCENARIOS=1 NEMOCLAW_CLI_BIN=/home/cvillela/src/github.com/nvidia/NemoClaw/bin/nemoclaw.js npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/registry-scenarios.test.ts -t '^ubuntu-repo-openai-compatible-openclaw$' --silent=false --reporter=default
  • HOME=$(mktemp -d) npx vitest run --project cli test/release-latest-tag.test.ts --silent=false --reporter=default
  • npx prek run --all-files passes locally except for test/release-latest-tag.test.ts, which fails only because this workstation's global git signing config points at an unavailable signing key (No private key found for public key "/home/cvillela/.ssh/git-signing-key.pub"). The same release-tag test passes with a clean temporary HOME.
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
cv self-assigned this Jun 9, 2026
copy-pr-bot Bot commented Jun 9, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cv added the labels area: e2e (End-to-end tests, nightly failures, or validation infrastructure), area: inference (Inference routing, serving, model selection, or outputs), and chore (Build, CI, dependency, or tooling maintenance) Jun 9, 2026
coderabbitai Bot commented Jun 9, 2026
Contributor

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 545c4d2a-3561-4cd2-93b2-f33477de5b4a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions Bot commented Jun 9, 2026
Contributor

E2E Advisor Recommendation

Required E2E: ubuntu-repo-openai-compatible-openclaw, ubuntu-repo-cloud-openclaw
Optional E2E: ubuntu-repo-docker-post-reboot-recovery, test/e2e/test-inference-routing.sh

Dispatch hint: ubuntu-repo-openai-compatible-openclaw,ubuntu-repo-cloud-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/codex/e2e-fanout-06-cloud-inference-routing-scenarios
Head: HEAD
Confidence: high

Required E2E

  • ubuntu-repo-openai-compatible-openclaw (medium): This PR newly wires the OpenAI-compatible live Vitest scenario, onboarding fixture mock endpoint, expected state, manifest credential refs, and openai-compatible-inference runtime suite. Run the scenario to prove the real CLI can onboard with the custom provider and route inference.local calls through the sandbox.
  • ubuntu-repo-cloud-openclaw (high): The canonical cloud OpenClaw scenario exercises the changed SandboxClient openshell sandbox exec --name command path and the existing inference/inference-routing suites that this PR modifies adjacent to the new compatible-provider suite. It guards against regressions to the existing NVIDIA cloud route while adding the new route.
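The `sandbox exec --name <sandbox> -- <cmd>` form mentioned above can be illustrated with a small argument builder. This is a sketch only: `buildSandboxExecArgs` is a hypothetical helper, not the SandboxClient method in this PR, and only the argument order is taken from the PR description.

```typescript
// Hypothetical helper; only the argument order comes from the PR description.
function buildSandboxExecArgs(sandboxName: string, command: string[]): string[] {
  if (sandboxName.length === 0) throw new Error("sandbox name is required");
  if (command.length === 0) throw new Error("command is required");
  // By the usual CLI convention, `--` separates the tool's own flags from
  // the command that should run inside the sandbox.
  return ["sandbox", "exec", "--name", sandboxName, "--"].concat(command);
}
```

The resulting array would typically be handed to a process spawner as the argument list for the `openshell` binary, which avoids shell quoting problems with commands that contain spaces.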

Optional E2E

  • ubuntu-repo-docker-post-reboot-recovery (high): Useful confidence check because the default live Vitest matrix expectations were updated and this supported lifecycle scenario remains in that matrix. It is adjacent to state-validation/sandbox lifecycle behavior but not directly changed by the OpenAI-compatible onboarding path.
  • test/e2e/test-inference-routing.sh (high): Legacy inference-routing script remains broader than the migrated Vitest coverage per the updated legacy inventory. Run only if maintainers want extra migration confidence around credential isolation, negative classification, cleanup, provider-route, and compatible-endpoint cases not yet represented by live Vitest.

New E2E recommendations

  • provider-switch-lifecycle (high): The PR wires an inference-switch runtime suite and declares ubuntu-repo-cloud-openclaw-double-provider-switch, but live Vitest still skips it because lifecycle double-provider-switch is not wired. Existing coverage remains in test/e2e/test-openclaw-inference-switch.sh.
    • Suggested test: Add a live Vitest lifecycle fixture for double-provider-switch so ubuntu-repo-cloud-openclaw-double-provider-switch can execute the inference-switch suite instead of being skipped.
  • compatible-endpoint-negative-and-security-coverage (medium): The new compatible endpoint path validates happy-path onboarding and inference, but the legacy inventory still calls out missing credential isolation, negative classification, cleanup, and all compatible-endpoint cases.
    • Suggested test: Add live Vitest scenarios for invalid compatible endpoint credential/model and credential-redaction assertions around the fixture-owned compatible provider.
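A credential-redaction assertion of the kind suggested above could look roughly like the following. Both `findSecretLeaks` and `redactSecret` are hypothetical names, and the captured-output shape is an assumption; the idea is only that the per-run token must not appear verbatim in anything the scenario records.

```typescript
// Hypothetical helpers for a credential-redaction assertion.
// Returns the names of captured outputs that contain the secret verbatim.
function findSecretLeaks(outputs: Record<string, string>, secret: string): string[] {
  if (secret.length === 0) throw new Error("secret must be non-empty");
  return Object.keys(outputs).filter(function (name) {
    return outputs[name].indexOf(secret) !== -1;
  });
}

// Replaces every occurrence of the secret before text is logged or stored.
function redactSecret(text: string, secret: string): string {
  if (secret.length === 0) return text;
  return text.split(secret).join("[REDACTED]");
}
```

A live scenario might run `findSecretLeaks` over stdout, stderr, and persisted state, and fail if the returned list is non-empty.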

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: ubuntu-repo-openai-compatible-openclaw,ubuntu-repo-cloud-openclaw

github-actions Bot commented Jun 9, 2026
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/codex/e2e-fanout-06-cloud-inference-routing-scenarios
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Changes touch scenario catalog/typed registry metadata, expected-state definitions, runtime support classification, manifests, and shared onboarding/runtime fixture code; these can affect scenario selection and behavior across the matrix, so the full scenario fan-out is required.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • test/e2e-scenario/framework-tests/e2e-clients.test.ts
  • test/e2e-scenario/framework-tests/e2e-expected-state.test.ts
  • test/e2e-scenario/framework-tests/e2e-live-project-config.test.ts
  • test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-onboarding.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/framework/clients/sandbox.ts
  • test/e2e-scenario/framework/phases/onboarding.ts
  • test/e2e-scenario/framework/phases/runtime.ts
  • test/e2e-scenario/manifests/openclaw-openai-compatible.yaml
  • test/e2e-scenario/migration/legacy-inventory.json
  • test/e2e-scenario/scenarios/assertions/registry.ts
  • test/e2e-scenario/scenarios/expected-states.ts
  • test/e2e-scenario/scenarios/runtime-support.ts
  • test/e2e-scenario/scenarios/scenarios/baseline.ts
  • test/e2e-scenario/scenarios/types.ts
  • vitest.config.ts

github-actions Bot commented Jun 9, 2026
Contributor

PR Review Advisor

Findings: 0 needs attention, 2 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 1 still applies, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: OpenAI-compatible endpoint mock binding for host.openshell.internal reachability: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Comment above server.listen(0, "0.0.0.0", ...) describes Docker-backed OpenShell reachability and the removal condition; protected /v1 routes require the per-run bearer token.
  • Fixture-owned compatible endpoint still binds on all host interfaces (test/e2e-scenario/framework/phases/onboarding.ts:213): The OpenAI-compatible test mock still listens on 0.0.0.0 so Docker-backed OpenShell sandboxes can reach it through host.openshell.internal. The source-boundary and removal-condition comments now address the prior source-of-truth gap, and the sensitive /v1 routes require a per-run bearer token with tests for wrong auth, wrong model, and body-size rejection. However, the TCP listener remains exposed beyond the host-to-sandbox path during the live test window, /health is unauthenticated, and there is still no regression proof that binding to 127.0.0.1 or a specific host-gateway address is impossible.
    • Recommendation: Bind the mock to the narrowest sandbox-reachable address if possible, or add a host-side allowlist/firewall/auth guard around the fixture. If OpenShell requires the all-interface bind, keep the source-boundary comment and add a reachability regression test documenting why a narrower bind fails.
    • Evidence: startCompatibleEndpointMock() returns http://host.openshell.internal:<port>/v1 and calls server.listen(0, "0.0.0.0", ...). Protected /v1/models and /v1/chat/completions compare Authorization against the per-run token, while GET /health returns 200 without auth.
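The "narrowest sandbox-reachable address" recommendation can be sketched as a selection over probe results. This is an illustration under assumptions: `chooseBindHost` and `BindProbe` are hypothetical, the probe results would have to come from a real reachability check run inside an OpenShell sandbox, and the 172.17.0.1 address in the usage note is only an example host-gateway value.

```typescript
// Hypothetical selection of the narrowest bind address that a sandbox can reach.
type BindProbe = { host: string; reachableFromSandbox: boolean };

// Candidates are ordered narrowest first; a wildcard bind should come last.
function chooseBindHost(probes: BindProbe[]): { host: string; rejected: string[] } {
  const rejected: string[] = [];
  for (let i = 0; i < probes.length; i++) {
    if (probes[i].reachableFromSandbox) {
      // `rejected` documents why each narrower bind was not used.
      return { host: probes[i].host, rejected: rejected };
    }
    rejected.push(probes[i].host);
  }
  throw new Error("no candidate bind address is reachable: " + rejected.join(", "));
}
```

For example, probes for 127.0.0.1, 172.17.0.1, and 0.0.0.0 in that order would select 0.0.0.0 only when both narrower addresses were observed to be unreachable, and the `rejected` list gives a regression test something concrete to assert on.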

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Live runtime evidence for the OpenAI-compatible scenario should verify that persisted registry/session state includes provider compatible-endpoint, model mock-compatible-model, policyTier open, and the COMPATIBLE_API_KEY credential ref after onboarding.
  • **Runtime validation**: The OpenAI-compatible live scenario should fail when NemoClaw forwards an incorrect bearer token to the fixture mock.
  • **Runtime validation**: A compatible endpoint fixture reachability test should document whether a mock bound to 127.0.0.1 or a specific host-gateway address is unreachable from inside OpenShell before 0.0.0.0 is retained.
  • **Runtime validation**: The compatible endpoint mock should return HTTP 400 for malformed JSON without crashing or leaking the request body.
  • **Runtime validation**: openai-compatible-inference should fail fast when called for a compatible endpoint instance with no model instead of silently falling back to a default.
    • Rationale for all five items: Fixture/unit coverage is strong for command construction, bearer auth, model checks, body caps, suite dispatch, and live-matrix discovery. Additional live/runtime evidence would make the compatible scenario fail on persisted state, policy, credential propagation, and listener reachability drift rather than only on HTTP response shape.
  • **OpenAI-compatible endpoint mock binding for host.openshell.internal reachability**: Partial. Unit coverage asserts the host.openshell.internal endpoint shape and verifies auth/model/body-cap behavior by rewriting to 127.0.0.1; the live scenario would exercise sandbox reachability. No test proves that a 127.0.0.1 or host-gateway-specific bind fails from inside OpenShell. Evidence: Comment above server.listen(0, "0.0.0.0", ...) describes Docker-backed OpenShell reachability and the removal condition; protected /v1 routes require the per-run bearer token.
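The malformed-JSON and missing-model cases listed above can be sketched as a parsing step that never echoes the raw body. `parseChatRequest` and `ParseResult` are hypothetical names. The 400 for malformed JSON follows the suggestion above, while using 400 for a missing or unexpected model is an assumption.

```typescript
// Hypothetical body parser for the mock's /v1/chat/completions route.
type ParseResult =
  | { ok: true; model: string }
  | { ok: false; status: number; error: string };

function parseChatRequest(raw: string, expectedModel: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // Fixed message: neither the raw body nor the parser error is echoed back.
    return { ok: false, status: 400, error: "invalid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, status: 400, error: "expected a JSON object" };
  }
  const model = (parsed as { model?: unknown }).model;
  if (typeof model !== "string" || model.length === 0) {
    // Fail fast instead of silently falling back to a default model.
    return { ok: false, status: 400, error: "missing model" };
  }
  if (model !== expectedModel) {
    return { ok: false, status: 400, error: "unexpected model" };
  }
  return { ok: true, model: model };
}
```

Returning a fixed error string keeps the rejection path from reflecting attacker-controlled or credential-bearing input into logs or responses.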

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

copy-pr-bot Bot commented Jun 10, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cv marked this pull request as ready for review June 10, 2026 01:20
cv added 5 commits June 9, 2026 18:37
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
cv commented Jun 10, 2026
Collaborator Author

Cross-linking the post-#5106 migration path from #5098 before this stacked inference PR is rebased/salvaged: #5098 (comment)

The useful inference work here should move forward only if it fits the one-E2E-system model: Vitest is the harness, GitHub Actions is the matrix, and NemoClaw fixtures/helpers wrap real subprocess/system boundaries. Shell/system behavior is fine inside tests where it is the real boundary, but we should not preserve or rebuild a parallel E2E runner.

Also: please avoid treating legacy-inventory.json as the migration roadmap. The PR and linked issue should carry replacement/retirement evidence; any repo-level check should stay lightweight and deletion-focused.

cv commented Jun 10, 2026
Collaborator Author

Closing this stacked PR in its current form because it is based on the stale #5057 branch and predates the #5106 cutover.

The compatible endpoint mock/runtime work is still valuable. Please reopen it as a clean draft PR from current main, targeting the one Vitest E2E system, without legacy-inventory.json roadmap changes, and with the host bind boundary either narrowed or justified by a regression test.

cv closed this Jun 10, 2026
Labels

  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • area: inference (Inference routing, serving, model selection, or outputs)
  • chore (Build, CI, dependency, or tooling maintenance)

2 participants