Skip to content

test(e2e): migrate network policy to Vitest [ANCHOR-6]#5226

Merged
jyaunches merged 26 commits into
mainfrom
e2e-migrate/test-network-policy-simple
Jun 11, 2026
Merged

test(e2e): migrate network policy to Vitest [ANCHOR-6]#5226
jyaunches merged 26 commits into
mainfrom
e2e-migrate/test-network-policy-simple

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

Migrate test/e2e/test-network-policy.sh into focused live Vitest coverage for the security/policy anchor in #5098.

Related Issue

Refs #5098
Refs #4014

Changes

  • Adds test/e2e-scenario/live/network-policy.test.ts as a free-standing live Vitest test for the legacy network-policy contracts.
  • Covers restricted onboard, deny-by-default egress, preset allow/deny behavior, live policy-add, dry-run no-op behavior, per-binary Jira policy enforcement, /proc hot-reload, inference.local exemption, SSRF private-IP checks, host-gateway web_fetch allow/deny, and permissive mode.
  • Adds a selective network-policy-vitest job to e2e-vitest-scenarios.yaml so the new test can run via workflow_dispatch with scenarios=network-policy without invoking the registry scenario matrix.
  • Keeps legacy shell deletion and nightly shell wiring retirement deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11.

Contract mapping

  • Legacy assertion: TC-NET-01 deny-by-default egress blocks unapproved example.com.
    • Replacement: network-policy.test.ts probes from inside the sandbox with Node fetch.
    • Boundary preserved: real restricted OpenClaw sandbox + OpenShell policy proxy.
  • Legacy assertion: TC-NET-02/03/04/08/11 presets mutate live policy and enforce read-only/per-binary rules.
    • Replacement: network-policy.test.ts applies brew, pypi, slack, jira, npm and asserts sandbox curl/Node behavior.
    • Boundary preserved: real nemoclaw <sandbox> policy-add, openshell policy update, sandbox egress.
  • Legacy assertion: TC-NET-05 hot reload does not restart the sandbox.
    • Replacement: compares /proc/1/stat start time before/after policy change.
    • Boundary preserved: real sandbox /proc probe.
  • Legacy assertion: TC-NET-07 inference.local succeeds while direct provider egress is blocked.
    • Replacement: sandbox curl to inference.local plus direct provider fetch denial.
    • Boundary preserved: real cloud inference proxy boundary.
  • Legacy assertion: TC-NET-09 SSRF rejects dangerous private IPs.
    • Replacement: direct Vitest assertions against the blueprint private-network classifier.
    • Boundary preserved: product SSRF CIDR source.
  • Legacy assertion: TC-NET-10 OpenClaw web_fetch can reach approved host.openshell.internal target and cannot reach unapproved host-gateway ports.
    • Replacement: local host marker servers + custom policy + in-sandbox OpenClaw tool execution.
    • Boundary preserved: real host gateway, OpenShell policy, and OpenClaw web_fetch runtime.
  • Legacy assertion: TC-NET-06 permissive mode opens egress.
    • Replacement: applies permissive policy via openshell policy set and asserts npm ping succeeds.
    • Boundary preserved: real OpenShell policy set and sandbox npm egress.

Simplicity check

  • Test shape: simple live Vitest test.
  • New shared helpers: none; one-off helpers are local to the test.
  • New framework/registry/ledger: none.
  • Workflow changes: adds one free-standing selective Vitest job; legacy shell deletion/nightly shell retirement deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Focused validation run locally:

  • npm run build:cli
  • npm run typecheck:cli
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts --silent=false --reporter=default
  • NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/network-policy.test.ts --silent=false --reporter=default (skips without NVIDIA_API_KEY)
  • npm run test-size:check
  • git diff --check

Signed-off-by: Julie Yaunches jyaunches@nvidia.com

Summary by CodeRabbit

  • Tests

    • Added a comprehensive live E2E scenario for network-policy: policy enforcement checks, dry-run/approval flows, SSRF/IP validation, package-manager and host-gateway behavior, and per-scenario result reporting.
    • Extended workflow boundary validation to require non-empty registry scenario matrix before running live scenario matrix jobs.
  • Chores

    • CI now gates live-scenarios when no registry scenarios exist, adds a dedicated network-policy Vitest job with artifact uploads and conditional container auth, and updates job validation and reporting dependencies.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a network-policy live Vitest E2E scenario and integrates it into CI: splits free-standing vs registry-driven scenario selection when generating the test matrix, gates the registry-driven job on a non-empty matrix, introduces the network-policy-vitest free-standing job, and implements comprehensive live test logic and helpers.

Changes

Network-Policy Live E2E Test Suite

Layer / File(s) Summary
Workflow matrix generation and free-standing job routing
.github/workflows/e2e-vitest-scenarios.yaml, tools/e2e-scenarios/workflow-boundary.mts
generate-matrix parses and splits scenario inputs into free-standing vs registry-driven ids, forces an empty matrix when only free-standing ids are requested, live-scenarios is gated to skip when matrix is [] and inputs.jobs is empty, validate-jobs allows network-policy-vitest, and report-to-pr now needs network-policy-vitest.
Test module constants and core execution helpers
test/e2e-scenario/live/network-policy.test.ts (lines 1–107)
Module wiring: CLI/dist constants, sandbox naming, shouldRunLiveE2EScenarios gate, timeouts, transient-provider detection, runNemoclaw/sandbox helpers, and non-interactive preset apply with settle.
Interactive preset selection and HTTP probing helpers
test/e2e-scenario/live/network-policy.test.ts (lines 108–165)
Interactive preset automation (parse and auto-confirm preset choice) and in-sandbox HTTP probe helpers (fetchStatus, curlStatus).
Host-gateway marker server and policy YAML writer
test/e2e-scenario/live/network-policy.test.ts (lines 166–223)
Start/stop ephemeral marker HTTP servers with port validation and generate host-gateway policy YAML allowing only approved endpoint and constraining IP ranges/enforced binaries.
web_fetch probe script generator
test/e2e-scenario/live/network-policy.test.ts (lines 225–356)
Create base64-encoded Node probe script that loads OpenClaw tools, runs web_fetch against approved/denied URLs, summarizes outcomes, and classifies SSRF/private-IP behavior.
Main live E2E test execution and validation
test/e2e-scenario/live/network-policy.test.ts (lines 358–764)
Full scenario flow: metadata, prerequisites, sandbox lifecycle with retries, sequential policy probes (deny-by-default, brew, PyPI GET/POST, Slack preset, Jira dry-run+approval, npm checks, inference access, direct-provider denial), SSRF/isPrivateIp assertions, host-gateway marker verification plus in-sandbox probe execution, permissive-policy re-check, and write scenario-result.json.
Network-policy-vitest free-standing job
.github/workflows/e2e-vitest-scenarios.yaml (lines 327–406)
New free-standing job that runs the network-policy test when selected (or when selectors empty), conditionally logs in to Docker Hub with up to 3 retries and anonymous fallback, builds CLI, installs OpenShell, runs the test with secrets.NVIDIA_API_KEY, and uploads artifacts to e2e-artifacts/vitest/network-policy/.

Sequence Diagram(s)

sequenceDiagram
  participant User as inputs.scenarios / inputs.jobs
  participant Validate as validate-jobs
  participant Matrix as generate-matrix
  participant Runner as live-scenarios (registry-driven)
  participant FreeJob as network-policy-vitest
  participant Report as report-to-pr

  User->>Validate: submit selectors
  Validate->>Matrix: validate & split scenarios
  Matrix->>Runner: registry-driven matrix (if not '[]')
  User->>FreeJob: request free-standing job (network-policy-vitest)
  FreeJob->>Report: artifacts & completion
  Runner->>Report: registry-driven completion
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5243: Updates .github/workflows/e2e-vitest-scenarios.yaml and tools/e2e-scenarios/workflow-boundary.mts for shared inputs.jobs selector/validate-jobs gating and report-to-pr wiring.

Suggested labels

area: e2e, area: ci

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 A network-policy tale unfolds today,
Where sandboxes are probed in a lively way,
Matrix splits free-standing from registry-run,
Docker retries, probes, and markers — all done,
A rabbit cheers as CI routes tests to play.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(e2e): migrate network policy to Vitest [ANCHOR-6]' directly and clearly summarizes the main change: migrating the network-policy test from legacy shell to Vitest framework.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-migrate/test-network-policy-simple

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: network-policy-vitest
Optional E2E: network-policy-e2e, openshell-version-pin-vitest

Dispatch hint: network-policy-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • network-policy-vitest (high; timeout 90 minutes; uses Docker/OpenShell and NVIDIA_API_KEY): This PR adds and wires the live Vitest network-policy job itself. It must run to validate the new workflow routing, OpenShell install step, restricted sandbox onboarding, and live allow/deny policy probes.

Optional E2E

  • network-policy-e2e (high; live sandbox/network-policy shell E2E): Useful migration-parity check against the retained legacy shell network-policy lane while the new Vitest job is being introduced.
  • openshell-version-pin-vitest (low; timeout 15 minutes; hermetic installer-script behavior): Optional low-cost smoke of the updated free-standing scenario selector path that now maps scenario ids to discrete Vitest jobs.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: network-policy-vitest

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: e2e-scenarios-all
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • e2e-scenarios-all: The Vitest scenario workflow itself changed, including dispatch selector validation, matrix behavior, free-standing scenario wiring, result reporting, and addition of the network-policy live Vitest job. Per policy, workflow and matrix/dispatch-surface changes require the full Vitest scenario fan-out.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/network-policy-transient-provider.ts
  • test/e2e-scenario/live/network-policy.test.ts
  • test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • test/e2e-scenario/support-tests/network-policy-transient-provider.test.ts
  • tools/e2e-scenarios/workflow-boundary.mts

@github-actions

github-actions Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 2 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: test/e2e-scenario/live/network-policy.test.ts transient provider validation skip: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: network-policy.test.ts writes transient-provider-validation.skip.json and calls skip(...) when isTransientProviderValidationFailure(onboard) remains true in GitHub Actions after ONBOARD_ATTEMPTS.
  • Transient provider skip can suppress security-critical network-policy coverage (test/e2e-scenario/live/network-policy.test.ts:468): After onboarding retries are exhausted in GitHub Actions, a classified transient NVIDIA Endpoints validation failure writes a skip artifact and skips the entire network-policy live scenario. The classifier is narrow, documented, and unit-tested, but the runtime result is still loss of coverage for deny-by-default egress, policy-add and dry-run behavior, per-binary enforcement, inference-local/direct-provider behavior, SSRF checks, host-gateway web_fetch policy, and permissive mode.
    • Recommendation: Keep the skip highly visible and consider narrowing it so local policy assertions still run where possible, replacing provider validation with a hermetic fixture, or tracking removal once endpoint validation is stable for a release cycle.
    • Evidence: network-policy.test.ts calls skip(...) in GitHub Actions when isTransientProviderValidationFailure(onboard) remains true after ONBOARD_ATTEMPTS and writes transient-provider-validation.skip.json. network-policy-transient-provider.test.ts covers positive and negative classifier cases, but the skip still covers the full scenario.
  • Slack live policy assertion is stricter than the retained legacy contract (test/e2e-scenario/live/network-policy.test.ts:553): The Vitest migration requires slack.com to return STATUS_403 before policy-add and exactly STATUS_200 after policy-add. The retained legacy shell test instead skipped TC-NET-03 if Slack was already reachable with STATUS_[23] before policy-add, and accepted any STATUS_[2-4] response after policy-add as evidence that the endpoint was no longer proxy-blocked. The narrower Vitest contract can fail policy-correct behavior such as an ERROR_ proxy denial before approval, a redirect, or a Slack client/auth response after approval.
    • Recommendation: Either align the Vitest assertions with the retained legacy success/blocking ranges or document and test why the narrower STATUS_403-before and STATUS_200-after contract is intentionally required.
    • Evidence: network-policy.test.ts expects slackBefore to match /STATUS_403/ and slackAfter to match /STATUS_200/. test/e2e/test-network-policy.sh skips if the before probe matches STATUS_[23][0-9][0-9] and accepts after STATUS_[2-4][0-9][0-9].

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — workflow_dispatch with scenarios=network-policy runs network-policy-vitest and leaves live-scenarios skipped because the generated matrix is []. The deterministic workflow-boundary tests and direct live Vitest scenario provide good code-level coverage, but this PR changes workflow dispatch behavior, a secret-bearing live sandbox job, OpenShell installation, and provider-dependent network-policy validation. Those boundaries benefit from targeted runtime validation.
  • **Runtime validation** — workflow_dispatch with scenarios=network-policy,ubuntu-repo-cloud-openclaw runs network-policy-vitest and a live-scenarios matrix containing only ubuntu-repo-cloud-openclaw. The deterministic workflow-boundary tests and direct live Vitest scenario provide good code-level coverage, but this PR changes workflow dispatch behavior, a secret-bearing live sandbox job, OpenShell installation, and provider-dependent network-policy validation. Those boundaries benefit from targeted runtime validation.
  • **Runtime validation** — workflow_dispatch with scenarios=network-policy,../escape fails before secret-bearing jobs and no workflow step or PR comment echoes the raw rejected selector. The deterministic workflow-boundary tests and direct live Vitest scenario provide good code-level coverage, but this PR changes workflow dispatch behavior, a secret-bearing live sandbox job, OpenShell installation, and provider-dependent network-policy validation. Those boundaries benefit from targeted runtime validation.
  • **Runtime validation** — network-policy-vitest OpenShell install runs with DOCKER_CONFIG, DOCKERHUB_USERNAME, DOCKERHUB_TOKEN, NVIDIA_API_KEY, and GITHUB_TOKEN absent from the installer environment. The deterministic workflow-boundary tests and direct live Vitest scenario provide good code-level coverage, but this PR changes workflow dispatch behavior, a secret-bearing live sandbox job, OpenShell installation, and provider-dependent network-policy validation. Those boundaries benefit from targeted runtime validation.
  • **Runtime validation** — raw OpenShell sandbox exec in network-policy.test.ts succeeds on a runner with no active OpenShell gateway because OPENSHELL_GATEWAY=nemoclaw is set. The deterministic workflow-boundary tests and direct live Vitest scenario provide good code-level coverage, but this PR changes workflow dispatch behavior, a secret-bearing live sandbox job, OpenShell installation, and provider-dependent network-policy validation. Those boundaries benefit from targeted runtime validation.
  • **Acceptance clause:** No deterministic linked issue acceptance clauses were provided. — add test evidence or identify existing coverage. Trusted context has linkedIssues: []. PR-body references to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098/[NemoClaw][All Platforms][Policy&Network] pypi preset does not allow expected GET access to pypi.org/files.pythonhosted.org without OpenShell approval #4014 and contract mapping were treated as untrusted orientation, not literal acceptance clauses.
  • **test/e2e-scenario/live/network-policy.test.ts transient provider validation skip** — test/e2e-scenario/support-tests/network-policy-transient-provider.test.ts covers positive transient endpoint validation failures and negative cases for invalid credentials, policy failures, Docker unavailable, and non-validation curl timeouts.. network-policy.test.ts writes transient-provider-validation.skip.json and calls skip(...) when isTransientProviderValidationFailure(onboard) remains true in GitHub Actions after ONBOARD_ATTEMPTS.
Since last review details

Current findings:

  • Source-of-truth review needed: test/e2e-scenario/live/network-policy.test.ts transient provider validation skip: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: network-policy.test.ts writes transient-provider-validation.skip.json and calls skip(...) when isTransientProviderValidationFailure(onboard) remains true in GitHub Actions after ONBOARD_ATTEMPTS.
  • Transient provider skip can suppress security-critical network-policy coverage (test/e2e-scenario/live/network-policy.test.ts:468): After onboarding retries are exhausted in GitHub Actions, a classified transient NVIDIA Endpoints validation failure writes a skip artifact and skips the entire network-policy live scenario. The classifier is narrow, documented, and unit-tested, but the runtime result is still loss of coverage for deny-by-default egress, policy-add and dry-run behavior, per-binary enforcement, inference-local/direct-provider behavior, SSRF checks, host-gateway web_fetch policy, and permissive mode.
    • Recommendation: Keep the skip highly visible and consider narrowing it so local policy assertions still run where possible, replacing provider validation with a hermetic fixture, or tracking removal once endpoint validation is stable for a release cycle.
    • Evidence: network-policy.test.ts calls skip(...) in GitHub Actions when isTransientProviderValidationFailure(onboard) remains true after ONBOARD_ATTEMPTS and writes transient-provider-validation.skip.json. network-policy-transient-provider.test.ts covers positive and negative classifier cases, but the skip still covers the full scenario.
  • Slack live policy assertion is stricter than the retained legacy contract (test/e2e-scenario/live/network-policy.test.ts:553): The Vitest migration requires slack.com to return STATUS_403 before policy-add and exactly STATUS_200 after policy-add. The retained legacy shell test instead skipped TC-NET-03 if Slack was already reachable with STATUS_[23] before policy-add, and accepted any STATUS_[2-4] response after policy-add as evidence that the endpoint was no longer proxy-blocked. The narrower Vitest contract can fail policy-correct behavior such as an ERROR_ proxy denial before approval, a redirect, or a Slack client/auth response after approval.
    • Recommendation: Either align the Vitest assertions with the retained legacy success/blocking ranges or document and test why the narrower STATUS_403-before and STATUS_200-after contract is intentionally required.
    • Evidence: network-policy.test.ts expects slackBefore to match /STATUS_403/ and slackAfter to match /STATUS_200/. test/e2e/test-network-policy.sh skips if the before probe matches STATUS_[23][0-9][0-9] and accepts after STATUS_[2-4][0-9][0-9].

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/e2e-scenario/live/network-policy.test.ts (1)

348-727: 🏗️ Heavy lift

Split the main scenario into smaller local contract helpers.

This single callback now mixes onboarding, preset application, package-manager probes, SSRF checks, host-gateway validation, and permissive-mode assertions. That makes failures harder to localize and pushes well past the repo's low-complexity JS/TS guideline. Extract a few file-local helpers or subtests for the major contract blocks instead of keeping the full scenario in one function. As per coding guidelines, "Keep function complexity low in JavaScript and TypeScript code."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/network-policy.test.ts` around lines 348 - 727, The
test callback passed to RUN_NETWORK_POLICY_TEST is too large and mixes
onboarding, preset application, package-manager probes, SSRF/host-gateway checks
and permissive-mode assertions; split it into smaller, focused helpers or
subtests (e.g., extract onboarding logic around runNemoclaw(... "onboard"...)
and subsequent denyDefault check into an onboardAndDenyDefault helper, move
preset operations using applyPreset/runNemoclaw/connectProbe/brewProbe into
presetHelpers, isolate SSRF and inference checks using
fetchStatus/isPrivateIp/inference into ssrfAndInferenceHelpers, and isolate
host-gateway logic around startMarkerServer/writeHostGatewayPolicy/webFetch into
hostGatewayHelper). Replace the big inline callback body with sequential calls
to these file-local async functions (or Vitest subtests) to keep function
complexity low, maintain existing assertions (using SANDBOX_NAME, applyPreset,
sandboxBash, fetchStatus, startMarkerServer, writeHostGatewayPolicy,
runNemoclaw, artifacts) and preserve cleanup handling and timeouts.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 280-341: The network-policy-vitest job fails because the test
(invocation of openshell at line 387 in
test/e2e-scenario/live/network-policy.test.ts) requires OpenShell to be
installed; add the same OpenShell bootstrap/install step used by other
sandbox-backed live jobs into this job before the "Run network-policy live test"
step (i.e., add an install/bootstrap step that provisions the openshell binary
or runtime so that the test's call to openshell can spawn successfully).

In `@test/e2e-scenario/live/network-policy.test.ts`:
- Around line 508-513: Replace the loose status checks around the Slack preset
with origin-specific assertions: assert that slackBefore (from fetchStatus)
equals/matches the blocked status (e.g., /STATUS_403/) rather than just "not
2xx/3xx", and assert that slackAfter equals/matches a true success status (e.g.,
/STATUS_200/) rather than allowing any 2xx-4xx; update the assertions around
slackBefore, slackApply, and slackAfter (and keep using fetchStatus and
applyPresetInteractively) so the test proves the preset changed reachability by
checking a specific blocked code before and a specific success code after.

---

Nitpick comments:
In `@test/e2e-scenario/live/network-policy.test.ts`:
- Around line 348-727: The test callback passed to RUN_NETWORK_POLICY_TEST is
too large and mixes onboarding, preset application, package-manager probes,
SSRF/host-gateway checks and permissive-mode assertions; split it into smaller,
focused helpers or subtests (e.g., extract onboarding logic around
runNemoclaw(... "onboard"...) and subsequent denyDefault check into an
onboardAndDenyDefault helper, move preset operations using
applyPreset/runNemoclaw/connectProbe/brewProbe into presetHelpers, isolate SSRF
and inference checks using fetchStatus/isPrivateIp/inference into
ssrfAndInferenceHelpers, and isolate host-gateway logic around
startMarkerServer/writeHostGatewayPolicy/webFetch into hostGatewayHelper).
Replace the big inline callback body with sequential calls to these file-local
async functions (or Vitest subtests) to keep function complexity low, maintain
existing assertions (using SANDBOX_NAME, applyPreset, sandboxBash, fetchStatus,
startMarkerServer, writeHostGatewayPolicy, runNemoclaw, artifacts) and preserve
cleanup handling and timeouts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ec0be148-7809-4ed2-91c2-c1ed41561074

📥 Commits

Reviewing files that changed from the base of the PR and between 6622476 and 2393538.

📒 Files selected for processing (2)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/network-policy.test.ts

Comment thread .github/workflows/e2e-vitest-scenarios.yaml
Comment thread test/e2e-scenario/live/network-policy.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

333-343: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Install OpenShell before running the network-policy live test.

Line 341 executes a test that checks openshell --version before onboarding, but this job still doesn’t provision OpenShell. That can keep failing with spawn openshell ENOENT. Add the same OpenShell bootstrap/install step used by other sandbox-backed live jobs before this run step.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 333 - 343, The
network-policy live test job runs tests that call `openshell --version` but
never provisions OpenShell, causing `spawn openshell ENOENT`; add the same
OpenShell bootstrap/install step used in the other sandbox-backed live jobs
before the `npx vitest run --project e2e-scenarios-live
test/e2e-scenario/live/network-policy.test.ts` run step so `openshell` is
available (mirror the install/bootstrap commands used in the other live jobs to
ensure `openshell --version` succeeds).

Source: Pipeline failures

🧹 Nitpick comments (1)
tools/e2e-scenarios/workflow-boundary.mts (1)

487-488: ⚡ Quick win

Extend boundary checks to all free-standing selective-filter jobs.

The boundary validator now enforces selective job.if only for two jobs. It still doesn’t assert the selective filters for network-policy-vitest and openclaw-tui-chat-correlation-vitest (workflow Lines 283 and 359), so regressions there won’t be caught by this support layer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/e2e-scenarios/workflow-boundary.mts` around lines 487 - 488, The
boundary checks only call validateOpenShellVersionPinVitestJob and
validateOnboardNegativePathsVitestJob; add validation for the remaining
selective-filter jobs so they enforce job.if as well. Update the invocation in
workflow-boundary.mts to also call validators for "network-policy-vitest" and
"openclaw-tui-chat-correlation-vitest" (either by adding new functions like
validateNetworkPolicyVitestJob and validateOpenclawTuiChatCorrelationVitestJob
or by extending an existing validator to accept a list of job names), and ensure
those validators assert the selective job.if condition the same way as
validateOpenShellVersionPinVitestJob/validateOnboardNegativePathsVitestJob do.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 333-343: The network-policy live test job runs tests that call
`openshell --version` but never provisions OpenShell, causing `spawn openshell
ENOENT`; add the same OpenShell bootstrap/install step used in the other
sandbox-backed live jobs before the `npx vitest run --project e2e-scenarios-live
test/e2e-scenario/live/network-policy.test.ts` run step so `openshell` is
available (mirror the install/bootstrap commands used in the other live jobs to
ensure `openshell --version` succeeds).

---

Nitpick comments:
In `@tools/e2e-scenarios/workflow-boundary.mts`:
- Around line 487-488: The boundary checks only call
validateOpenShellVersionPinVitestJob and validateOnboardNegativePathsVitestJob;
add validation for the remaining selective-filter jobs so they enforce job.if as
well. Update the invocation in workflow-boundary.mts to also call validators for
"network-policy-vitest" and "openclaw-tui-chat-correlation-vitest" (either by
adding new functions like validateNetworkPolicyVitestJob and
validateOpenclawTuiChatCorrelationVitestJob or by extending an existing
validator to accept a list of job names), and ensure those validators assert the
selective job.if condition the same way as
validateOpenShellVersionPinVitestJob/validateOnboardNegativePathsVitestJob do.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aae9ad9d-48b7-48d2-b0e2-77b92052a021

📥 Commits

Reviewing files that changed from the base of the PR and between 2393538 and 651dfa5.

📒 Files selected for processing (3)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • tools/e2e-scenarios/workflow-boundary.mts

@jyaunches jyaunches changed the title test(e2e): migrate network policy to Vitest test(e2e): P1 anchor 6 migrate test-network-policy.sh to vitest Jun 11, 2026
@jyaunches jyaunches changed the title test(e2e): P1 anchor 6 migrate test-network-policy.sh to vitest test(e2e): migrate network policy to Vitest Jun 11, 2026
@jyaunches jyaunches changed the title test(e2e): migrate network policy to Vitest test(e2e): migrate network policy to Vitest [ANCHOR-6] Jun 11, 2026
@jyaunches

Copy link
Copy Markdown
Contributor Author

Maintainer simplicity/security review for #5098 — request changes.

This can stay as the network-policy anchor PR, but please make the migrated Vitest coverage simpler and less false-positive-prone.

Required:

  • Split the large main scenario callback into small local contract helpers/phases inside the test file; do not introduce a new shared framework.
  • Narrow negative network assertions so generic DNS/proxy/fetch/runtime failures do not count as successful policy denials unless the invalid state is documented and intentional.
  • Add or document deterministic coverage for the transient provider-validation retry/skip classifier so it cannot hide local CLI/config/policy failures.
  • Keep workflow changes to the smallest same-runner network-policy-vitest dispatch path and required validation.

Goal: preserve the security/policy anchor in this PR, but make the test read as explicit contract phases with narrow denial evidence.

# Conflicts:
#	.github/workflows/e2e-vitest-scenarios.yaml
#	test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
#	tools/e2e-scenarios/workflow-boundary.mts
@cv cv added the v0.0.64 Release target label Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

92-123: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Free-standing scenario ids are filtered out before any job can run them.

With inputs.scenarios=network-policy, Lines 109-120 remove network-policy from registry_scenarios and force matrix="[]", but Lines 243, 284, 329, and 412 only dispatch the matching free-standing jobs from inputs.jobs. The result is that live-scenarios skips and every free-standing job also skips, so the documented scenarios=network-policy path runs nothing.

Minimal fix
-    if: ${{ (inputs.jobs == '' && inputs.scenarios == '') || contains(format(',{0},', inputs.jobs), ',network-policy-vitest,') }}
+    if: ${{ (inputs.jobs == '' && inputs.scenarios == '') || contains(format(',{0},', inputs.jobs), ',network-policy-vitest,') || contains(format(',{0},', inputs.scenarios), ',network-policy,') }}

Apply the same pattern to the other scenario-backed free-standing jobs (openshell-version-pin, onboard-negative-paths, openclaw-tui-chat-correlation), or emit the filtered free-standing ids from generate-matrix and route jobs from that output.

Also applies to: 243-243, 284-284, 329-329, 412-412

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 92 - 123, The
current generate-matrix logic (free_standing_scenarios array +
is_free_standing_scenario function) strips free-standing ids out of
registry_scenarios and sets matrix="[]" when only free-standing scenarios are
requested, causing downstream job dispatch (jobs matching openshell-version-pin,
onboard-negative-paths, openclaw-tui-chat-correlation, network-policy) to skip;
fix by emitting the selected free-standing ids from this script (e.g., set an
output like free_standing_selected CSV or include them in matrix when
registry_scenarios is empty) instead of forcing matrix="[]", and update the job
dispatchers that currently read matrix to consume that output (use the same
free_standing_scenarios/is_free_standing_scenario filtering to populate the
output), ensuring registry_scenarios, args and matrix logic remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 92-123: The current generate-matrix logic (free_standing_scenarios
array + is_free_standing_scenario function) strips free-standing ids out of
registry_scenarios and sets matrix="[]" when only free-standing scenarios are
requested, causing downstream job dispatch (jobs matching openshell-version-pin,
onboard-negative-paths, openclaw-tui-chat-correlation, network-policy) to skip;
fix by emitting the selected free-standing ids from this script (e.g., set an
output like free_standing_selected CSV or include them in matrix when
registry_scenarios is empty) instead of forcing matrix="[]", and update the job
dispatchers that currently read matrix to consume that output (use the same
free_standing_scenarios/is_free_standing_scenario filtering to populate the
output), ensuring registry_scenarios, args and matrix logic remain consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c555430-0fa9-447f-9719-0a2b44649367

📥 Commits

Reviewing files that changed from the base of the PR and between 7f45a2e and a84b91e.

📒 Files selected for processing (2)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • tools/e2e-scenarios/workflow-boundary.mts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/e2e-scenarios/workflow-boundary.mts

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27372501486
Workflow ref: e2e-migrate/test-network-policy-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 5 passed, 3 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ❌ failure
network-policy-vitest ✅ success
onboard-negative-paths-vitest ✅ success
openclaw-tui-chat-correlation-vitest ❌ failure
openshell-version-pin-vitest ✅ success
validate-jobs ✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27373045359
Workflow ref: e2e-migrate/test-network-policy-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 5 passed, 3 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ❌ failure
network-policy-vitest ✅ success
onboard-negative-paths-vitest ✅ success
openclaw-tui-chat-correlation-vitest ❌ failure
openshell-version-pin-vitest ✅ success
validate-jobs ✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

@cv cv left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Normal PR checks are green on the latest head, review feedback is addressed, unresolved threads are clear, and network-policy-vitest passed. The remaining full scenario fan-out failures appear tied to unstable third-party NVIDIA inference endpoint validation rather than this PR.

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27373045359
Workflow ref: e2e-migrate/test-network-policy-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 6 passed, 2 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ❌ failure
network-policy-vitest ✅ success
onboard-negative-paths-vitest ✅ success
openclaw-tui-chat-correlation-vitest ✅ success
openshell-version-pin-vitest ✅ success
validate-jobs ✅ success

Failed jobs: gateway-guard-recovery, live-scenarios. Check run artifacts for logs.

1 similar comment
@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27373045359
Workflow ref: e2e-migrate/test-network-policy-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 6 passed, 2 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
live-scenarios ❌ failure
network-policy-vitest ✅ success
onboard-negative-paths-vitest ✅ success
openclaw-tui-chat-correlation-vitest ✅ success
openshell-version-pin-vitest ✅ success
validate-jobs ✅ success

Failed jobs: gateway-guard-recovery, live-scenarios. Check run artifacts for logs.

…ork-policy-simple

# Conflicts:
#	.github/workflows/e2e-vitest-scenarios.yaml
#	test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
#	tools/e2e-scenarios/workflow-boundary.mts
@jyaunches jyaunches merged commit f18fb25 into main Jun 11, 2026
37 of 38 checks passed
@jyaunches jyaunches deleted the e2e-migrate/test-network-policy-simple branch June 11, 2026 20:37
@wscurran wscurran added area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure area: policy Network policy, egress rules, presets, or sandbox policy chore Build, CI, dependency, or tooling maintenance labels Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure area: policy Network policy, egress rules, presets, or sandbox policy chore Build, CI, dependency, or tooling maintenance v0.0.64 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants