test(e2e): add Hermes live Vitest migration [ANCHOR-5] #5227

Closed
jyaunches wants to merge 20 commits into main from e2e-migrate/test-hermes-e2e-simple

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 11, 2026

Contributor

Summary

Migrate test/e2e/test-hermes-e2e.sh to simple live Vitest coverage.

Related Issues

Refs #5098

Contract mapping

  • Legacy assertion: non-interactive install selects and onboards Hermes via NEMOCLAW_AGENT=hermes.
    • Replacement: test/e2e-scenario/live/hermes-e2e.test.ts runs bash install.sh --non-interactive with Hermes env.
    • Boundary preserved: real installer shell, Docker/OpenShell, host PATH/install side effects.
  • Legacy assertion: Hermes sandbox exists, status works, session records agent=hermes, inference provider and policy are configured.
    • Replacement: live Vitest nemoclaw list/status, session JSON, openshell inference get, and openshell policy get --full assertions.
    • Boundary preserved: real nemoclaw and openshell commands.
  • Legacy assertion: Hermes health, binary, config/state directory, and optional dashboard respond from the sandbox/host.
    • Replacement: sandbox exec health/version/config probes plus optional dashboard registry/forward/HTTP checks.
    • Boundary preserved: real sandbox exec, HTTP, OpenShell forwards.
  • Legacy assertion: live NVIDIA Endpoints and sandbox inference.local chat return PONG.
    • Replacement: direct provider curl and sandbox curl https://inference.local/v1/chat/completions assertions.
    • Boundary preserved: real external provider call and sandbox routing path.
  • Legacy assertion: CLI logs and agent manifest loading still work.
    • Replacement: nemoclaw <sandbox> logs and bin/lib/agent-defs manifest checks.
    • Boundary preserved: real CLI and built repo module load.
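
As a minimal sketch of the first mapping, the installer invocation could be assembled as below. Only `bash install.sh --non-interactive` and `NEMOCLAW_AGENT=hermes` come from this PR description; the `InstallCommand` type and the `buildHermesInstallCommand` helper are hypothetical names used for illustration and are not the test's actual code.

```typescript
// Hypothetical shape for a subprocess invocation; not taken from the PR.
interface InstallCommand {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Build the non-interactive Hermes install invocation described above.
// The base environment is copied so the caller's object is never mutated.
function buildHermesInstallCommand(
  baseEnv: Record<string, string | undefined>,
): InstallCommand {
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(baseEnv)) {
    if (value !== undefined) env[key] = value; // drop unset entries
  }
  env.NEMOCLAW_AGENT = "hermes"; // selects and onboards Hermes
  return { command: "bash", args: ["install.sh", "--non-interactive"], env };
}
```

Passing the result to a real subprocess call keeps the preserved boundary intact: the installer shell itself still runs, and only the environment assembly is plain TypeScript.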

Simplicity check

  • Test shape: simple live Vitest test.
  • New shared helpers: none.
  • New framework/registry/ledger: none.
  • Workflow changes: adds a selective, free-standing hermes-e2e Vitest job in e2e-vitest-scenarios.yaml; deletion of the legacy shell script and retirement of the nightly shell lane are deferred to Phase 11 of #5098 (Epic: Migrate legacy bash E2E into the Vitest E2E system).

Verification

  • npm ci --ignore-scripts
  • npm run build:cli
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-scenario-matrix.test.ts --silent=false --reporter=default
  • env -u NVIDIA_API_KEY NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/hermes-e2e.test.ts --silent=false --reporter=default
  • npx biome check test/e2e-scenario/live/hermes-e2e.test.ts .github/workflows/e2e-vitest-scenarios.yaml
  • git diff --check

Live validation

Selective hermes-e2e dispatch in E2E / Vitest Scenarios is pending after PR creation.

Summary by CodeRabbit

  • Tests

    • Added a Hermes live end-to-end Vitest covering installer/runtime, health, inference, dashboard probes, logs, and cleanup; records scenario artifacts.
    • Added local helpers/tests to run the workflow’s matrix-generation step and validate Hermes selection behavior.
  • New Features

    • CI emits a hermes_selected flag and conditionally runs a Hermes-specific Vitest job; PR reporting includes its result.
    • live-scenarios is skipped when the generated matrix is empty.
  • Chores

    • Workflow validation extended for the new Hermes job, stricter matrix/output checks, env/secret and job-level constraints.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

The PR adds a gated Hermes live Vitest job and a comprehensive Hermes live E2E test, extends matrix generation to export hermes_selected and filter hermes-e2e, and updates workflow-boundary validators and tests to validate the new job and matrix outputs.
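
The matrix split described in this walkthrough can be sketched roughly as follows. This is an illustration under stated assumptions, not the workflow's actual script: the `hermes-e2e` ID and the `hermes_selected` output name come from the PR, while `partitionScenarios` and `SelectorResult` are hypothetical.

```typescript
// Assumption: "hermes-e2e" is the only ID routed to the free-standing job.
const HERMES_SCENARIO_ID = "hermes-e2e";

interface SelectorResult {
  registryIds: string[]; // IDs left for the registry-driven live-scenarios matrix
  hermesSelected: boolean; // mirrors the hermes_selected output
}

// Split a comma-separated scenarios input into registry rows and the Hermes flag.
function partitionScenarios(input: string): SelectorResult {
  const ids = input
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id.length > 0);
  return {
    registryIds: ids.filter((id) => id !== HERMES_SCENARIO_ID),
    hermesSelected: ids.includes(HERMES_SCENARIO_ID),
  };
}
```

With `scenarios=hermes-e2e` alone, `registryIds` comes back empty, which corresponds to the `[]` matrix that causes live-scenarios to be skipped.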

Changes

Hermes E2E scenario in CI and test implementation

| Layer / File(s) | Summary |
| --- | --- |
| Workflow scenario matrix and hermes-e2e-vitest job: .github/workflows/e2e-vitest-scenarios.yaml | When inputs.scenarios/inputs.jobs are provided, matrix generation parses comma-separated IDs, removes hermes-e2e from registry-driven scenarios, and conditionally emits a filtered registry matrix or []. generate-matrix now exports hermes_selected. live-scenarios skips when matrix == '[]'. A new hermes-e2e-vitest job runs when hermes_selected == 'true', sets Hermes-specific env vars, runs the Hermes Vitest test, and uploads Hermes artifacts. |
| Hermes E2E test configuration and utility helpers: test/e2e-scenario/live/hermes-e2e.test.ts | Adds sandbox/URL constants, env-driven sandbox/model/dashboard configuration, subprocess env builders, chat payload builders, heterogeneous response-content extractors, assertion/registry/HTTP helpers, forward-list parsing, retry wrapper, and best-effort cleanup utilities. |
| Hermes E2E test scenario phases: test/e2e-scenario/live/hermes-e2e.test.ts | Implements gated live E2E flow: secret-gated artifact writes, pre-cleanup teardown, Docker/manifest/model checks, non-interactive Hermes install, CLI probes, sandbox listing/status/session/inference/policy assertions, health polling, Hermes version/config probes, conditional dashboard registry/forward/HTTP checks, direct and in-sandbox NVIDIA inference “PONG” validation, runtime log and agent checks, optional sandbox destruction, and writes scenario-result.json. |
| Workflow validators and boundary checks: tools/e2e-scenarios/workflow-boundary.mts | Adds validateHermesE2EVitestJob, requires generate-matrix to expose hermes_selected mapped from the matrix step, tightens matrix-generation assertions to include both hermes_selected outcomes, updates live-scenarios skip condition to matrix != '[]', and wires the new validator into workflow boundary checks. |
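
To illustrate what a "heterogeneous response-content extractor" might look like for the PONG checks, here is a hedged sketch. It assumes an OpenAI-style chat completions body where `message.content` is either a string or an array of text parts; the `ChatResponse` type, `extractContent`, and `containsPong` are hypothetical and do not reproduce the helpers in hermes-e2e.test.ts.

```typescript
// Assumed response shapes; the real provider payload may carry more fields.
type ContentPart = { type?: string; text?: string };

interface ChatResponse {
  choices?: Array<{ message?: { content?: string | ContentPart[] | null } }>;
}

// Normalize the first choice's content to a plain string.
function extractContent(response: ChatResponse): string {
  const content = response.choices?.[0]?.message?.content;
  if (typeof content === "string") return content;
  if (Array.isArray(content)) {
    // Concatenate text parts, ignoring parts without text.
    return content.map((part) => part.text ?? "").join("");
  }
  return ""; // missing or null content
}

// True when the normalized content contains the word PONG (case-insensitive).
function containsPong(response: ChatResponse): boolean {
  return /\bPONG\b/i.test(extractContent(response));
}
```

Normalizing first and asserting second keeps the direct-provider and in-sandbox checks on one code path, whichever content shape the endpoint returns.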

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5243: Updates to e2e Vitest job-selector/matrix logic and workflow-boundary validation similar to this change.
  • NVIDIA/NemoClaw#5152: Adds workflow-boundary validation for a free-standing Vitest job; related validation patterns.

Suggested labels

area: e2e, area: ci

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 I hopped through CI with a test so bright,
Installed Hermes by script in the night,
Polls to /health until a PONG did sing,
Sandbox spun up, dashboards took wing,
Logs and agents checked — the run took flight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: migrating a legacy E2E test to a Vitest implementation, with a clear reference to the associated issue ticket. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

E2E Advisor Recommendation

Required E2E: hermes-e2e-vitest
Optional E2E: openshell-version-pin-vitest, network-policy-vitest

Dispatch hint: hermes-e2e-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • hermes-e2e-vitest (high): This PR adds the Hermes live E2E job and test and wires it into the workflow. It directly covers installer/onboarding, Hermes sandbox lifecycle, gateway cleanup, health checks, and live inference routing, all of which are required-risk domains.

Optional E2E

  • openshell-version-pin-vitest (low): Useful low-cost confidence that the modified free-standing jobs selector path still dispatches an existing non-Hermes job without selecting Hermes accidentally.
  • network-policy-vitest (high): Optional regression signal for another secret-bearing/free-standing workflow path after selector and matrix-generation changes; not required because network-policy runtime assets are not modified.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: hermes-e2e-vitest

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: e2e-scenarios-all
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • e2e-scenarios-all: The PR changes the canonical Vitest scenario workflow, matrix/selector behavior, workflow boundary validation, and adds a live Hermes Vitest job/test. Workflow and matrix dispatch changes require the full e2e-vitest-scenarios fan-out rather than a targeted scenario dispatch.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/live/hermes-e2e.test.ts
  • test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • tools/e2e-scenarios/workflow-boundary.mts

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 5 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: .github/workflows/e2e-vitest-scenarios.yaml Hermes artifact upload: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: .github/workflows/e2e-vitest-scenarios.yaml:408-416 uploads e2e-artifacts/vitest/hermes-e2e/ after the Hermes test runs with NVIDIA_API_KEY.
  • Source-of-truth review needed: test/e2e-scenario/live/hermes-e2e.test.ts migration boundary versus legacy security-posture phase: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: test/e2e-scenario/live/hermes-e2e.test.ts:16 says it is migrated from test/e2e/test-hermes-e2e.sh; test/e2e/test-hermes-e2e.sh:720-723 still runs security_posture_assertions_run for Hermes when NEMOCLAW_E2E_SECURITY_POSTURE=1.
  • Hermes artifact upload lacks a final secret-leak guard (.github/workflows/e2e-vitest-scenarios.yaml:413): The Hermes workflow job passes NVIDIA_API_KEY into a live installer/sandbox/provider test and then uploads the entire Hermes artifact directory. Fixture redaction and explicit redactionValues reduce risk, but the upload boundary does not fail closed if any raw NVIDIA key, Authorization: Bearer value, or token-shaped secret reaches the artifact tree.
    • Recommendation: Before actions/upload-artifact, add a fail-closed scan of e2e-artifacts/vitest/hermes-e2e for the raw NVIDIA_API_KEY, Authorization: Bearer values, and fixture token-pattern secrets. Alternatively, upload only a known-safe allowlist of Hermes artifact files instead of the whole directory.
    • Evidence: .github/workflows/e2e-vitest-scenarios.yaml:400-401 passes NVIDIA_API_KEY to the Hermes Vitest step; .github/workflows/e2e-vitest-scenarios.yaml:408-416 uploads path: e2e-artifacts/vitest/hermes-e2e/. test/e2e-scenario/live/hermes-e2e.test.ts writes shell/provider/sandbox/log artifacts, and no pre-upload scan or Hermes-specific allowlist was found.
  • Hermes selector tests still miss mixed and rejected secret-bearing dispatch paths (test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts:125): The support tests now cover some jobs-only Hermes and non-Hermes matrix behavior, but the changed workflow also supports mixed registry plus Hermes scenario requests and rejects malformed or mutually exclusive selectors before the secret-bearing Hermes job can run. These paths decide whether NVIDIA_API_KEY reaches hermes-e2e-vitest.
    • Recommendation: Add behavior-specific support tests for mixed scenarios=ubuntu-repo-cloud-openclaw,hermes-e2e, malformed Hermes-containing scenario input, and jobs plus scenarios being set. Assert that rejected inputs exit before writing hermes_selected=true or otherwise enabling the Hermes job.
    • Evidence: test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts:125-145 covers jobs-only openshell-version-pin, hermes-e2e-vitest, network-policy, and scenarios=hermes-e2e. .github/workflows/e2e-vitest-scenarios.yaml:91-164 contains separate jobs/scenarios validation, mixed registry scenario routing, and hermes_selected output logic.
  • Optional legacy Hermes security-posture phase is not migrated or explicitly owned (test/e2e-scenario/live/hermes-e2e.test.ts:16): The new Vitest file says it is migrated from test/e2e/test-hermes-e2e.sh, but the retained legacy shell script still has an optional NEMOCLAW_E2E_SECURITY_POSTURE=1 branch that runs Hermes security-posture assertions. Without equivalent Vitest coverage or an in-code retained-owner note plus guard, readers can treat the migration as complete while a security-relevant branch remains outside the replacement test.
    • Recommendation: Either migrate equivalent Hermes security-posture assertions into Vitest, or add a clear in-code note near the migration comment stating that the retained legacy shell/nightly lane continues to own that optional phase until a later retirement PR, with a regression guard proving that ownership remains wired.
    • Evidence: test/e2e-scenario/live/hermes-e2e.test.ts:16 says "Migrated from test/e2e/test-hermes-e2e.sh." test/e2e/test-hermes-e2e.sh:720-723 still invokes security_posture_assertions_run "$SANDBOX_NAME" "hermes" when NEMOCLAW_E2E_SECURITY_POSTURE=1.
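
The fail-closed scan recommended for the artifact upload could look roughly like this. It is a sketch under assumptions, not an existing helper in the repository: `findLeaks`, `assertNoLeaks`, and the Bearer pattern are hypothetical, and the fixture token patterns mentioned in the finding are not modeled because they are not shown here.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Matches "Authorization: Bearer <value>"; the exact pattern is an assumption.
const BEARER_PATTERN = /Authorization:\s*Bearer\s+\S+/i;

// Pure check on one file's text, so it can be tested without touching disk.
function findLeaks(text: string, secrets: string[]): string[] {
  const reasons: string[] = [];
  for (const secret of secrets) {
    // Skip empty values so an unset secret does not match every file.
    if (secret.length > 0 && text.includes(secret)) reasons.push("raw secret value");
  }
  if (BEARER_PATTERN.test(text)) reasons.push("Authorization: Bearer header");
  return reasons;
}

// Walk the artifact tree and throw on the first leak (fail closed).
// The error names the file and the reason but never echoes the secret.
function assertNoLeaks(dir: string, secrets: string[]): void {
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const path = join(dir, entry.name);
    if (entry.isDirectory()) {
      assertNoLeaks(path, secrets);
      continue;
    }
    const reasons = findLeaks(readFileSync(path, "utf8"), secrets);
    if (reasons.length > 0) {
      throw new Error(`${path}: ${reasons.join(", ")}`);
    }
  }
}
```

A step calling `assertNoLeaks("e2e-artifacts/vitest/hermes-e2e", [process.env.NVIDIA_API_KEY ?? ""])` before the upload would block publication on any match. One caveat: as written, the Bearer pattern also matches a redacted header, so a real guard would need to exempt whatever placeholder the fixture redaction emits.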

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — workflow dispatch scenarios=ubuntu-repo-cloud-openclaw,hermes-e2e emits the ubuntu-repo-cloud-openclaw matrix row and hermes_selected=true. The PR changes a secret-bearing workflow selector and migrates a large live installer/sandbox/provider boundary. Static workflow-boundary checks and some runtime matrix-script coverage help, but mixed/rejected selector paths and artifact publication safety still need behavior-specific validation.
  • **Runtime validation** — workflow dispatch scenarios=hermes-e2e,../escape exits nonzero before writing hermes_selected=true. The PR changes a secret-bearing workflow selector and migrates a large live installer/sandbox/provider boundary. Static workflow-boundary checks and some runtime matrix-script coverage help, but mixed/rejected selector paths and artifact publication safety still need behavior-specific validation.
  • **Runtime validation** — workflow dispatch with jobs=hermes-e2e-vitest and scenarios=network-policy exits nonzero before enabling hermes-e2e-vitest. The PR changes a secret-bearing workflow selector and migrates a large live installer/sandbox/provider boundary. Static workflow-boundary checks and some runtime matrix-script coverage help, but mixed/rejected selector paths and artifact publication safety still need behavior-specific validation.
  • **Runtime validation** — Hermes artifact upload guard fails when e2e-artifacts/vitest/hermes-e2e contains the raw NVIDIA_API_KEY. The PR changes a secret-bearing workflow selector and migrates a large live installer/sandbox/provider boundary. Static workflow-boundary checks and some runtime matrix-script coverage help, but mixed/rejected selector paths and artifact publication safety still need behavior-specific validation.
  • **Runtime validation** — Hermes artifact upload guard fails when e2e-artifacts/vitest/hermes-e2e contains Authorization: Bearer <token>. The PR changes a secret-bearing workflow selector and migrates a large live installer/sandbox/provider boundary. Static workflow-boundary checks and some runtime matrix-script coverage help, but mixed/rejected selector paths and artifact publication safety still need behavior-specific validation.
  • **Hermes selector tests still miss mixed and rejected secret-bearing dispatch paths** — Add behavior-specific support tests for mixed scenarios=ubuntu-repo-cloud-openclaw,hermes-e2e, malformed Hermes-containing scenario input, and jobs plus scenarios being set. Assert that rejected inputs exit before writing hermes_selected=true or otherwise enabling the Hermes job.
  • **Acceptance clause:** No deterministic linked issue acceptance clauses or issue comments were provided for this review context. — add test evidence or identify existing coverage. Trusted context has linkedIssues: []. PR body references Refs Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 and contract-mapping prose, but PR-provided prose was treated as untrusted scope evidence rather than authoritative acceptance criteria.
  • **.github/workflows/e2e-vitest-scenarios.yaml Hermes artifact upload** — Missing: a workflow/support test or helper proving upload fails when the Hermes artifact directory contains the raw NVIDIA_API_KEY or Authorization: Bearer <token>.. .github/workflows/e2e-vitest-scenarios.yaml:408-416 uploads e2e-artifacts/vitest/hermes-e2e/ after the Hermes test runs with NVIDIA_API_KEY.
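
The rejected-selector cases listed above can be expressed as a small, testable validator. This is a sketch only: the ID pattern is an assumption (lowercase alphanumerics and hyphens), and `parseSelectors` and `rejects` are hypothetical names, not the workflow's actual validation step.

```typescript
// Assumption: IDs are lowercase alphanumerics separated by single hyphens.
const ID_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Parse the jobs/scenarios inputs, throwing before any output would be written.
function parseSelectors(jobs: string, scenarios: string): string[] {
  if (jobs.trim() !== "" && scenarios.trim() !== "") {
    throw new Error("jobs and scenarios cannot both be set");
  }
  const raw = jobs.trim() !== "" ? jobs : scenarios;
  const ids = raw
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id !== "");
  for (const id of ids) {
    if (!ID_PATTERN.test(id)) throw new Error(`invalid selector: ${id}`);
  }
  return ids;
}

// Helper for the rejected cases: returns true only when parsing throws.
function rejects(jobs: string, scenarios: string): boolean {
  try {
    parseSelectors(jobs, scenarios);
    return false;
  } catch {
    return true;
  }
}
```

Under these assumptions, `rejects("", "hermes-e2e,../escape")` and `rejects("hermes-e2e-vitest", "network-policy")` both hold, while the mixed input `ubuntu-repo-cloud-openclaw,hermes-e2e` parses into two IDs. Because validation throws first, nothing downstream can set `hermes_selected=true` for a rejected input.
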

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@jyaunches

Contributor Author

Temporarily closing to make room for the maintainer PR-limit exemption fix; will reopen after that lands.

@jyaunches jyaunches closed this Jun 11, 2026
@jyaunches jyaunches reopened this Jun 11, 2026
@jyaunches jyaunches changed the title from "test(e2e): add Hermes live Vitest migration" to "test(e2e): P1 anchor 5 migrate test-hermes-e2e.sh to vitest" Jun 11, 2026
@jyaunches jyaunches changed the title from "test(e2e): P1 anchor 5 migrate test-hermes-e2e.sh to vitest" to "test(e2e): add Hermes live Vitest migration" Jun 11, 2026
@jyaunches jyaunches changed the title from "test(e2e): add Hermes live Vitest migration" to "test(e2e): add Hermes live Vitest migration [ANCHOR-5]" Jun 11, 2026
@jyaunches

Contributor Author

Maintainer simplicity/equivalence review for #5098 — request changes.

This can stay as the Hermes anchor PR, but please tighten the workflow/test boundary before merge.

Required:

  • Document or simplify the special hermes-e2e workflow routing; prefer the smallest dispatch path and avoid bespoke selector machinery unless it is required for the runner/secret boundary.
  • Add source-of-truth clarity and regression coverage for the selector behavior if the bespoke hermes_selected path remains.
  • Explicitly document whether the legacy optional NEMOCLAW_E2E_SECURITY_POSTURE=1 phase is covered here or deferred to a separate migration/dependent PR.
  • Keep artifact upload/redaction evidence sufficient for the secret-bearing Hermes path.

Goal: keep this as a focused Hermes live migration, not a durable workflow-selection framework.

@coderabbitai coderabbitai Bot left a comment

Contributor

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

120-121: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Hermes job bypasses jobs selector and runs unexpectedly in jobs-only dispatches.

At Line 120, hermes_selected is forced to true whenever inputs.scenarios is empty. At Line 328, Hermes is gated only on that output, so a dispatch like jobs=gateway-guard-recovery still runs hermes-e2e-vitest (secret-bearing path), which breaks the jobs input contract.

Suggested fix
-  hermes-e2e-vitest:
+  hermes-e2e-vitest:
     needs: generate-matrix
-    if: ${{ needs.generate-matrix.outputs.hermes_selected == 'true' }}
+    if: ${{ inputs.jobs == '' && needs.generate-matrix.outputs.hermes_selected == 'true' }}

Based on learnings from the workflow boundary contract (tools/e2e-scenarios/workflow-boundary.mts:338-436), this gate is currently modeled only via hermes_selected, so that validator should be updated alongside this workflow change.

Also applies to: 328-328
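
The selection rule this comment asks for can be stated as a small pure function. The sketch below is an interpretation, not the workflow's code: the job ID `hermes-e2e-vitest` and scenario ID `hermes-e2e` come from this PR, `shouldRunHermes` is hypothetical, and the default of running Hermes when neither input is set is an assumption based on the "all free-standing when no scenarios are requested" wording in the results comments.

```typescript
// IDs taken from the PR discussion; the function itself is hypothetical.
const HERMES_JOB_ID = "hermes-e2e-vitest";
const HERMES_SCENARIO_ID = "hermes-e2e";

// Split a comma-separated workflow input into trimmed, non-empty IDs.
function parseIds(input: string): string[] {
  return input
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id !== "");
}

// Decide whether the secret-bearing Hermes job should run for a dispatch.
function shouldRunHermes(jobsInput: string, scenariosInput: string): boolean {
  const jobs = parseIds(jobsInput);
  const scenarios = parseIds(scenariosInput);
  // Jobs-only dispatch: Hermes runs only if it is named explicitly.
  if (jobs.length > 0) return jobs.includes(HERMES_JOB_ID);
  // Scenarios dispatch: Hermes runs only if its scenario ID is listed.
  if (scenarios.length > 0) return scenarios.includes(HERMES_SCENARIO_ID);
  // Assumption: with no selectors, the default fan-out includes Hermes.
  return true;
}
```

Under this rule, `jobs=gateway-guard-recovery` no longer selects Hermes, which is the case the comment flags.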

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 120 - 121,
hermes_selected is being forced true whenever inputs.scenarios is empty, which
bypasses the jobs-only selector and lets hermes-e2e-vitest run incorrectly;
update the logic that sets hermes_selected (the assignment that currently sets
hermes_selected=true alongside matrix="$(npx tsx
test/e2e-scenario/scenarios/run.ts ...)") so it only becomes true when hermes is
actually selected by the dispatch (e.g., inputs.jobs includes the hermes job or
inputs.scenarios indicates hermes), not merely when inputs.scenarios is empty,
and then gate the hermes-e2e-vitest job on that corrected hermes_selected value;
also update the corresponding validator in workflow-boundary.mts (the validation
logic around hermes_selected) to match this new selection rule so jobs-only
dispatches no longer trigger the hermes path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f6ff4170-8793-4808-9c5a-7acd5f8f17c6

📥 Commits

Reviewing files that changed from the base of the PR and between 9545ea9 and 985e97a.

📒 Files selected for processing (2)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • tools/e2e-scenarios/workflow-boundary.mts
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/e2e-scenarios/workflow-boundary.mts

@cv cv added the v0.0.64 Release target label Jun 11, 2026
@github-actions

Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27362619197
Workflow ref: e2e-migrate/test-hermes-e2e-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 7 passed, 1 failed, 0 skipped

Job Result
gateway-guard-recovery ❌ failure
generate-matrix ✅ success
hermes-e2e-vitest ✅ success
live-scenarios ✅ success
onboard-negative-paths-vitest ✅ success
openclaw-tui-chat-correlation-vitest ✅ success
openshell-version-pin-vitest ✅ success
validate-jobs ✅ success

Failed jobs: gateway-guard-recovery. Check run artifacts for logs.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts`:
- Around line 37-47: The spawnSync call executing generateStep?.run currently
has no timeout and can hang CI; add a timeout option (e.g., timeout: 60000) to
the spawnSync options and make the test fail fast if the child times out by
checking the returned result (the result variable) for a timeout signal (e.g.,
result.signal === 'SIGTERM' or 'SIGKILL') or non-zero status and throwing or
asserting accordingly; update the invocation that sets
env/GITHUB_OUTPUT/GITHUB_STEP_SUMMARY/JOBS/SCENARIOS so the new timeout key is
included and ensure any timeout is expressed in milliseconds and handled as a
test failure.
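
A minimal sketch of the timeout handling suggested above, using Node's `spawnSync`. The `runStep` and `assertStepOk` helpers and the `StepResult` type are hypothetical, and the 60000 ms default simply mirrors the example value in the comment. Rather than checking `result.signal`, this version detects the timeout through the `ETIMEDOUT` code that Node sets on `result.error`.

```typescript
import { spawnSync } from "node:child_process";

interface StepResult {
  status: number | null; // null when the child was killed or never started
  timedOut: boolean;
  stdout: string;
}

// Run a command synchronously with a hard timeout in milliseconds.
function runStep(command: string, args: string[], timeoutMs = 60_000): StepResult {
  const result = spawnSync(command, args, { encoding: "utf8", timeout: timeoutMs });
  // On timeout, Node kills the child and reports an error with code ETIMEDOUT.
  const code = (result.error as NodeJS.ErrnoException | undefined)?.code;
  return {
    status: result.status,
    timedOut: code === "ETIMEDOUT",
    stdout: result.stdout ?? "",
  };
}

// Fail fast: a timeout or a non-zero exit is treated as a test failure.
function assertStepOk(result: StepResult): void {
  if (result.timedOut) throw new Error("step timed out");
  if (result.status !== 0) throw new Error(`step exited with status ${result.status}`);
}
```

In the support test, the existing options object (env, GITHUB_OUTPUT, GITHUB_STEP_SUMMARY, JOBS, SCENARIOS) would simply gain the `timeout` key, and the returned result would be checked before any assertions on the generated outputs.
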

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 10a84e24-3993-4771-bb7c-5b3edc104bed

📥 Commits

Reviewing files that changed from the base of the PR and between 985e97a and 169fad9.

📒 Files selected for processing (3)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
  • tools/e2e-scenarios/workflow-boundary.mts
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/e2e-vitest-scenarios.yaml

Comment thread test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts
@github-actions

Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27364597797
Workflow ref: e2e-migrate/test-hermes-e2e-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 5 passed, 3 failed, 0 skipped

| Job | Result |
| --- | --- |
| gateway-guard-recovery | ❌ failure |
| generate-matrix | ✅ success |
| hermes-e2e-vitest | ✅ success |
| live-scenarios | ❌ failure |
| onboard-negative-paths-vitest | ✅ success |
| openclaw-tui-chat-correlation-vitest | ❌ failure |
| openshell-version-pin-vitest | ✅ success |
| validate-jobs | ✅ success |

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

@github-actions

Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27365778706
Workflow ref: e2e-migrate/test-hermes-e2e-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 4 passed, 4 failed, 0 skipped

| Job | Result |
| --- | --- |
| gateway-guard-recovery | ❌ failure |
| generate-matrix | ✅ success |
| hermes-e2e-vitest | ❌ failure |
| live-scenarios | ❌ failure |
| onboard-negative-paths-vitest | ✅ success |
| openclaw-tui-chat-correlation-vitest | ❌ failure |
| openshell-version-pin-vitest | ✅ success |
| validate-jobs | ✅ success |

Failed jobs: gateway-guard-recovery, hermes-e2e-vitest, live-scenarios, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

@copy-pr-bot

copy-pr-bot Bot commented Jun 11, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27368101005
Workflow ref: e2e-migrate/test-hermes-e2e-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 6 passed, 2 failed, 0 skipped

| Job | Result |
| --- | --- |
| gateway-guard-recovery | ❌ failure |
| generate-matrix | ✅ success |
| hermes-e2e-vitest | ✅ success |
| live-scenarios | ❌ failure |
| onboard-negative-paths-vitest | ✅ success |
| openclaw-tui-chat-correlation-vitest | ✅ success |
| openshell-version-pin-vitest | ✅ success |
| validate-jobs | ✅ success |

Failed jobs: gateway-guard-recovery, live-scenarios. Check run artifacts for logs.

@github-actions

Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27370614675
Workflow ref: e2e-migrate/test-hermes-e2e-simple
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 5 passed, 3 failed, 0 skipped

| Job | Result |
| --- | --- |
| gateway-guard-recovery | ❌ failure |
| generate-matrix | ✅ success |
| hermes-e2e-vitest | ✅ success |
| live-scenarios | ❌ failure |
| onboard-negative-paths-vitest | ✅ success |
| openclaw-tui-chat-correlation-vitest | ❌ failure |
| openshell-version-pin-vitest | ✅ success |
| validate-jobs | ✅ success |

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

@cv cv left a comment

Collaborator

Approved. Normal PR checks are green, review feedback is addressed, unresolved threads are clear, and hermes-e2e-vitest passed. The remaining full scenario fan-out failures appear tied to unstable third-party NVIDIA inference endpoint validation rather than this PR.

@jyaunches

Contributor Author

Superseded by #5256 with the identical final diff and verified signed history. This PR branch contains unsigned historical commit 02e8ddb7f9, and branch rules prevent rewriting it in place.

@jyaunches

Contributor Author

Closing as superseded by #5256, which carries the same final diff on a branch with verified signed history.

@jyaunches jyaunches closed this Jun 11, 2026
auto-merge was automatically disabled June 11, 2026 21:11

Pull request was closed

cv pushed a commit that referenced this pull request Jun 11, 2026
Supersedes #5227 due to an unsigned historical commit that cannot be
rewritten under branch rules. This branch has the identical final diff
with verified signed history.

## Summary
Migrate `test/e2e/test-hermes-e2e.sh` with simple live Vitest coverage.

## Related Issues
Refs #5098

## Contract mapping
- Legacy assertion: non-interactive install selects and onboards Hermes via `NEMOCLAW_AGENT=hermes`.
  - Replacement: `test/e2e-scenario/live/hermes-e2e.test.ts` runs `bash install.sh --non-interactive` with Hermes env.
  - Boundary preserved: real installer shell, Docker/OpenShell, host PATH/install side effects.
- Legacy assertion: Hermes sandbox exists, status works, session records `agent=hermes`, inference provider and policy are configured.
  - Replacement: live Vitest `nemoclaw list/status`, session JSON, `openshell inference get`, and `openshell policy get --full` assertions.
  - Boundary preserved: real `nemoclaw` and `openshell` commands.
- Legacy assertion: Hermes health, binary, config/state directory, and optional dashboard respond from the sandbox/host.
  - Replacement: sandbox exec health/version/config probes plus optional dashboard registry/forward/HTTP checks.
  - Boundary preserved: real sandbox exec, HTTP, OpenShell forwards.
- Legacy assertion: live NVIDIA Endpoints and sandbox `inference.local` chat return PONG.
  - Replacement: direct provider curl and sandbox `curl https://inference.local/v1/chat/completions` assertions.
  - Boundary preserved: real external provider call and sandbox routing path.
- Legacy assertion: CLI logs and agent manifest loading still work.
  - Replacement: `nemoclaw <sandbox> logs` and `bin/lib/agent-defs` manifest checks.
  - Boundary preserved: real CLI and built repo module load.
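
The PONG check in the mapping above can be illustrated with a small pure function. This is a sketch under assumptions, not code from `hermes-e2e.test.ts`: it assumes the endpoint returns an OpenAI-compatible chat completion body (suggested by the `/v1/chat/completions` path, but not shown in this commit), and `chatReplyContainsPong` is a hypothetical name.

```typescript
// Assumed response shape: { choices: [{ message: { content: "..." } }] }.
interface ChatCompletion {
  choices?: Array<{ message?: { content?: string | null } }>;
}

// Hypothetical helper: returns true only when the first choice's message
// content contains the literal string "PONG".
function chatReplyContainsPong(rawBody: string): boolean {
  try {
    const parsed = JSON.parse(rawBody) as ChatCompletion | null;
    const content = parsed?.choices?.[0]?.message?.content ?? "";
    return content.includes("PONG");
  } catch {
    // Non-JSON output (for example a curl error message) or an unexpected
    // content type counts as a failed check.
    return false;
  }
}
```

In a live test, the raw body would come from the real curl call, so the function only covers the parsing and assertion step and leaves the provider and sandbox routing boundaries untouched.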

## Simplicity check
- Test shape: simple live Vitest test.
- New shared helpers: none.
- New framework/registry/ledger: **none**.
- Workflow changes: adds a selective `hermes-e2e` free-standing Vitest
job in `e2e-vitest-scenarios.yaml`; legacy shell script deletion and
nightly shell retirement are deferred to #5098 Phase 11.

## Verification
- `npm ci --ignore-scripts`
- `npm run build:cli`
- `npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-scenario-matrix.test.ts --silent=false --reporter=default`
- `env -u NVIDIA_API_KEY NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/hermes-e2e.test.ts --silent=false --reporter=default`
- `npx biome check test/e2e-scenario/live/hermes-e2e.test.ts .github/workflows/e2e-vitest-scenarios.yaml`
- `git diff --check`

## Live validation
Selective `hermes-e2e` dispatch in E2E / Vitest Scenarios is pending
after PR creation.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
  * Added comprehensive end-to-end live test scenario for Hermes, including CLI validation, sandbox management, health checks, inference verification, and artifact collection.

* **Chores**
  * Extended CI/CD workflow to support new Hermes E2E test job lane with proper job validation, matrix generation, and PR reporting integration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran added the labels area: e2e (End-to-end tests, nightly failures, or validation infrastructure), integration: hermes (Hermes integration behavior), and refactor (PR restructures code without intended behavior change) on Jun 12, 2026

Labels

  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • integration: hermes (Hermes integration behavior)
  • refactor (PR restructures code without intended behavior change)
  • v0.0.64 (Release target)

3 participants