test(e2e): add Hermes live Vitest migration [ANCHOR-5]#5227
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThe PR adds a gated Hermes live Vitest job and a comprehensive Hermes live E2E test, extends matrix generation to export ChangesHermes E2E scenario in CI and test implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: Dispatch required Vitest E2E scenarios:
Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorBase: Required Vitest E2E scenarios
Optional Vitest E2E scenarios
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
|
Temporarily closing to make room for the maintainer PR-limit exemption fix; will reopen after that lands. |
|
Maintainer simplicity/equivalence review for #5098 — request changes. This can stay as the Hermes anchor PR, but please tighten the workflow/test boundary before merge. Required:
Goal: keep this as a focused Hermes live migration, not a durable workflow-selection framework. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)
120-121:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHermes job bypasses
jobsselector and runs unexpectedly in jobs-only dispatches.At Line 120,
hermes_selectedis forced totruewheneverinputs.scenariosis empty. At Line 328, Hermes is gated only on that output, so a dispatch likejobs=gateway-guard-recoverystill runshermes-e2e-vitest(secret-bearing path), which breaks thejobsinput contract.Suggested fix
- hermes-e2e-vitest: + hermes-e2e-vitest: needs: generate-matrix - if: ${{ needs.generate-matrix.outputs.hermes_selected == 'true' }} + if: ${{ inputs.jobs == '' && needs.generate-matrix.outputs.hermes_selected == 'true' }}Based on learnings from the workflow boundary contract (
tools/e2e-scenarios/workflow-boundary.mts:338-436), this gate is currently modeled only viahermes_selected, so that validator should be updated alongside this workflow change.Also applies to: 328-328
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-vitest-scenarios.yaml around lines 120 - 121, hermes_selected is being forced true whenever inputs.scenarios is empty, which bypasses the jobs-only selector and lets hermes-e2e-vitest run incorrectly; update the logic that sets hermes_selected (the assignment that currently sets hermes_selected=true alongside matrix="$(npx tsx test/e2e-scenario/scenarios/run.ts ...)") so it only becomes true when hermes is actually selected by the dispatch (e.g., inputs.jobs includes the hermes job or inputs.scenarios indicates hermes), not merely when inputs.scenarios is empty, and then gate the hermes-e2e-vitest job on that corrected hermes_selected value; also update the corresponding validator in workflow-boundary.mts (the validation logic around hermes_selected) to match this new selection rule so jobs-only dispatches no longer trigger the hermes path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 120-121: hermes_selected is being forced true whenever
inputs.scenarios is empty, which bypasses the jobs-only selector and lets
hermes-e2e-vitest run incorrectly; update the logic that sets hermes_selected
(the assignment that currently sets hermes_selected=true alongside matrix="$(npx
tsx test/e2e-scenario/scenarios/run.ts ...)") so it only becomes true when
hermes is actually selected by the dispatch (e.g., inputs.jobs includes the
hermes job or inputs.scenarios indicates hermes), not merely when
inputs.scenarios is empty, and then gate the hermes-e2e-vitest job on that
corrected hermes_selected value; also update the corresponding validator in
workflow-boundary.mts (the validation logic around hermes_selected) to match
this new selection rule so jobs-only dispatches no longer trigger the hermes
path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f6ff4170-8793-4808-9c5a-7acd5f8f17c6
📒 Files selected for processing (2)
.github/workflows/e2e-vitest-scenarios.yamltools/e2e-scenarios/workflow-boundary.mts
🚧 Files skipped from review as they are similar to previous changes (1)
- tools/e2e-scenarios/workflow-boundary.mts
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27362619197
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts`:
- Around line 37-47: The spawnSync call executing generateStep?.run currently
has no timeout and can hang CI; add a timeout option (e.g., timeout: 60000) to
the spawnSync options and make the test fail fast if the child times out by
checking the returned result (the result variable) for a timeout signal (e.g.,
result.signal === 'SIGTERM' or 'SIGKILL') or non-zero status and throwing or
asserting accordingly; update the invocation that sets
env/GITHUB_OUTPUT/GITHUB_STEP_SUMMARY/JOBS/SCENARIOS so the new timeout key is
included and ensure any timeout is expressed in milliseconds and handled as a
test failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 10a84e24-3993-4771-bb7c-5b3edc104bed
📒 Files selected for processing (3)
.github/workflows/e2e-vitest-scenarios.yamltest/e2e-scenario/support-tests/e2e-scenarios-workflow.test.tstools/e2e-scenarios/workflow-boundary.mts
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/e2e-vitest-scenarios.yaml
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27364597797
|
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27365778706
|
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27368101005
|
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27370614675
|
1 similar comment
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27370614675
|
cv
left a comment
There was a problem hiding this comment.
Approved. Normal PR checks are green, review feedback is addressed, unresolved threads are clear, and hermes-e2e-vitest passed. The remaining full scenario fan-out failures appear tied to unstable third-party NVIDIA inference endpoint validation rather than this PR.
|
Superseded by #5256 with the identical final diff and verified signed history. This PR branch contains unsigned historical commit |
|
Closing as superseded by #5256, which carries the same final diff on a branch with verified signed history. |
Pull request was closed
Supersedes #5227 due to an unsigned historical commit that cannot be rewritten under branch rules. This branch has the identical final diff with verified signed history. ## Summary Migrate `test/e2e/test-hermes-e2e.sh` with simple live Vitest coverage. ## Related Issues Refs #5098 ## Contract mapping - Legacy assertion: non-interactive install selects and onboards Hermes via `NEMOCLAW_AGENT=hermes`. - Replacement: `test/e2e-scenario/live/hermes-e2e.test.ts` runs `bash install.sh --non-interactive` with Hermes env. - Boundary preserved: real installer shell, Docker/OpenShell, host PATH/install side effects. - Legacy assertion: Hermes sandbox exists, status works, session records `agent=hermes`, inference provider and policy are configured. - Replacement: live Vitest `nemoclaw list/status`, session JSON, `openshell inference get`, and `openshell policy get --full` assertions. - Boundary preserved: real `nemoclaw` and `openshell` commands. - Legacy assertion: Hermes health, binary, config/state directory, and optional dashboard respond from the sandbox/host. - Replacement: sandbox exec health/version/config probes plus optional dashboard registry/forward/HTTP checks. - Boundary preserved: real sandbox exec, HTTP, OpenShell forwards. - Legacy assertion: live NVIDIA Endpoints and sandbox `inference.local` chat return PONG. - Replacement: direct provider curl and sandbox `curl https://inference.local/v1/chat/completions` assertions. - Boundary preserved: real external provider call and sandbox routing path. - Legacy assertion: CLI logs and agent manifest loading still work. - Replacement: `nemoclaw <sandbox> logs` and `bin/lib/agent-defs` manifest checks. - Boundary preserved: real CLI and built repo module load. ## Simplicity check - Test shape: simple live Vitest test. - New shared helpers: none. - New framework/registry/ledger: **none**. - Workflow changes: adds a selective `hermes-e2e` free-standing Vitest job in `e2e-vitest-scenarios.yaml`; legacy shell script deletion and nightly shell retirement are deferred to #5098 Phase 11. ## Verification - `npm ci --ignore-scripts` - `npm run build:cli` - `npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-scenario-matrix.test.ts --silent=false --reporter=default` - `env -u NVIDIA_API_KEY NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/hermes-e2e.test.ts --silent=false --reporter=default` - `npx biome check test/e2e-scenario/live/hermes-e2e.test.ts .github/workflows/e2e-vitest-scenarios.yaml` - `git diff --check` ## Live validation Selective `hermes-e2e` dispatch in E2E / Vitest Scenarios is pending after PR creation. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added comprehensive end-to-end live test scenario for Hermes, including CLI validation, sandbox management, health checks, inference verification, and artifact collection. * **Chores** * Extended CI/CD workflow to support new Hermes E2E test job lane with proper job validation, matrix generation, and PR reporting integration. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Migrate
test/e2e/test-hermes-e2e.shwith simple live Vitest coverage.Related Issues
Refs #5098
Contract mapping
NEMOCLAW_AGENT=hermes.test/e2e-scenario/live/hermes-e2e.test.tsrunsbash install.sh --non-interactivewith Hermes env.agent=hermes, inference provider and policy are configured.nemoclaw list/status, session JSON,openshell inference get, andopenshell policy get --fullassertions.nemoclawandopenshellcommands.inference.localchat return PONG.curl https://inference.local/v1/chat/completionsassertions.nemoclaw <sandbox> logsandbin/lib/agent-defsmanifest checks.Simplicity check
hermes-e2efree-standing Vitest job ine2e-vitest-scenarios.yaml; legacy shell script deletion and nightly shell retirement are deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11.Verification
npm ci --ignore-scriptsnpm run build:clinpx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-scenario-matrix.test.ts --silent=false --reporter=defaultenv -u NVIDIA_API_KEY NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/hermes-e2e.test.ts --silent=false --reporter=defaultnpx biome check test/e2e-scenario/live/hermes-e2e.test.ts .github/workflows/e2e-vitest-scenarios.yamlgit diff --checkLive validation
Selective
hermes-e2edispatch in E2E / Vitest Scenarios is pending after PR creation.Summary by CodeRabbit
Tests
New Features
Chores