
fix(onboard): announce and recover declared agent forward_ports#5389

Merged
cv merged 3 commits into main from fix/hermes-openai-api-port-forward on Jun 13, 2026


@laitingsheng laitingsheng commented Jun 13, 2026

Contributor

Summary

Hermes onboard declares forward_ports: [18789, 8642], but the dashboard summary printed only the primary port, and process recovery re-established only the primary forward. After the OpenShell gateway restarted during policy-preset apply, the secondary OpenAI-compatible API forward on port 8642 was silently dropped and never restored.

Related Issue

Fixes #5206

Changes

  • printDashboardUi now walks agent.forward_ports and emits a labelled block per non-primary entry.
  • checkAndRecoverSandboxProcesses now invokes a new ensureDeclaredAgentForwardPortsHealthy helper in all three branches.
  • Regression tests cover the print output and the recovery loop.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Sandbox recovery now re-establishes secondary agent-declared port forwards in addition to the primary dashboard forward.
    • Dashboard/onboarding output now shows forwarded OpenAI-compatible API and other secondary endpoint URLs with forwarding notes.
  • Tests

    • Added tests verifying secondary forwards are detected and re-established when missing and that onboarding prints secondary forward URLs correctly.
  • Documentation

    • Quickstart and troubleshooting guides updated to show the forwarded OpenAI-compatible API and how to verify/recover missing forwards.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4e49cadd-2cdd-4d22-bf72-86b6a5efd6f5

📥 Commits

Reviewing files that changed from the base of the PR and between 4883743 and 6ea01a9.

📒 Files selected for processing (4)
  • docs/reference/troubleshooting.mdx
  • src/lib/agent/onboard.test.ts
  • src/lib/agent/onboard.ts
  • test/process-recovery.test.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/reference/troubleshooting.mdx
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/agent/onboard.ts
  • test/process-recovery.test.ts

📝 Walkthrough

Recovery now re-establishes every non-primary manifest-declared agent forward port during gateway-related sandbox recovery. Onboard output prints all non-primary declared ports (labeling API ports specially) with per-port forwarding URLs and /v1 API path where applicable.

Changes

Agent Forward Port Recovery and Announcement

| Layer / File(s) | Summary |
|---|---|
| Forward port recovery helper and integration: `src/lib/actions/sandbox/process-recovery.ts`, `test/process-recovery.test.ts` | Adds `ensureDeclaredAgentForwardPortsHealthy` to probe and restart non-primary agent-declared `forward_ports`. Invoked across gateway-alive (dashboard-missing), gateway-alive (non-occupied), and gateway-restart paths; failures are logged and aggregated into the function's `forwardRecovered` result. Includes test verifying secondary forward start and skipping primary. |
| Onboard UI forward port announcement: `src/lib/agent/onboard.ts`, `src/lib/agent/onboard.test.ts`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/troubleshooting.mdx` | Adds `printAdditionalForwardPorts` to validate and display non-primary `agent.forward_ports`. Ports matching `agent.healthProbe.port` are labeled "OpenAI-compatible API" and rendered with `/v1`; URLs are normalized, deduplicated, and integrated into all dashboard announcement branches. Tests and docs updated. |
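
The announcement behavior summarized above can be illustrated with a minimal sketch. This is not the code from `src/lib/agent/onboard.ts`: the function name `describeAdditionalForwardPorts`, the `AgentManifest` and `ForwardBlock` shapes, and the loopback host are assumptions made for illustration. Only `forward_ports`, `healthProbe.port`, the "OpenAI-compatible API" label, and the `/v1` suffix come from the summary.

```typescript
// Hypothetical manifest shape: only forward_ports and healthProbe.port
// are named in this PR; everything else here is illustrative.
interface AgentManifest {
  forward_ports?: unknown[];
  healthProbe?: { port?: number };
}

interface ForwardBlock {
  port: number;
  label: string;
  url: string;
}

// Accept only integers in the valid TCP port range.
function isValidPort(value: unknown): value is number {
  return (
    typeof value === "number" &&
    Number.isInteger(value) &&
    value >= 1 &&
    value <= 65535
  );
}

// Build one labelled block per valid, non-primary, not-yet-seen port.
function describeAdditionalForwardPorts(
  agent: AgentManifest,
  primaryPort: number,
  host = "127.0.0.1", // assumed loopback host
): ForwardBlock[] {
  const seen = new Set<number>([primaryPort]);
  const blocks: ForwardBlock[] = [];
  for (const candidate of agent.forward_ports ?? []) {
    if (!isValidPort(candidate) || seen.has(candidate)) continue;
    seen.add(candidate);
    const isApi = candidate === agent.healthProbe?.port;
    const base = `http://${host}:${candidate}`;
    blocks.push({
      port: candidate,
      label: isApi ? "OpenAI-compatible API" : "additional port",
      url: isApi ? `${base}/v1` : base,
    });
  }
  return blocks;
}
```

Seeding the `seen` set with the primary port handles both the "skip primary" and the deduplication rules with a single check.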

Sequence Diagram(s)

sequenceDiagram
  participant checkAndRecoverSandboxProcesses
  participant ensureDeclaredAgentForwardPortsHealthy
  participant forwardHealthProbe
  participant OpenShell
  participant result

  checkAndRecoverSandboxProcesses->>ensureDeclaredAgentForwardPortsHealthy: inspect active agent forward_ports
  ensureDeclaredAgentForwardPortsHealthy->>forwardHealthProbe: probe each non-primary port
  alt port missing or unhealthy
    ensureDeclaredAgentForwardPortsHealthy->>OpenShell: forward start for port
    OpenShell-->>ensureDeclaredAgentForwardPortsHealthy: forward started
  end
  ensureDeclaredAgentForwardPortsHealthy-->>checkAndRecoverSandboxProcesses: return true/false/null
  checkAndRecoverSandboxProcesses->>result: include declaredForwardsRecovered in forwardRecovered
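
The recovery flow in the diagram above could look roughly like the following sketch. The `ForwardOps` interface and the function name `recoverDeclaredForwards` are hypothetical, and the meaning given to the `true`/`false`/`null` return values (all restarts succeeded, at least one failed, nothing needed recovery) is an assumption; the PR only states that the helper returns one of those three values.

```typescript
// Hypothetical dependency interface. The PR mentions isSandboxPortForwardHealthy
// and ensureSandboxPortForwardForPort but does not show their signatures.
interface ForwardOps {
  isHealthy(sandboxName: string, port: number): boolean;
  start(sandboxName: string, port: number): boolean;
  log(message: string): void;
}

// Assumed semantics: null = nothing needed recovery, true = every missing
// forward was restarted, false = at least one restart failed.
function recoverDeclaredForwards(
  sandboxName: string,
  declaredPorts: readonly number[],
  primaryPort: number,
  ops: ForwardOps,
): boolean | null {
  let result: boolean | null = null;
  for (const port of declaredPorts) {
    if (port === primaryPort) continue; // the primary forward is handled elsewhere
    if (ops.isHealthy(sandboxName, port)) continue; // nothing to do for this port
    const started = ops.start(sandboxName, port);
    if (!started) ops.log(`failed to restore forward for port ${port}`);
    // A single failure keeps the overall result false.
    result = result === false ? false : started;
  }
  return result;
}
```

Injecting the probe and start operations keeps the loop testable without a live gateway, which matches the style of regression test the PR describes.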
sequenceDiagram
  participant printDashboardUi
  participant printAdditionalForwardPorts
  participant buildControlUiUrls
  participant output

  printDashboardUi->>printAdditionalForwardPorts: validate agent.forward_ports
  printAdditionalForwardPorts->>buildControlUiUrls: generate per-port control UI URLs
  buildControlUiUrls-->>printAdditionalForwardPorts: dashboard link(s)
  alt port == agent.healthProbe.port
    printAdditionalForwardPorts->>output: print "OpenAI-compatible API" block with URLs (/v1)
  else
    printAdditionalForwardPorts->>output: print "additional port" block with URLs
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

area: sandbox, v0.0.65

Suggested reviewers

  • cv
  • sandl99

Poem

🐰 A rabbit hops through ports both new and old,
Recovery nudges forwards back into hold,
Eight-six-four-two now peeks through the door,
Announced with a path and a little bit more,
Hop—connect, call /v1, and explore!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: announcing and recovering declared agent forward_ports in the onboard flow. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #5206 by implementing port 8642 announcement in dashboard output and recovery logic for declared forward ports. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to addressing issue #5206: process recovery, onboard UI rendering, documentation, and test coverage for declared forward ports. |


@github-code-quality

github-code-quality Bot commented Jun 13, 2026


Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
| File | 6ea01a9 | +/- |
|---|---|---|
| nemoclaw/src/se...cret-scanner.ts | 100% | |
| nemoclaw/src/commands/slash.ts | 100% | |
| nemoclaw/src/li...bprocess-env.ts | 100% | |
| nemoclaw/src/bl...eprint/state.ts | 98% | |
| nemoclaw/src/onboard/config.ts | 98% | |
| nemoclaw/src/bl...int/snapshot.ts | 97% | |
| nemoclaw/src/bl...print/runner.ts | 95% | |
| nemoclaw/src/co...ration-state.ts | 94% | |
| nemoclaw/src/bl...ate-networks.ts | 94% | |
| nemoclaw/src/index.ts | 94% | |

TypeScript / code-coverage/cli

The overall coverage in the branch is 44%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
| File | 6ea01a9 | +/- |
|---|---|---|
| src/lib/state/o...oard-session.ts | 90% | |
| src/lib/inference/local.ts | 77% | |
| src/lib/sandbox/config.ts | 72% | |
| src/lib/inference/nim.ts | 72% | |
| src/lib/onboard/preflight.ts | 64% | |
| src/lib/state/sandbox.ts | 55% | |
| src/lib/onboard...er-gpu-patch.ts | 50% | |
| src/lib/actions...licy-channel.ts | 49% | |
| src/lib/policy/index.ts | 48% | |
| src/lib/onboard.ts | 17% | |

Updated June 13, 2026 14:41 UTC

@laitingsheng laitingsheng added the labels integration: hermes (Hermes integration behavior), area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow), and bug-fix (PR fixes a bug or regression) on Jun 13, 2026
@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete. The PR review advisor SDK provider returned 403 Forbidden for every stage: orient-drift, security, acceptance-correctness-tests, and synthesize-json.
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: PR review advisor SDK provider error: 403 Forbidden from orient-drift, security, acceptance-correctness-tests, and synthesize-json.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Add or identify targeted runtime/integration validation for the changed behavior; do not report external E2E job pass/fail here. Runtime/sandbox/infrastructure paths need behavioral runtime validation: docs/get-started/quickstart-hermes.mdx, docs/reference/troubleshooting.mdx, src/lib/actions/sandbox/process-recovery.ts, src/lib/agent/onboard.ts.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/process-recovery.ts`:
- Around line 434-476: The loop in ensureDeclaredAgentForwardPortsHealthy
currently only skips primaryPort but must also skip the optional Hermes web
dashboard port so it isn't redundantly managed; retrieve the Hermes dashboard
port using the same helper/logic used elsewhere for Hermes (the code path that
ensures Hermes' dashboard port, referenced by
ensureHermesDashboardPortForwardIfEnabled) for the given sandboxName (or call
the existing helper that returns that port) and add a check in the for loop to
continue when candidate === hermesDashboardPort; keep the other validations and
return behavior unchanged (use agentRuntime.getSessionAgent,
isSandboxPortForwardHealthy, and ensureSandboxPortForwardForPort as before).
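
A minimal sketch of the skip logic this comment asks for, assuming the Hermes dashboard port has already been resolved by whatever existing helper the codebase provides. The function name `portsToRecover` and its parameters are hypothetical and do not reflect the actual change in `process-recovery.ts`.

```typescript
// Returns the declared ports that the secondary-forward recovery loop should
// manage. hermesDashboardPort is undefined when the optional Hermes web
// dashboard is disabled (assumed convention for this sketch).
function portsToRecover(
  declared: readonly number[],
  primaryPort: number,
  hermesDashboardPort?: number,
): number[] {
  const skip = new Set<number>([primaryPort]);
  if (hermesDashboardPort !== undefined) skip.add(hermesDashboardPort);
  // Keep only ports that no other recovery path already owns.
  return declared.filter((candidate) => !skip.has(candidate));
}
```

Collecting every externally managed port into one skip set keeps the loop body to a single membership check, so later exclusions do not need another `continue` branch.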

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1496e13-5b2d-4780-aec5-0d0471e68f51

📥 Commits

Reviewing files that changed from the base of the PR and between 158f575 and bfb41f7.

📒 Files selected for processing (4)
  • src/lib/actions/sandbox/process-recovery.ts
  • src/lib/agent/onboard.test.ts
  • src/lib/agent/onboard.ts
  • test/process-recovery.test.ts

Comment thread src/lib/actions/sandbox/process-recovery.ts
@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

E2E Advisor Recommendation

Required E2E: hermes-e2e-vitest, sandbox-survival-vitest
Optional E2E: gateway-guard-recovery, rebuild-hermes-vitest

Dispatch hint: hermes-e2e-vitest,sandbox-survival-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • hermes-e2e-vitest (medium): Validates the Hermes live user flow affected by this PR: nemohermes onboarding, dashboard/API URL reporting, OpenShell forward list entries for Hermes ports including 8642, and host-side API health access.
  • sandbox-survival-vitest (medium): Exercises live sandbox restart/survival and gateway recovery behavior through install/onboard and recovery paths. The PR modifies the generic process recovery function used by sandbox lifecycle commands, so this is merge-blocking coverage for regressions outside Hermes.

Optional E2E

  • gateway-guard-recovery (medium): Adjacent confidence for recovery internals: it runs a live gateway recovery scenario through the same connect/probe recovery path, but focuses on guard-chain restoration rather than manifest-declared secondary forwards.
  • rebuild-hermes-vitest (medium): Useful adjacent Hermes lifecycle coverage if maintainers want extra confidence that Hermes port and dashboard/API state still survive rebuild-related lifecycle flows, but the PR primarily changes recovery and onboarding summary behavior.

New E2E recommendations

  • Hermes secondary forward recovery (high): The existing Hermes E2E appears to check that port 8642 is initially forwarded and healthy, but the PR's core behavior is recovery of manifest-declared non-primary forward_ports after they go missing while the primary dashboard forward remains healthy.
    • Suggested test: Add a live Hermes recovery scenario that onboards Hermes, stops/removes only the OpenShell forward for 8642, runs nemohermes <sandbox> connect --probe-only or nemohermes <sandbox> recover, then asserts openshell forward list shows 8642 for that sandbox and curl -sf http://127.0.0.1:8642/health succeeds without restarting or stealing the primary dashboard forward.
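
The final assertion of the suggested scenario, the `curl -sf http://127.0.0.1:8642/health` check, could be expressed in a test helper along these lines. The helper name `isForwardedApiHealthy` and the injected `Fetcher` type are assumptions made for illustration; the helper does not cover stopping the forward or running the recover command.

```typescript
// Minimal fetch-like signature so the helper can be exercised with a stub.
// In Node 18+ the global fetch satisfies this shape.
type Fetcher = (url: string) => Promise<{ ok: boolean }>;

// Resolves to true when the forwarded health endpoint answers with a 2xx status.
async function isForwardedApiHealthy(
  fetcher: Fetcher,
  port = 8642,
  host = "127.0.0.1",
): Promise<boolean> {
  try {
    const response = await fetcher(`http://${host}:${port}/health`);
    return response.ok; // roughly what curl -f checks: error statuses count as failure
  } catch {
    return false; // connection refused, for example when the forward is missing
  }
}
```

A scenario would call this once after removing the forward (expecting `false`) and again after recovery (expecting `true`).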

Dispatch hint

  • Workflow: E2E / Vitest Scenarios
  • jobs input: hermes-e2e-vitest,sandbox-survival-vitest

@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • None. Advisor reported no Vitest E2E scenario impact.

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • src/lib/actions/sandbox/process-recovery.ts
  • src/lib/agent/onboard.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions

Contributor

…overage

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the v0.0.65 (Release target) label Jun 13, 2026
@cv cv merged commit 720bee9 into main Jun 13, 2026
46 checks passed
@cv cv deleted the fix/hermes-openai-api-port-forward branch June 13, 2026 17:33
@miyoungc miyoungc mentioned this pull request Jun 16, 2026
13 tasks
cv pushed a commit that referenced this pull request Jun 17, 2026
## Summary
Refreshes release-prep documentation for NemoClaw v0.0.65.
Adds the v0.0.65 release-notes section and refreshes generated
`nemoclaw-user-*` skills from the Fern MDX source docs.

## Changes
- Added the v0.0.65 release notes to `docs/about/release-notes.mdx` with
links to the deeper docs pages for lifecycle, troubleshooting,
inference, CLI commands, messaging, credentials, network policy, Hermes,
and sub-agents.
- Regenerated the `nemoclaw-user-*` skills with
`scripts/docs-to-skills.py` so release-prep skill output matches the
merged source docs.
- Used the v0.0.65 announcement discussion as release context:
#5472.

## Source Summary
- #2492 -> `docs/about/release-notes.mdx`: Documents deadline-based
gateway wait reliability in the v0.0.65 recovery summary.
- #4958 -> `docs/about/release-notes.mdx`: Documents re-execed OpenClaw
gateway health check recovery in the sandbox recovery summary.
- #5163 -> `docs/about/release-notes.mdx`: Documents safer uninstall TTY
confirmation behavior in the day-two CLI summary.
- #5178 -> `docs/about/release-notes.mdx`: Documents fail-closed config
restore merge behavior in the rebuild and restore summary.
- #5179 -> `docs/about/release-notes.mdx`: Documents WeChat QR token
redaction in the messaging summary.
- #5182 -> `docs/about/release-notes.mdx`: Documents sustained gateway
serving checks in the recovery summary.
- #5194 -> `docs/about/release-notes.mdx`: Documents model-router
teardown during uninstall in the day-two CLI summary.
- #5195 -> `docs/about/release-notes.mdx`: Documents Shields
auto-restore lock reconfirmation in the rebuild and restore summary.
- #5198 -> `docs/about/release-notes.mdx`: Documents Docker Desktop WSL
CDI injection failure handling in the onboarding diagnostics summary.
- #5201 -> `docs/about/release-notes.mdx`: Documents sandbox
download/upload wrappers and sessions export in the day-two CLI summary.
- #5205 -> `docs/about/release-notes.mdx`: Documents reporter-owned
model metadata preservation in the rebuild and restore summary.
- #5214 -> `docs/about/release-notes.mdx`: Documents managed vLLM model
preflight before side effects in the inference setup summary.
- #5215 -> `docs/about/release-notes.mdx`: Documents managed vLLM extra
serve arguments in the inference setup summary.
- #5216 -> `docs/about/release-notes.mdx`: Documents silent OpenClaw
runtime fallback surfacing in the onboarding diagnostics summary.
- #5225 -> `docs/about/release-notes.mdx`: Documents persisted sandbox
gateway lookup in the gateway recovery summary.
- #5238 -> `docs/about/release-notes.mdx`: Documents sub-agent gateway
dial-back through the sandbox interface in the Hermes and sub-agent
summary.
- #5248 -> `docs/about/release-notes.mdx`: Documents Discord per-account
proxy resolution in the messaging summary.
- #5264 -> `docs/about/release-notes.mdx`: Documents reserved Hermes
port `8642` handling in the Hermes compatibility summary.
- #5267 -> `docs/about/release-notes.mdx`: Documents the narrower Hermes
baseline policy in the Hermes compatibility summary.
- #5321 -> `docs/about/release-notes.mdx`: Documents restored gateway
guard chains in the gateway recovery summary.
- #5328 -> `docs/about/release-notes.mdx`: Documents compact persisted
messaging plans in the messaging summary.
- #5338 -> `docs/about/release-notes.mdx`: Documents manifest channel
migration in the messaging summary.
- #5352 -> `docs/about/release-notes.mdx`: Documents persisted agent
preservation through registry recovery in the rebuild and restore
summary.
- #5371 ->
`.agents/skills/nemoclaw-user-reference/references/commands.md`:
Refreshes generated skill output for custom build cache and
layer-ordering source docs.
- #5379 -> `docs/about/release-notes.mdx`: Documents dashboard port
allocation across multiple NemoClaw gateways in the recovery summary.
- #5382 -> `docs/about/release-notes.mdx`: Documents recovery when an
active gateway has no sandbox spec in the recovery summary.
- #5389 ->
`.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`:
Refreshes generated skill output for declared agent `forward_ports`
recovery source docs.
- #5400 -> `docs/about/release-notes.mdx`: Documents bounded compatible
endpoint probes in the inference setup summary.
- #5410 -> `docs/about/release-notes.mdx`: Documents provider credential
hash removal from sandbox registry entries in the messaging summary.
- #5418 -> `docs/about/release-notes.mdx`: Documents summarized
inference validation failures in the onboarding diagnostics summary.
- #5457 -> `docs/about/release-notes.mdx`: Documents context-window
recomputation after runtime model switches in the inference setup
summary.
- #5463 -> `docs/about/release-notes.mdx`: Documents cleanup of
hard-coded messaging channel stragglers in the messaging summary.

## Skipped
- #5366 matched `docs/.docs-skip` entries through skipped experimental
paths, so this PR does not add new release-note text for that commit.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [x] Git hooks passed during commit and push, or `npx prek run
--from-ref main --to-ref HEAD` passes
- [ ] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification notes:
- `npm run docs` passed after rerunning outside the sandbox. Fern
reported 0 errors and 1 hidden warning.
- The first sandboxed `npm run docs` attempt failed before validation
because `tsx` could not create its local IPC pipe under sandbox
restrictions.
- `npm run build:cli` passed before push to refresh the local `dist/`
artifacts used by the CLI typecheck hook.
- `npm test` was not run because this is a docs-only release refresh.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Released NemoClaw v0.0.65 with improved gateway/sandbox recovery,
safer day-two workflows, and enhanced Hermes compatibility.
* Added managed vLLM extra-arguments configuration via
`NEMOCLAW_VLLM_EXTRA_ARGS_JSON`.
* Added Hermes troubleshooting guidance for port forwarding and health
checks.

* **Documentation**
* Updated NVIDIA Endpoints/NIM setup and examples to use
`NVIDIA_INFERENCE_API_KEY`.
* Refined NVIDIA network policy and Model Router API base configuration.
* Expanded CLI/environment variable documentation (including sub-agent
gateway connectivity) and plugin build performance tips.

* **Tests**
  * Expanded Vitest-backed E2E release validation coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Labels

  • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
  • bug-fix (PR fixes a bug or regression)
  • integration: hermes (Hermes integration behavior)
  • v0.0.65 (Release target)


Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Onboard] nemohermes onboard does not forward OpenAI-compatible API on port 8642

2 participants