
perf(onboard): add deadline-based gateway wait #2492

Merged
cv merged 18 commits into NVIDIA:main from HOYALIM:issue-2001-latency-waits
Jun 12, 2026

Conversation

HOYALIM commented Apr 26, 2026

Contributor

Summary

Adds a reusable deadline-based wait primitive and applies it to gateway startup health polling as a scoped first step for #2001.
The rebased implementation keeps the existing health-poll environment knobs compatible and preserves the dual readiness gate: OpenShell metadata must be healthy and the host HTTP endpoint must be serving.

Related Issue

Refs #2001.

Changes

  • Extend src/lib/core/wait.ts with waitUntil(...), waitUntilAsync(...), absolute deadlines, configurable backoff, injectable clock/sleeper hooks, and optional attempt caps.
  • Extract gateway startup health polling into src/lib/onboard/gateway-health-wait.ts, keeping src/lib/onboard.ts net smaller for the growth guardrail.
  • Preserve NEMOCLAW_HEALTH_POLL_COUNT and NEMOCLAW_HEALTH_POLL_INTERVAL behavior by using fixed intervals and maxAttempts; per-probe shell/HTTP runtime is not consumed by a startup deadline.
  • Add deterministic coverage for the wait primitive and direct regression coverage for waitForGatewayHealth(...) call ordering, metadata repair refresh, dual readiness, attempt limits, final-attempt sleep behavior, and zero-attempt behavior.
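
The primitive described in the bullets above could look roughly like the following synchronous sketch. The option names deadlineMs, initialIntervalMs, maxIntervalMs, backoffFactor, and maxAttempts appear elsewhere in this thread; the `now` and `sleep` hook names, the defaults, and the exact loop order (including skipping the sleep after the final attempt) are assumptions for illustration and are not taken from the merged src/lib/core/wait.ts.

```typescript
// Hypothetical sketch of a deadline-based wait primitive; names and defaults are assumptions.
interface WaitOptions {
  deadlineMs: number;           // absolute deadline, on the same clock as `now`
  initialIntervalMs?: number;   // first sleep between probes
  maxIntervalMs?: number;       // cap applied after each backoff step
  backoffFactor?: number;       // 1 keeps the interval fixed
  maxAttempts?: number;         // optional cap on the number of probes
  now?: () => number;           // injectable clock
  sleep: (ms: number) => void;  // injectable sleeper (required here to keep the sketch small)
}

function waitUntil(condition: () => boolean, options: WaitOptions): boolean {
  const {
    deadlineMs,
    initialIntervalMs = 100,
    maxIntervalMs = initialIntervalMs,
    backoffFactor = 1,
    maxAttempts = Number.POSITIVE_INFINITY,
    now = Date.now,
    sleep,
  } = options;

  let intervalMs = initialIntervalMs;
  for (let attempts = 0; attempts < maxAttempts; attempts++) {
    if (now() >= deadlineMs) return false;   // deadline is checked before every probe
    if (condition()) return true;
    if (attempts + 1 >= maxAttempts) break;  // assumed: no sleep after the final attempt
    const remainingMs = deadlineMs - now();  // re-read the clock after the probe
    if (remainingMs <= 0) return false;
    sleep(Math.min(intervalMs, remainingMs));
    intervalMs = Math.min(maxIntervalMs, intervalMs * backoffFactor);
  }
  return false;
}

// Deterministic usage with a fake clock: the condition succeeds on the fourth probe.
let fakeClockMs = 0;
let probeCount = 0;
const recordedSleeps: number[] = [];
const becameReady = waitUntil(() => ++probeCount === 4, {
  deadlineMs: 10000,
  initialIntervalMs: 100,
  maxIntervalMs: 250,
  backoffFactor: 2,
  now: () => fakeClockMs,
  sleep: (ms) => {
    recordedSleeps.push(ms);
    fakeClockMs += ms;
  },
});
```

With the fake clock, the recorded sleeps are 100, 200, and 250 ms, which shows the backoff doubling until it reaches the cap. Injecting the clock and sleeper is what makes this kind of test deterministic.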

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes via make check
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Additional validation run locally:

  • npm run build:cli passes
  • npm run typecheck:cli passes
  • npx vitest run src/lib/onboard/gateway-health-wait.test.ts test/wait.test.ts passes
  • make check passes

Notes:

AI Disclosure

  • AI-assisted — tool: OpenAI Codex

Signed-off-by: Ho Lim subhoya@gmail.com

Summary by CodeRabbit

  • Refactor

    • Refactored gateway health polling during startup from a manual loop to a retry-driven helper with configurable timing intervals, an exponential backoff strategy, and a maximum attempt limit.
  • Tests

    • Significantly expanded test coverage for retry and polling behavior, including deadline handling, exponential backoff behavior, maximum attempt limits, and various edge cases.
  • Chores

    • Updated test environment configuration for token rotation integration tests.

copy-pr-bot Bot commented Apr 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai Bot commented Apr 26, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca593f2e-4391-4a69-b518-ab24d95b9395

📥 Commits

Reviewing files that changed from the base of the PR and between 38350a1 and ff86b84.

📒 Files selected for processing (1)
  • .github/workflows/nightly-e2e.yaml
💤 Files with no reviewable changes (1)
  • .github/workflows/nightly-e2e.yaml

📝 Walkthrough

Walkthrough

Gateway health polling during onboard transitions from a manual for/sleep loop to a waitUntil-driven retry mechanism with consistent millisecond-based timing. Test coverage for the wait utility expands from sleep functions to comprehensive waitUntil scenarios. Discord token fixtures in the CI workflow are simplified.

Changes

Polling refactor and test expansion

| Layer / File(s) | Summary |
| --- | --- |
| Gateway health polling refactor (src/lib/onboard.ts) | startGatewayWithOptions replaces the manual polling loop with waitUntil, performing per-attempt bootstrap repair, metadata attachment, re-selection, and health probing (via status, gateway info -g, gateway info) until isGatewayHealthy() returns true. Timing converts to milliseconds with initialIntervalMs/maxIntervalMs matching the converted healthPollInterval, backoffFactor: 1, and maxAttempts: healthPollCount. A high deadlineMs prevents slow commands from truncating attempts. Falls through to throw Error("Gateway failed to start") if health is unreached. |
| Wait utility test expansion (test/wait.test.ts) | Test suite extends from covering sleepMs/sleepSeconds to validating waitUntil behavior, including immediate success, immediate failure on expired deadline, TypeError on non-finite deadline without attempt cap, repeated polling with exact sleep interval verification, deadline-based truncation with final short sleep, exponential backoff sequence accuracy, termination via maxAttempts with and without zero-length intervals, and deadline-constrained unbounded retry progress. |
| Discord token fixture update (.github/workflows/nightly-e2e.yaml) | token-rotation-e2e job environment variables for fake Discord tokens (DISCORD_BOT_TOKEN_A, DISCORD_BOT_TOKEN_B) are changed to simplified strings (discord-a, discord-b). |
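
The mapping described for src/lib/onboard.ts (fixed interval, backoffFactor: 1, maxAttempts equal to the poll count, and a generous deadline) might be sketched as follows. The function name `toGatewayWaitOptions`, the `perProbeAllowanceMs` parameter, and the assumption that the interval knob is expressed in seconds are all hypothetical; the thread does not show how the merged code derives its deadline.

```typescript
// Hypothetical mapping from the health-poll knobs to fixed-interval wait options.
interface GatewayWaitOptions {
  initialIntervalMs: number;
  maxIntervalMs: number;
  backoffFactor: number;
  maxAttempts: number;
  deadlineMs: number;
}

function toGatewayWaitOptions(
  healthPollCount: number,
  healthPollIntervalSeconds: number, // assumed unit: seconds
  nowMs: number,
  perProbeAllowanceMs: number,       // assumed upper bound on one probe's shell/HTTP runtime
): GatewayWaitOptions {
  const intervalMs = Math.round(healthPollIntervalSeconds * 1000);
  return {
    initialIntervalMs: intervalMs,
    maxIntervalMs: intervalMs,  // equal to the initial interval, so the interval never grows
    backoffFactor: 1,           // fixed interval, matching the previous loop
    maxAttempts: healthPollCount,
    // Budget both the sleeps and the probes so slow commands cannot truncate attempts.
    deadlineMs: nowMs + healthPollCount * (intervalMs + perProbeAllowanceMs),
  };
}

const gatewayOptions = toGatewayWaitOptions(5, 2, 1000, 30000);
```

In this sketch the attempt cap is the effective limit and the deadline acts only as a backstop, which is one way to keep the existing poll count and interval behavior unchanged.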

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5119: Both PRs tie to startGatewayWithOptions's health-polling/probing flow: the main PR refactors the health-poll retry loop in src/lib/onboard.ts (now via waitUntil), while the retrieved PR adds/updates tests to assert the Docker-unreachable path exits before any health polling/status/gateway-info probes run.

Suggested reviewers

  • cv
  • ericksoa

Poem

🐰 From loops of sleep and counting tries,
The gateway now retries more wise,
With waitUntil and zero back,
We polish timeouts off the track.
Discord tokens simplified—
A rabbit's test-run satisfied! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | The Discord token environment variable changes in nightly-e2e.yaml appear unrelated to the deadline-based wait implementation or issue #2001 objectives. | Clarify whether the Discord token changes are necessary for this PR or should be separated into a distinct pull request focused on test environment configuration. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing a deadline-based gateway wait mechanism to improve onboard performance. |
| Linked Issues check | ✅ Passed | The PR implements foundational deadline-based wait primitives that address Phase 2 optimization goals (replace sleeps with event-driven waits) from issue #2001, specifically applied to gateway health polling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@HOYALIM HOYALIM changed the title from "[codex] Add deadline-based gateway wait" to "perf(onboard): add deadline-based gateway wait" Apr 26, 2026
@HOYALIM HOYALIM marked this pull request as ready for review April 27, 2026 00:02
Copilot AI review requested due to automatic review settings April 27, 2026 00:02
@HOYALIM HOYALIM marked this pull request as draft April 27, 2026 00:06

Copilot AI left a comment

Contributor


Pull request overview

This PR introduces a reusable, synchronous, deadline-based polling helper and applies it to the gateway startup health wait loop as part of the onboard latency work (refs #2001), while aiming to preserve existing health poll env knob behavior.

Changes:

  • Added waitUntil(...) to src/lib/wait.ts with deadline, backoff, injectable clock/sleeper hooks, and optional attempt cap.
  • Replaced the gateway startup health polling loop in src/lib/onboard.ts with the new waitUntil(...) helper.
  • Expanded test/wait.test.ts with focused unit tests for waitUntil(...) behavior (immediate success, retries, deadline, backoff, and attempt caps).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/wait.test.ts | Adds unit coverage for the new waitUntil(...) helper. |
| src/lib/wait.ts | Introduces the waitUntil(...) synchronous polling primitive and options. |
| src/lib/onboard.ts | Switches gateway health polling to use waitUntil(...) with existing poll count/interval inputs. |


Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/wait.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🧹 Nitpick comments (1)
test/wait.test.ts (1)

42-172: Add a contract test for invalid deadlineMs.

Please add a test that asserts waitUntil throws TypeError for non-finite deadlineMs (e.g., NaN), so the input validation path is locked in.

✅ Suggested test
+  it("waitUntil throws when deadlineMs is non-finite", () => {
+    expect(() =>
+      waitUntil(() => false, {
+        deadlineMs: Number.NaN,
+      }),
+    ).toThrow(TypeError);
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/wait.test.ts` around lines 42 - 172, Add a new test in wait.test.ts
using the existing waitUntil harness that verifies passing a non-finite
deadlineMs (e.g., NaN) causes waitUntil to throw a TypeError: call waitUntil
with a simple predicate and options including deadlineMs: NaN plus the same now
and sleep mocks used in other tests, and assert with expect(() =>
waitUntil(...)).toThrow(TypeError) so the input validation path for deadlineMs
is locked in.
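
The validation path that the suggested test targets could be as small as the following sketch. The helper name `assertBoundedWait` and the error message are hypothetical; the rule it encodes (a non-finite deadline is rejected unless an attempt cap bounds the loop) follows the walkthrough's description of the tests and is an assumption about the real code.

```typescript
// Hypothetical input validation: a wait must be bounded by a deadline or an attempt cap.
function assertBoundedWait(deadlineMs: number, maxAttempts?: number): void {
  const hasAttemptCap = maxAttempts !== undefined && Number.isFinite(maxAttempts);
  // Number.isFinite is false for NaN, Infinity, and -Infinity.
  if (!Number.isFinite(deadlineMs) && !hasAttemptCap) {
    throw new TypeError("waitUntil needs a finite deadlineMs or a finite maxAttempts");
  }
}

// Capture the outcome for a NaN deadline with no attempt cap.
let nanDeadlineRejected = false;
try {
  assertBoundedWait(Number.NaN);
} catch (error) {
  nanDeadlineRejected = error instanceof TypeError;
}
```

Failing fast here keeps an unbounded loop from starting at all, which is easier to diagnose than a wait that never returns.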
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3512-3541: The deadline calculation (healthDeadlineMs) only
accounts for sleep budget and can expire while each waitUntil attempt runs
multiple shell-outs (repairGatewayBootstrapSecrets, runCaptureOpenshell,
isGatewayHealthy), shortening the effective retry window; fix by extending or
removing the deadline: either compute healthDeadlineMs to also include an
allowance for per-attempt probe runtime (e.g., add healthPollCount *
probeTimeoutMs or add a conservative maxProbeRuntimeMs) so the deadline >= total
sleep budget + total probe runtime, or omit deadlineMs from the waitUntil call
to rely solely on maxAttempts/healthPollCount; update the healthDeadlineMs
computation or the waitUntil invocation accordingly (references:
healthPollIntervalMs, healthDeadlineMs, healthPollCount, waitUntil,
repairGatewayBootstrapSecrets, runCaptureOpenshell, isGatewayHealthy).

In `@src/lib/wait.ts`:
- Around line 98-99: The loop can hot-spin when intervalMs (or maxIntervalMs) is
0 and maxAttempts is unbounded; change the sleeper call and interval update to
enforce a minimum non-zero sleep (e.g., MIN_SLEEP_MS = 1) so you never call
sleeper(0). Specifically, in the block using sleeper(Math.min(intervalMs,
deadlineMs - currentMs)) and intervalMs = Math.min(maxIntervalMs, intervalMs *
backoffFactor), clamp the sleep duration with Math.max(MIN_SLEEP_MS, ...) and
ensure intervalMs is also floored to at least MIN_SLEEP_MS after applying
backoff so the loop yields CPU even when initialIntervalMs/maxIntervalMs are
zero (references: sleeper, intervalMs, maxIntervalMs, backoffFactor, deadlineMs,
currentMs).
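
The minimum-sleep clamp suggested for src/lib/wait.ts can be illustrated in isolation. This is a sketch of the reviewer's proposal only: the helper names `nextSleepMs` and `nextIntervalMs` are hypothetical, and MIN_SLEEP_MS = 1 is the example value from the comment, not a confirmed constant in the merged code.

```typescript
// Hypothetical helpers showing the proposed floor on sleep duration and interval.
const MIN_SLEEP_MS = 1; // example value taken from the review comment

// Sleep for the interval, but no longer than the time left and never for 0 ms.
function nextSleepMs(intervalMs: number, remainingMs: number): number {
  return Math.max(MIN_SLEEP_MS, Math.min(intervalMs, remainingMs));
}

// Apply backoff, cap it at maxIntervalMs, then floor the result at MIN_SLEEP_MS.
function nextIntervalMs(intervalMs: number, backoffFactor: number, maxIntervalMs: number): number {
  return Math.max(MIN_SLEEP_MS, Math.min(maxIntervalMs, intervalMs * backoffFactor));
}

const zeroIntervalSleep = nextSleepMs(0, 500);          // floored so the loop yields
const deadlineLimitedSleep = nextSleepMs(200, 50);      // truncated by the remaining budget
const zeroIntervalAfterBackoff = nextIntervalMs(0, 2, 0);
```

One trade-off of a hard floor is that the final sleep can overshoot the deadline by up to MIN_SLEEP_MS when less time than that remains.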


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 88a34e7e-6332-4895-82d8-e2cfc9f03551

📥 Commits

Reviewing files that changed from the base of the PR and between 1f615e2 and 2b8bc08.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/wait.ts
  • test/wait.test.ts

Comment thread src/lib/onboard.ts Outdated
Comment thread src/lib/wait.ts Outdated
@HOYALIM HOYALIM marked this pull request as ready for review April 27, 2026 00:08

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/wait.ts`:
- Around line 94-108: The loop currently calls condition() even after the
deadline guard because the deadline check (using now() and deadlineMs) comes
after condition(); move the deadline/time check so it runs before invoking
condition() (and keep the existing attempts checks), ensuring you check
Number.isFinite(currentMs) and currentMs >= deadlineMs prior to calling
condition() to avoid the extra probe; update the loop in src/lib/wait.ts
(references: attempts, maxAttempts, deadlineMs, now(), condition()) accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d13eeb32-c6e5-4ffb-bbea-dae420740d03

📥 Commits

Reviewing files that changed from the base of the PR and between 49316b4 and 26b594e.

📒 Files selected for processing (2)
  • src/lib/wait.ts
  • test/wait.test.ts

Comment thread src/lib/wait.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Contributor


♻️ Duplicate comments (1)
src/lib/wait.ts (1)

93-109: ⚠️ Potential issue | 🟠 Major

Honor an already-expired deadline before the first probe.

Line 95 only checks the deadline after at least one attempt, so condition() on Line 107 still runs once when deadlineMs <= now() at entry. For callers using this to drive real health probes, that is still one probe past the requested budget. Please move the deadline guard out of the attempts > 0 branch and add a regression test for an already-expired deadline.

Suggested fix
   let attempts = 0;
   for (;;) {
-    if (attempts > 0) {
-      const currentMs = now();
-      if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) {
-        return false;
-      }
+    const currentMs = now();
+    if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) {
+      return false;
     }
 
     if (attempts >= maxAttempts) {
       return false;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/wait.ts` around lines 93 - 109, The loop currently only checks the
deadline after the first probe (gated by attempts > 0), so move the deadline
guard to run on every iteration before invoking condition(): call const
currentMs = now(); if (!Number.isFinite(currentMs) || currentMs >= deadlineMs)
return false; (remove the attempts > 0 gate) so an already-expired deadline
causes an immediate false; also add a regression test that sets deadlineMs <=
now() and asserts the wait function returns false (and that condition() is not
invoked) to prevent regressions; refer to symbols attempts, now(), deadlineMs,
condition(), and maxAttempts when making changes.
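
The regression case requested above (an already-expired deadline must not trigger a probe) can be checked with a small loop skeleton and a fake clock. The function `pollWithGuard` is a hypothetical stand-in that follows the guard order from the suggested fix; it is not the code in src/lib/wait.ts.

```typescript
// Hypothetical loop skeleton: the deadline guard runs on every iteration, before the probe.
function pollWithGuard(
  condition: () => boolean,
  deadlineMs: number,
  maxAttempts: number,
  now: () => number,
  sleep: (ms: number) => void,
  intervalMs: number,
): boolean {
  let attempts = 0;
  for (;;) {
    const currentMs = now();
    if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) return false;
    if (attempts >= maxAttempts) return false;
    attempts += 1;
    if (condition()) return true;
    sleep(intervalMs);
  }
}

// Regression check: the clock already sits at the deadline when the wait starts.
let expiredProbeCalls = 0;
const expiredResult = pollWithGuard(
  () => {
    expiredProbeCalls += 1;
    return true; // would succeed if it were ever called
  },
  1000,       // deadlineMs
  5,          // maxAttempts
  () => 1000, // fake clock, equal to the deadline
  () => {},   // sleeper is never reached in this scenario
  100,
);
```

Here the wait returns false without invoking the condition, which is the behavior the review comment asks to lock in.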

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ab7479bd-599b-456e-8318-8c5f61cdf8fa

📥 Commits

Reviewing files that changed from the base of the PR and between 26b594e and 38350a1.

📒 Files selected for processing (2)
  • src/lib/wait.ts
  • test/wait.test.ts

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/wait.ts`:
- Around line 95-116: The code uses currentMs captured before calling
condition(), which can be stale if condition() is slow; update the timestamp
after the condition() call and before computing remainingMs/sleepDurationMs so
the sleep budget respects deadlineMs and avoids sleeping past the deadline; in
the wait function recompute currentMs = now() (or equivalent) after condition()
returns and use that value to derive remainingMs, requestedSleepMs and to clamp
sleeper(...) (referencing currentMs, now(), deadlineMs, condition(),
remainingMs, requestedSleepMs, MIN_UNCAPPED_SLEEP_MS, hasAttemptCap, and
sleeper).
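
The effect of the stale timestamp can be shown with simple arithmetic on a fake clock. In this sketch the helper name `sleepAfterProbe` and the numbers are hypothetical; it only illustrates why the clock should be re-read after condition() returns.

```typescript
// Hypothetical helper: derive the sleep from a timestamp taken after the probe returns.
function sleepAfterProbe(
  requestedSleepMs: number,
  deadlineMs: number,
  now: () => number,
): number {
  const currentMs = now(); // fresh read, after condition() has run
  const remainingMs = Math.max(0, deadlineMs - currentMs);
  return Math.min(requestedSleepMs, remainingMs);
}

// Scenario: deadline at 1000 ms, a 500 ms interval, and a probe that takes 700 ms.
let staleDemoClockMs = 0;
const beforeProbeMs = staleDemoClockMs; // timestamp captured before the probe
staleDemoClockMs += 700;                // simulated slow condition()

// Using the pre-probe timestamp leaves the full 500 ms sleep, ending at 1200 ms.
const staleSleepMs = Math.min(500, 1000 - beforeProbeMs);
// Using a fresh timestamp shortens the sleep so the wait ends at the deadline.
const freshSleepMs = sleepAfterProbe(500, 1000, () => staleDemoClockMs);
```

With the stale value the wait would run 200 ms past the deadline; the fresh read trims the sleep to the 300 ms that actually remain.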

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b8e8326-56d6-4585-bb6e-fa32ebce51cf

📥 Commits

Reviewing files that changed from the base of the PR and between 38350a1 and 44901e0.

📒 Files selected for processing (2)
  • src/lib/wait.ts
  • test/wait.test.ts

Comment thread src/lib/wait.ts Outdated
@HOYALIM HOYALIM force-pushed the issue-2001-latency-waits branch from bb47782 to 5a709c4 Compare April 27, 2026 04:17
HOYALIM

This comment was marked as resolved.

@wscurran wscurran added the "dependencies" and "enhancement: performance" labels Apr 27, 2026
@wscurran

Contributor

✨ Thanks for submitting this pull request that proposes a way to improve the performance of NemoClaw's onboard process by introducing a deadline-based wait primitive.


Related open issues:

@wscurran

Contributor

Sprint 5 planning update: we’re organizing this PR as the foundation PR for #3768, with #2001 remaining the umbrella tracker.

Relationship:

This PR should be tracked for Sprint 5 review, but it should not auto-close #3768 or #2001 when merged. If it lands, #3768 should remain open for the remaining readiness paths called out there.

@wscurran wscurran added the "chore", "feature", and "area: performance" labels and removed the "enhancement: performance" label Jun 3, 2026
cv commented Jun 12, 2026

Collaborator

@coderabbitai review

coderabbitai Bot commented Jun 12, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@cv cv merged commit 7a8fcae into NVIDIA:main Jun 12, 2026
34 checks passed
@cv cv added the v0.0.65 Release target label Jun 13, 2026
@miyoungc miyoungc mentioned this pull request Jun 16, 2026
13 tasks
cv pushed a commit that referenced this pull request Jun 17, 2026
## Summary
Refreshes release-prep documentation for NemoClaw v0.0.65.
Adds the v0.0.65 release-notes section and refreshes generated
`nemoclaw-user-*` skills from the Fern MDX source docs.

## Changes
- Added the v0.0.65 release notes to `docs/about/release-notes.mdx` with
links to the deeper docs pages for lifecycle, troubleshooting,
inference, CLI commands, messaging, credentials, network policy, Hermes,
and sub-agents.
- Regenerated the `nemoclaw-user-*` skills with
`scripts/docs-to-skills.py` so release-prep skill output matches the
merged source docs.
- Used the v0.0.65 announcement discussion as release context:
#5472.

## Source Summary
- #2492 -> `docs/about/release-notes.mdx`: Documents deadline-based
gateway wait reliability in the v0.0.65 recovery summary.
- #4958 -> `docs/about/release-notes.mdx`: Documents re-execed OpenClaw
gateway health check recovery in the sandbox recovery summary.
- #5163 -> `docs/about/release-notes.mdx`: Documents safer uninstall TTY
confirmation behavior in the day-two CLI summary.
- #5178 -> `docs/about/release-notes.mdx`: Documents fail-closed config
restore merge behavior in the rebuild and restore summary.
- #5179 -> `docs/about/release-notes.mdx`: Documents WeChat QR token
redaction in the messaging summary.
- #5182 -> `docs/about/release-notes.mdx`: Documents sustained gateway
serving checks in the recovery summary.
- #5194 -> `docs/about/release-notes.mdx`: Documents model-router
teardown during uninstall in the day-two CLI summary.
- #5195 -> `docs/about/release-notes.mdx`: Documents Shields
auto-restore lock reconfirmation in the rebuild and restore summary.
- #5198 -> `docs/about/release-notes.mdx`: Documents Docker Desktop WSL
CDI injection failure handling in the onboarding diagnostics summary.
- #5201 -> `docs/about/release-notes.mdx`: Documents sandbox
download/upload wrappers and sessions export in the day-two CLI summary.
- #5205 -> `docs/about/release-notes.mdx`: Documents reporter-owned
model metadata preservation in the rebuild and restore summary.
- #5214 -> `docs/about/release-notes.mdx`: Documents managed vLLM model
preflight before side effects in the inference setup summary.
- #5215 -> `docs/about/release-notes.mdx`: Documents managed vLLM extra
serve arguments in the inference setup summary.
- #5216 -> `docs/about/release-notes.mdx`: Documents silent OpenClaw
runtime fallback surfacing in the onboarding diagnostics summary.
- #5225 -> `docs/about/release-notes.mdx`: Documents persisted sandbox
gateway lookup in the gateway recovery summary.
- #5238 -> `docs/about/release-notes.mdx`: Documents sub-agent gateway
dial-back through the sandbox interface in the Hermes and sub-agent
summary.
- #5248 -> `docs/about/release-notes.mdx`: Documents Discord per-account
proxy resolution in the messaging summary.
- #5264 -> `docs/about/release-notes.mdx`: Documents reserved Hermes
port `8642` handling in the Hermes compatibility summary.
- #5267 -> `docs/about/release-notes.mdx`: Documents the narrower Hermes
baseline policy in the Hermes compatibility summary.
- #5321 -> `docs/about/release-notes.mdx`: Documents restored gateway
guard chains in the gateway recovery summary.
- #5328 -> `docs/about/release-notes.mdx`: Documents compact persisted
messaging plans in the messaging summary.
- #5338 -> `docs/about/release-notes.mdx`: Documents manifest channel
migration in the messaging summary.
- #5352 -> `docs/about/release-notes.mdx`: Documents persisted agent
preservation through registry recovery in the rebuild and restore
summary.
- #5371 ->
`.agents/skills/nemoclaw-user-reference/references/commands.md`:
Refreshes generated skill output for custom build cache and
layer-ordering source docs.
- #5379 -> `docs/about/release-notes.mdx`: Documents dashboard port
allocation across multiple NemoClaw gateways in the recovery summary.
- #5382 -> `docs/about/release-notes.mdx`: Documents recovery when an
active gateway has no sandbox spec in the recovery summary.
- #5389 ->
`.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`:
Refreshes generated skill output for declared agent `forward_ports`
recovery source docs.
- #5400 -> `docs/about/release-notes.mdx`: Documents bounded compatible
endpoint probes in the inference setup summary.
- #5410 -> `docs/about/release-notes.mdx`: Documents provider credential
hash removal from sandbox registry entries in the messaging summary.
- #5418 -> `docs/about/release-notes.mdx`: Documents summarized
inference validation failures in the onboarding diagnostics summary.
- #5457 -> `docs/about/release-notes.mdx`: Documents context-window
recomputation after runtime model switches in the inference setup
summary.
- #5463 -> `docs/about/release-notes.mdx`: Documents cleanup of
hard-coded messaging channel stragglers in the messaging summary.

## Skipped
- #5366 matched `docs/.docs-skip` entries through skipped experimental
paths, so this PR does not add new release-note text for that commit.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [x] Git hooks passed during commit and push, or `npx prek run
--from-ref main --to-ref HEAD` passes
- [ ] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification notes:
- `npm run docs` passed after rerunning outside the sandbox. Fern
reported 0 errors and 1 hidden warning.
- The first sandboxed `npm run docs` attempt failed before validation
because `tsx` could not create its local IPC pipe under sandbox
restrictions.
- `npm run build:cli` passed before push to refresh the local `dist/`
artifacts used by the CLI typecheck hook.
- `npm test` was not run because this is a docs-only release refresh.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Released NemoClaw v0.0.65 with improved gateway/sandbox recovery,
safer day-two workflows, and enhanced Hermes compatibility.
* Added managed vLLM extra-arguments configuration via
`NEMOCLAW_VLLM_EXTRA_ARGS_JSON`.
* Added Hermes troubleshooting guidance for port forwarding and health
checks.

* **Documentation**
* Updated NVIDIA Endpoints/NIM setup and examples to use
`NVIDIA_INFERENCE_API_KEY`.
* Refined NVIDIA network policy and Model Router API base configuration.
* Expanded CLI/environment variable documentation (including sub-agent
gateway connectivity) and plugin build performance tips.

* **Tests**
  * Expanded Vitest-backed E2E release validation coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Labels

  • area: performance (Latency, throughput, resource use, benchmarks, or scaling)
  • chore (Build, CI, dependency, or tooling maintenance)
  • dependencies (Pull requests that update a dependency file)
  • feature (PR adds or expands user-visible functionality)
  • v0.0.65 (Release target)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: investigate and reduce networking latency during onboard and validation

4 participants