perf(onboard): add deadline-based gateway wait by HOYALIM · Pull Request #2492 · NVIDIA/NemoClaw

HOYALIM · 2026-04-26T23:47:43Z

Summary

Adds a reusable deadline-based wait primitive and applies it to gateway startup health polling as a scoped first step for #2001.
The rebased implementation keeps the existing health-poll environment knobs compatible and preserves the dual readiness gate: OpenShell metadata must be healthy and the host HTTP endpoint must be serving.

Related Issue

Refs #2001.

Changes

Extend src/lib/core/wait.ts with waitUntil(...), waitUntilAsync(...), absolute deadlines, configurable backoff, injectable clock/sleeper hooks, and optional attempt caps.
Extract gateway startup health polling into src/lib/onboard/gateway-health-wait.ts, keeping src/lib/onboard.ts net smaller for the growth guardrail.
Preserve NEMOCLAW_HEALTH_POLL_COUNT and NEMOCLAW_HEALTH_POLL_INTERVAL behavior by using fixed intervals and maxAttempts; per-probe shell/HTTP runtime is not consumed by a startup deadline.
Add deterministic coverage for the wait primitive and direct regression coverage for waitForGatewayHealth(...) call ordering, metadata repair refresh, dual readiness, attempt limits, final-attempt sleep behavior, and zero-attempt behavior.
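
For readers unfamiliar with the pattern, here is a minimal sketch of what a synchronous deadline-based wait with the listed features could look like. It is not the code in src/lib/core/wait.ts: the option names follow the ones mentioned in this PR, while the defaults, the `blockingSleep` helper, and the exact validation rules are assumptions made for illustration.

```typescript
// Hypothetical sketch of a synchronous deadline-based wait; not the PR's actual code.
interface WaitOptions {
  deadlineMs: number; // absolute deadline on the same clock as now()
  initialIntervalMs?: number; // first sleep between probes (assumed default: 100)
  maxIntervalMs?: number; // ceiling applied after backoff
  backoffFactor?: number; // 1 keeps the interval fixed
  maxAttempts?: number; // optional cap on the number of probes
  now?: () => number; // injectable clock
  sleep?: (ms: number) => void; // injectable sleeper
}

// Assumed default sleeper: Atomics.wait blocks the calling thread for up to ms milliseconds.
function blockingSleep(ms: number): void {
  if (ms > 0) Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

function waitUntil(condition: () => boolean, options: WaitOptions): boolean {
  const {
    deadlineMs,
    initialIntervalMs = 100,
    maxIntervalMs = initialIntervalMs,
    backoffFactor = 1,
    maxAttempts = Number.POSITIVE_INFINITY,
    now = Date.now,
    sleep = blockingSleep,
  } = options;

  // Without a usable deadline or an attempt cap the loop might never terminate.
  if (Number.isNaN(deadlineMs) || (!Number.isFinite(deadlineMs) && !Number.isFinite(maxAttempts))) {
    throw new TypeError("waitUntil needs a finite deadlineMs or a finite maxAttempts");
  }

  let intervalMs = initialIntervalMs;
  for (let attempts = 0; attempts < maxAttempts; attempts++) {
    // Honor an already-expired deadline before probing.
    if (now() >= deadlineMs) return false;
    if (condition()) return true;
    // Do not sleep after the final allowed attempt.
    if (attempts + 1 >= maxAttempts) break;
    // Re-read the clock after the probe so a slow probe cannot cause oversleeping.
    const remainingMs = deadlineMs - now();
    if (remainingMs <= 0) return false;
    sleep(Math.min(intervalMs, remainingMs));
    intervalMs = Math.min(maxIntervalMs, intervalMs * backoffFactor);
  }
  return false;
}

// Deterministic usage with a fake clock and a recording sleeper.
let clock = 0;
const sleeps: number[] = [];
let probes = 0;
const ok = waitUntil(() => ++probes >= 3, {
  deadlineMs: 1000,
  initialIntervalMs: 100,
  maxIntervalMs: 400,
  backoffFactor: 2,
  now: () => clock,
  sleep: (ms) => {
    sleeps.push(ms);
    clock += ms;
  },
});
```

Injecting `now` and `sleep` is what makes the behavior testable without real delays: the recording sleeper above advances the fake clock, so the sleep sequence can be asserted exactly.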

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes via make check
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Additional validation run locally:

npm run build:cli passes
npm run typecheck:cli passes
npx vitest run src/lib/onboard/gateway-health-wait.test.ts test/wait.test.ts passes
make check passes

Notes:

This intentionally avoids the larger #2001 (perf: investigate and reduce networking latency during onboard and validation) follow-up areas: adaptive network calibration, curl connection reuse, and onboard orchestration parallelism.
This should not auto-close #2001 or #3768 (perf(onboard): replace remaining fixed readiness polls with deadline-based waits); it remains a foundation step for the broader onboard latency work.

AI Disclosure

AI-assisted — tool: OpenAI Codex

Signed-off-by: Ho Lim subhoya@gmail.com

Summary by CodeRabbit

Refactor
- Gateway health polling mechanism during startup refactored from manual loop-based approach to a retry-driven method with configurable timing intervals, exponential backoff strategy, and maximum attempt limits.
Tests
- Significantly expanded test coverage for retry and polling behavior, including deadline handling, exponential backoff behavior, maximum attempt limits, and various edge cases.
Chores
- Updated test environment configuration for token rotation integration tests.

copy-pr-bot · 2026-04-26T23:47:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-26T23:47:50Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ca593f2e-4391-4a69-b518-ab24d95b9395

📥 Commits

Reviewing files that changed from the base of the PR and between 38350a1 and ff86b84.

📒 Files selected for processing (1)

.github/workflows/nightly-e2e.yaml

💤 Files with no reviewable changes (1)

.github/workflows/nightly-e2e.yaml

📝 Walkthrough

Walkthrough

Gateway health polling during onboard transitions from a manual for/sleep loop to a waitUntil-driven retry mechanism with consistent millisecond-based timing. Test coverage for the wait utility expands from sleep functions to comprehensive waitUntil scenarios. Discord token fixtures in the CI workflow are simplified.

Changes

Polling refactor and test expansion

| Layer / File(s) | Summary |
| --- | --- |
| Gateway health polling refactor `src/lib/onboard.ts` | `startGatewayWithOptions` replaces the manual polling loop with `waitUntil`, performing per-attempt bootstrap repair, metadata attachment, re-selection, and health probing (via `status`, `gateway info -g`, `gateway info`) until `isGatewayHealthy()` returns true. Timing converts to milliseconds with `initialIntervalMs`/`maxIntervalMs` matching the converted `healthPollInterval`, `backoffFactor: 1`, and `maxAttempts: healthPollCount`. A high `deadlineMs` prevents slow commands from truncating attempts. Falls through to throw `Error("Gateway failed to start")` if health is unreached. |
| Wait utility test expansion `test/wait.test.ts` | Test suite extends from covering `sleepMs`/`sleepSeconds` to validating `waitUntil` behavior, including immediate success, immediate failure on expired deadline, `TypeError` on non-finite deadline without attempt cap, repeated polling with exact sleep interval verification, deadline-based truncation with final short sleep, exponential backoff sequence accuracy, termination via `maxAttempts` with and without zero-length intervals, and deadline-constrained unbounded retry progress. |
| Discord token fixture update `.github/workflows/nightly-e2e.yaml` | `token-rotation-e2e` job environment variables for fake Discord tokens (`DISCORD_BOT_TOKEN_A`, `DISCORD_BOT_TOKEN_B`) are changed to simplified strings (`discord-a`, `discord-b`). |
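
The mapping from the existing environment knobs to fixed-interval wait options described in the first row might be sketched roughly as follows. The fallback values (5 attempts, 2 seconds), the assumption that `NEMOCLAW_HEALTH_POLL_INTERVAL` is expressed in seconds, and the `probeAllowanceMs` parameter are all hypothetical; only the knob names and option names come from this PR.

```typescript
// Hypothetical mapping from the health-poll env knobs to wait options.
interface FixedIntervalWaitOptions {
  initialIntervalMs: number;
  maxIntervalMs: number;
  backoffFactor: number;
  maxAttempts: number;
  deadlineMs: number;
}

// Parses a non-negative integer, falling back when the value is missing or malformed.
function parseNonNegativeInt(raw: string | undefined, fallback: number): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function gatewayWaitOptions(
  env: Record<string, string | undefined>,
  nowMs: number,
  probeAllowanceMs = 30_000, // assumed worst-case runtime of one probe round
): FixedIntervalWaitOptions {
  const healthPollCount = parseNonNegativeInt(env.NEMOCLAW_HEALTH_POLL_COUNT, 5);
  const healthPollIntervalSec = parseNonNegativeInt(env.NEMOCLAW_HEALTH_POLL_INTERVAL, 2);
  const healthPollIntervalMs = healthPollIntervalSec * 1000;
  return {
    // Equal initial and max intervals plus a factor of 1 give a fixed interval.
    initialIntervalMs: healthPollIntervalMs,
    maxIntervalMs: healthPollIntervalMs,
    backoffFactor: 1,
    maxAttempts: healthPollCount,
    // Generous deadline: sleep budget plus an allowance for each probe's own runtime,
    // so slow shell-outs do not cut the attempt count short.
    deadlineMs: nowMs + healthPollCount * (healthPollIntervalMs + probeAllowanceMs),
  };
}

const options = gatewayWaitOptions(
  { NEMOCLAW_HEALTH_POLL_COUNT: "3", NEMOCLAW_HEALTH_POLL_INTERVAL: "2" },
  0,
);
```

With these inputs the attempt cap, not the deadline, is what normally ends the wait, which matches the intent of keeping the existing knobs compatible.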

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#5119: Both PRs tie to startGatewayWithOptions's health-polling/probing flow: the main PR refactors the health-poll retry loop in src/lib/onboard.ts (now via waitUntil), while the retrieved PR adds/updates tests to assert the Docker-unreachable path exits before any health polling/status/gateway-info probes run.

Suggested reviewers

cv
ericksoa

Poem

🐰 From loops of sleep and counting tries,
The gateway now retries more wise,
With waitUntil and zero back,
We polish timeouts off the track.
Discord tokens simplified—
A rabbit's test-run satisfied! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Out of Scope Changes check | ⚠️ Warning | The Discord token environment variable changes in nightly-e2e.yaml appear unrelated to the deadline-based wait implementation or issue `#2001` objectives. | Clarify whether the Discord token changes are necessary for this PR or should be separated into a distinct pull request focused on test environment configuration. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: introducing a deadline-based gateway wait mechanism to improve onboard performance. |
| Linked Issues check | ✅ Passed | The PR implements foundational deadline-based wait primitives that address Phase 2 optimization goals (replace sleeps with event-driven waits) from issue `#2001`, specifically applied to gateway health polling. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Copilot

Pull request overview

This PR introduces a reusable, synchronous, deadline-based polling helper and applies it to the gateway startup health wait loop as part of the onboard latency work (refs #2001), while aiming to preserve existing health poll env knob behavior.

Changes:

Added waitUntil(...) to src/lib/wait.ts with deadline, backoff, injectable clock/sleeper hooks, and optional attempt cap.
Replaced the gateway startup health polling loop in src/lib/onboard.ts with the new waitUntil(...) helper.
Expanded test/wait.test.ts with focused unit tests for waitUntil(...) behavior (immediate success, retries, deadline, backoff, and attempt caps).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/wait.test.ts | Adds unit coverage for the new `waitUntil(...)` helper. |
| src/lib/wait.ts | Introduces the `waitUntil(...)` synchronous polling primitive and options. |
| src/lib/onboard.ts | Switches gateway health polling to use `waitUntil(...)` with existing poll count/interval inputs. |

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

test/wait.test.ts (1)

42-172: Add a contract test for invalid deadlineMs.

Please add a test that asserts waitUntil throws TypeError for non-finite deadlineMs (e.g., NaN), so the input validation path is locked in.

✅ Suggested test

+  it("waitUntil throws when deadlineMs is non-finite", () => {
+    expect(() =>
+      waitUntil(() => false, {
+        deadlineMs: Number.NaN,
+      }),
+    ).toThrow(TypeError);
+  });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/wait.test.ts` around lines 42 - 172, Add a new test in wait.test.ts
using the existing waitUntil harness that verifies passing a non-finite
deadlineMs (e.g., NaN) causes waitUntil to throw a TypeError: call waitUntil
with a simple predicate and options including deadlineMs: NaN plus the same now
and sleep mocks used in other tests, and assert with expect(() =>
waitUntil(...)).toThrow(TypeError) so the input validation path for deadlineMs
is locked in.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3512-3541: The deadline calculation (healthDeadlineMs) only
accounts for sleep budget and can expire while each waitUntil attempt runs
multiple shell-outs (repairGatewayBootstrapSecrets, runCaptureOpenshell,
isGatewayHealthy), shortening the effective retry window; fix by extending or
removing the deadline: either compute healthDeadlineMs to also include an
allowance for per-attempt probe runtime (e.g., add healthPollCount *
probeTimeoutMs or add a conservative maxProbeRuntimeMs) so the deadline >= total
sleep budget + total probe runtime, or omit deadlineMs from the waitUntil call
to rely solely on maxAttempts/healthPollCount; update the healthDeadlineMs
computation or the waitUntil invocation accordingly (references:
healthPollIntervalMs, healthDeadlineMs, healthPollCount, waitUntil,
repairGatewayBootstrapSecrets, runCaptureOpenshell, isGatewayHealthy).

In `@src/lib/wait.ts`:
- Around line 98-99: The loop can hot-spin when intervalMs (or maxIntervalMs) is
0 and maxAttempts is unbounded; change the sleeper call and interval update to
enforce a minimum non-zero sleep (e.g., MIN_SLEEP_MS = 1) so you never call
sleeper(0). Specifically, in the block using sleeper(Math.min(intervalMs,
deadlineMs - currentMs)) and intervalMs = Math.min(maxIntervalMs, intervalMs *
backoffFactor), clamp the sleep duration with Math.max(MIN_SLEEP_MS, ...) and
ensure intervalMs is also floored to at least MIN_SLEEP_MS after applying
backoff so the loop yields CPU even when initialIntervalMs/maxIntervalMs are
zero (references: sleeper, intervalMs, maxIntervalMs, backoffFactor, deadlineMs,
currentMs).
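
The clamping suggested for the hot-spin case can be illustrated with a small helper. This is only a sketch of the reviewer's suggestion: `nextSleep` is a hypothetical function, and `MIN_SLEEP_MS = 1` is the example value from the comment, not necessarily what the PR ended up using.

```typescript
// Floor suggested in the review comment; the value is an example.
const MIN_SLEEP_MS = 1;

// Returns the sleep to perform now and the interval to use on the next iteration.
function nextSleep(
  intervalMs: number,
  maxIntervalMs: number,
  backoffFactor: number,
  remainingMs: number,
): { sleepMs: number; nextIntervalMs: number } {
  // Never request a zero-length sleep, even when the configured interval is 0.
  const sleepMs = Math.max(MIN_SLEEP_MS, Math.min(intervalMs, remainingMs));
  // Keep the next interval at or above the floor after applying backoff.
  const nextIntervalMs = Math.max(
    MIN_SLEEP_MS,
    Math.min(maxIntervalMs, intervalMs * backoffFactor),
  );
  return { sleepMs, nextIntervalMs };
}

const zeroInterval = nextSleep(0, 0, 2, 500);
const nearDeadline = nextSleep(100, 400, 2, 50);
```

One trade-off of flooring the sleep is that it can overshoot the deadline by up to the floor when less than `MIN_SLEEP_MS` remains; whether that matters depends on how strictly callers treat the deadline.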

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 88a34e7e-6332-4895-82d8-e2cfc9f03551

📥 Commits

Reviewing files that changed from the base of the PR and between 1f615e2 and 2b8bc08.

📒 Files selected for processing (3)

src/lib/onboard.ts
src/lib/wait.ts
test/wait.test.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/wait.ts`:
- Around line 94-108: The loop currently calls condition() even after the
deadline guard because the deadline check (using now() and deadlineMs) comes
after condition(); move the deadline/time check so it runs before invoking
condition() (and keep the existing attempts checks), ensuring you check
Number.isFinite(currentMs) and currentMs >= deadlineMs prior to calling
condition() to avoid the extra probe; update the loop in src/lib/wait.ts
(references: attempts, maxAttempts, deadlineMs, now(), condition()) accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d13eeb32-c6e5-4ffb-bbea-dae420740d03

📥 Commits

Reviewing files that changed from the base of the PR and between 49316b4 and 26b594e.

📒 Files selected for processing (2)

src/lib/wait.ts
test/wait.test.ts

coderabbitai

♻️ Duplicate comments (1)

src/lib/wait.ts (1)

93-109: ⚠️ Potential issue | 🟠 Major

Honor an already-expired deadline before the first probe.

Line 95 only checks the deadline after at least one attempt, so condition() on Line 107 still runs once when deadlineMs <= now() at entry. For callers using this to drive real health probes, that is still one probe past the requested budget. Please move the deadline guard out of the attempts > 0 branch and add a regression test for an already-expired deadline.

Suggested fix

   let attempts = 0;
   for (;;) {
-    if (attempts > 0) {
-      const currentMs = now();
-      if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) {
-        return false;
-      }
+    const currentMs = now();
+    if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) {
+      return false;
     }
 
     if (attempts >= maxAttempts) {
       return false;
     }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/lib/wait.ts` around lines 93 - 109, The loop currently only checks the
deadline after the first probe (gated by attempts > 0), so move the deadline
guard to run on every iteration before invoking condition(): call const
currentMs = now(); if (!Number.isFinite(currentMs) || currentMs >= deadlineMs)
return false; (remove the attempts > 0 gate) so an already-expired deadline
causes an immediate false; also add a regression test that sets deadlineMs <=
now() and asserts the wait function returns false (and that condition() is not
invoked) to prevent regressions; refer to symbols attempts, now(), deadlineMs,
condition(), and maxAttempts when making changes.
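
The regression check requested here could look something like the sketch below. `pollUntil` is a hypothetical stand-in with the deadline guard placed ahead of the probe; it is not the PR's `waitUntil`, and its parameter order is an assumption for illustration.

```typescript
// Minimal fixed-interval loop with the deadline guard ahead of the probe (hypothetical helper).
function pollUntil(
  condition: () => boolean,
  deadlineMs: number,
  maxAttempts: number,
  now: () => number,
  sleep: (ms: number) => void,
  intervalMs = 100,
): boolean {
  for (let attempts = 0; attempts < maxAttempts; attempts++) {
    // Guard runs on every iteration, including the first one.
    const currentMs = now();
    if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) return false;
    if (condition()) return true;
    // Skip the sleep after the final allowed attempt.
    if (attempts + 1 < maxAttempts) sleep(intervalMs);
  }
  return false;
}

// Regression-style check: the deadline equals the current time at entry.
let calls = 0;
const result = pollUntil(
  () => {
    calls++;
    return true;
  },
  1_000,
  5,
  () => 1_000,
  () => {},
);
```

Here `result` is `false` and `calls` stays at `0`, because the guard returns before the probe is ever invoked.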

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ab7479bd-599b-456e-8318-8c5f61cdf8fa

📥 Commits

Reviewing files that changed from the base of the PR and between 26b594e and 38350a1.

📒 Files selected for processing (2)

src/lib/wait.ts
test/wait.test.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/wait.ts`:
- Around line 95-116: The code uses currentMs captured before calling
condition(), which can be stale if condition() is slow; update the timestamp
after the condition() call and before computing remainingMs/sleepDurationMs so
the sleep budget respects deadlineMs and avoids sleeping past the deadline; in
the wait function recompute currentMs = now() (or equivalent) after condition()
returns and use that value to derive remainingMs, requestedSleepMs and to clamp
sleeper(...) (referencing currentMs, now(), deadlineMs, condition(),
remainingMs, requestedSleepMs, MIN_UNCAPPED_SLEEP_MS, hasAttemptCap, and
sleeper).
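
To make the stale-timestamp problem concrete, the sketch below models one probe-and-sleep step with a fake clock. `probeOnce` is a hypothetical helper, and the numbers (an 80 ms probe, a 50 ms interval, a 100 ms deadline) are made up for the example.

```typescript
// One probe-and-sleep step; returns true when the condition held (hypothetical helper).
function probeOnce(
  condition: () => boolean,
  deadlineMs: number,
  intervalMs: number,
  now: () => number,
  sleep: (ms: number) => void,
): boolean {
  if (now() >= deadlineMs) return false;
  if (condition()) return true;
  // Re-read the clock after the probe; a value captured before it would be stale.
  const remainingMs = deadlineMs - now();
  if (remainingMs > 0) sleep(Math.min(intervalMs, remainingMs));
  return false;
}

// Fake clock: the probe itself consumes 80 ms of a 100 ms budget.
let clock = 0;
const sleeps: number[] = [];
const slowProbe = (): boolean => {
  clock += 80;
  return false;
};
const healthy = probeOnce(
  slowProbe,
  100,
  50,
  () => clock,
  (ms) => {
    sleeps.push(ms);
    clock += ms;
  },
);
```

With the fresh timestamp the step sleeps 20 ms and ends exactly at the deadline. Using the timestamp captured before the probe would have computed `min(50, 100 - 0) = 50` and ended at 130 ms, 30 ms past the deadline.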

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b8e8326-56d6-4585-bb6e-fa32ebce51cf

📥 Commits

Reviewing files that changed from the base of the PR and between 38350a1 and 44901e0.

📒 Files selected for processing (2)

src/lib/wait.ts
test/wait.test.ts

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

wscurran · 2026-04-27T15:08:42Z

✨ Thanks for submitting this pull request that proposes a way to improve the performance of NemoClaw's onboard process by introducing a deadline-based wait primitive.

Related open issues:

#2001 perf: investigate and reduce networking latency during onboard and validation

wscurran · 2026-05-19T01:53:53Z

Sprint 5 planning update: we’re organizing this PR as the foundation PR for #3768, with #2001 remaining the umbrella tracker.

Relationship:

perf: investigate and reduce networking latency during onboard and validation #2001 = parent performance/onboard latency program
perf(onboard): replace remaining fixed readiness polls with deadline-based waits #3768 = direct Sprint 5 child issue for replacing remaining fixed readiness polls with deadline-based waits
perf(onboard): add deadline-based gateway wait #2492 = first scoped implementation step, applying the deadline-based wait primitive to gateway startup health polling

This PR should be tracked for Sprint 5 review, but it should not auto-close #3768 or #2001 when merged. If it lands, #3768 should remain open for the remaining readiness paths called out there.

cv · 2026-06-12T16:08:25Z

@coderabbitai review

coderabbitai · 2026-06-12T16:08:38Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

## Summary

Refreshes release-prep documentation for NemoClaw v0.0.65. Adds the v0.0.65 release-notes section and refreshes generated `nemoclaw-user-*` skills from the Fern MDX source docs.

## Changes

- Added the v0.0.65 release notes to `docs/about/release-notes.mdx` with links to the deeper docs pages for lifecycle, troubleshooting, inference, CLI commands, messaging, credentials, network policy, Hermes, and sub-agents.
- Regenerated the `nemoclaw-user-*` skills with `scripts/docs-to-skills.py` so release-prep skill output matches the merged source docs.
- Used the v0.0.65 announcement discussion as release context: #5472.

## Source Summary

- #2492 -> `docs/about/release-notes.mdx`: Documents deadline-based gateway wait reliability in the v0.0.65 recovery summary.
- #4958 -> `docs/about/release-notes.mdx`: Documents re-execed OpenClaw gateway health check recovery in the sandbox recovery summary.
- #5163 -> `docs/about/release-notes.mdx`: Documents safer uninstall TTY confirmation behavior in the day-two CLI summary.
- #5178 -> `docs/about/release-notes.mdx`: Documents fail-closed config restore merge behavior in the rebuild and restore summary.
- #5179 -> `docs/about/release-notes.mdx`: Documents WeChat QR token redaction in the messaging summary.
- #5182 -> `docs/about/release-notes.mdx`: Documents sustained gateway serving checks in the recovery summary.
- #5194 -> `docs/about/release-notes.mdx`: Documents model-router teardown during uninstall in the day-two CLI summary.
- #5195 -> `docs/about/release-notes.mdx`: Documents Shields auto-restore lock reconfirmation in the rebuild and restore summary.
- #5198 -> `docs/about/release-notes.mdx`: Documents Docker Desktop WSL CDI injection failure handling in the onboarding diagnostics summary.
- #5201 -> `docs/about/release-notes.mdx`: Documents sandbox download/upload wrappers and sessions export in the day-two CLI summary.
- #5205 -> `docs/about/release-notes.mdx`: Documents reporter-owned model metadata preservation in the rebuild and restore summary.
- #5214 -> `docs/about/release-notes.mdx`: Documents managed vLLM model preflight before side effects in the inference setup summary.
- #5215 -> `docs/about/release-notes.mdx`: Documents managed vLLM extra serve arguments in the inference setup summary.
- #5216 -> `docs/about/release-notes.mdx`: Documents silent OpenClaw runtime fallback surfacing in the onboarding diagnostics summary.
- #5225 -> `docs/about/release-notes.mdx`: Documents persisted sandbox gateway lookup in the gateway recovery summary.
- #5238 -> `docs/about/release-notes.mdx`: Documents sub-agent gateway dial-back through the sandbox interface in the Hermes and sub-agent summary.
- #5248 -> `docs/about/release-notes.mdx`: Documents Discord per-account proxy resolution in the messaging summary.
- #5264 -> `docs/about/release-notes.mdx`: Documents reserved Hermes port `8642` handling in the Hermes compatibility summary.
- #5267 -> `docs/about/release-notes.mdx`: Documents the narrower Hermes baseline policy in the Hermes compatibility summary.
- #5321 -> `docs/about/release-notes.mdx`: Documents restored gateway guard chains in the gateway recovery summary.
- #5328 -> `docs/about/release-notes.mdx`: Documents compact persisted messaging plans in the messaging summary.
- #5338 -> `docs/about/release-notes.mdx`: Documents manifest channel migration in the messaging summary.
- #5352 -> `docs/about/release-notes.mdx`: Documents persisted agent preservation through registry recovery in the rebuild and restore summary.
- #5371 -> `.agents/skills/nemoclaw-user-reference/references/commands.md`: Refreshes generated skill output for custom build cache and layer-ordering source docs.
- #5379 -> `docs/about/release-notes.mdx`: Documents dashboard port allocation across multiple NemoClaw gateways in the recovery summary.
- #5382 -> `docs/about/release-notes.mdx`: Documents recovery when an active gateway has no sandbox spec in the recovery summary.
- #5389 -> `.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`: Refreshes generated skill output for declared agent `forward_ports` recovery source docs.
- #5400 -> `docs/about/release-notes.mdx`: Documents bounded compatible endpoint probes in the inference setup summary.
- #5410 -> `docs/about/release-notes.mdx`: Documents provider credential hash removal from sandbox registry entries in the messaging summary.
- #5418 -> `docs/about/release-notes.mdx`: Documents summarized inference validation failures in the onboarding diagnostics summary.
- #5457 -> `docs/about/release-notes.mdx`: Documents context-window recomputation after runtime model switches in the inference setup summary.
- #5463 -> `docs/about/release-notes.mdx`: Documents cleanup of hard-coded messaging channel stragglers in the messaging summary.

## Skipped

- #5366 matched `docs/.docs-skip` entries through skipped experimental paths, so this PR does not add new release-note text for that commit.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes
- [ ] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification notes:

- `npm run docs` passed after rerunning outside the sandbox. Fern reported 0 errors and 1 hidden warning.
- The first sandboxed `npm run docs` attempt failed before validation because `tsx` could not create its local IPC pipe under sandbox restrictions.
- `npm run build:cli` passed before push to refresh the local `dist/` artifacts used by the CLI typecheck hook.
- `npm test` was not run because this is a docs-only release refresh.

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

* **New Features**
  * Released NemoClaw v0.0.65 with improved gateway/sandbox recovery, safer day-two workflows, and enhanced Hermes compatibility.
  * Added managed vLLM extra-arguments configuration via `NEMOCLAW_VLLM_EXTRA_ARGS_JSON`.
  * Added Hermes troubleshooting guidance for port forwarding and health checks.
* **Documentation**
  * Updated NVIDIA Endpoints/NIM setup and examples to use `NVIDIA_INFERENCE_API_KEY`.
  * Refined NVIDIA network policy and Model Router API base configuration.
  * Expanded CLI/environment variable documentation (including sub-agent gateway connectivity) and plugin build performance tips.
* **Tests**
  * Expanded Vitest-backed E2E release validation coverage.

HOYALIM changed the title ~~[codex] Add deadline-based gateway wait~~ perf(onboard): add deadline-based gateway wait Apr 26, 2026

HOYALIM marked this pull request as ready for review April 27, 2026 00:02

Copilot AI review requested due to automatic review settings April 27, 2026 00:02

Copilot started reviewing on behalf of HOYALIM April 27, 2026 00:03 View session

HOYALIM marked this pull request as draft April 27, 2026 00:06

Copilot AI reviewed Apr 27, 2026

Comment thread src/lib/onboard.ts Outdated

Comment thread src/lib/wait.ts Outdated

coderabbitai Bot reviewed Apr 27, 2026

Comment thread src/lib/onboard.ts Outdated

Comment thread src/lib/wait.ts Outdated

HOYALIM marked this pull request as ready for review April 27, 2026 00:08

coderabbitai Bot reviewed Apr 27, 2026

Comment thread src/lib/wait.ts Outdated

coderabbitai Bot reviewed Apr 27, 2026

Comment thread src/lib/wait.ts Outdated

HOYALIM and others added 6 commits April 26, 2026 21:15

perf(onboard): add deadline-based gateway wait

3045a38

Update src/lib/onboard.ts

f2e7b76

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

fix(wait): address polling review feedback

4568f14

fix(wait): check deadline before retry probe

6d0daab

fix(wait): honor expired deadline before probing

8cee17d

Update src/lib/wait.ts

5a709c4

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

HOYALIM force-pushed the issue-2001-latency-waits branch from bb47782 to 5a709c4 Compare April 27, 2026 04:17

Merge branch 'main' into issue-2001-latency-waits

a301e6e

wscurran added the dependencies and enhancement: performance labels Apr 27, 2026

cv closed this May 12, 2026

cv reopened this May 12, 2026

cv mentioned this pull request May 14, 2026

perf: investigate and reduce networking latency during onboard and validation #2001

Closed

9 tasks

wscurran mentioned this pull request May 18, 2026

perf(onboard): replace remaining fixed readiness polls with deadline-based waits #3768

Open

wscurran added the chore, feature, and area: performance labels and removed the enhancement: performance label Jun 3, 2026

cv added 11 commits June 12, 2026 00:41

merge(main): resolve gateway wait conflicts

91ad1fb

refactor(onboard): extract gateway health wait

4beebb4

test(onboard): cover gateway health wait helper

2dc26d1

fix(wait): avoid blocking async default sleep

fc479f6

Merge branch 'main' into issue-2001-latency-waits

5fe89d7

refactor(wait): migrate local adapter health wait

4d96f94

refactor(wait): migrate gateway http readiness wait

713a9e4

refactor(wait): migrate docker driver gateway service wait

6e8096f

refactor(wait): migrate sandbox recovery gateway wait

ff5807d

refactor(wait): migrate host gateway exit wait

5c9d128

refactor(wait): migrate stale gateway exit wait

ff86b84

cv approved these changes Jun 12, 2026

cv merged commit 7a8fcae into NVIDIA:main Jun 12, 2026
34 checks passed

cv added the v0.0.65 Release target label Jun 13, 2026

miyoungc mentioned this pull request Jun 16, 2026

docs: refresh v0.0.65 release docs #5519

Merged

13 tasks
