Skip to content

fix(connect): recover when active gateway has no spec for the sandbox#5382

Merged
cv merged 2 commits into
mainfrom
feat/sandbox-no-spec-recovery
Jun 13, 2026
Merged

fix(connect): recover when active gateway has no spec for the sandbox#5382
cv merged 2 commits into
mainfrom
feat/sandbox-no-spec-recovery

Conversation

@laitingsheng

@laitingsheng laitingsheng commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Summary

When a sandbox is onboarded against one NemoClaw gateway and the user later switches the active OpenShell gateway to a sibling instance, nemoclaw <sandbox> connect calls openshell sandbox get <sandbox> against the wrong gateway and receives the gRPC reply sandbox has no spec. The lookup currently falls through to the catch-all unknown_error bucket, so getReconciledSandboxGatewayState never attempts recovery and the user is told the sandbox could not be verified. Treat that reply as a missing-on-this-gateway signal, route it through the existing named-gateway reconciler, and let it openshell gateway select the sandbox's owning gateway and retry — so the connect proceeds without manual intervention.

Changes

  • src/lib/actions/sandbox/gateway-state.ts — extend the missing pattern in getSandboxGatewayState and getSandboxGatewayStateForStatus to cover sandbox has no spec. The reconciler already self-heals a missing lookup against the sandbox's recorded gateway, so this single classification change wires the existing recovery path to the multi-instance case without any new branch logic.
  • src/lib/onboard/dashboard.ts — accept an optional listSandboxes dep on OnboardDashboardDeps and thread it through to getRegistryOccupiedDashboardPorts in ensureDashboardForward. Production callers keep the live-registry default; tests can inject a stub so the dashboard helpers do not read whatever sandboxes happen to live in the test runner's ~/.nemoclaw/sandboxes.json.
  • src/lib/onboard/dashboard-port.ts — re-export ListSandboxesFn so dashboard.ts (and any other consumer of the dep seam) can name the type without duplicating the shape.
  • test/onboard-dashboard.test.ts — pass listSandboxes: () => ({ sandboxes: [] }) into every createOnboardDashboardHelpers call so the existing port-allocation expectations stay independent of the runner's real registry.
  • src/lib/actions/sandbox/gateway-state-drift.test.ts — two regression tests: (1) getSandboxGatewayState classifies the sandbox has no spec gRPC reply as missing; (2) reconcileMissingAgainstNamedGateway selects the sandbox's owning gateway and retries when the active gateway is a sibling that does not know about the sandbox.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

  • Bug Fixes

    • Improved sandbox-state detection and recovery when a reachable gateway reports the sandbox as missing, enabling automatic gateway selection and recovery.
  • New Features

    • Made the sandbox-listing hook injectable so registry reads can be provided or stubbed during allocation.
  • Tests

    • Added tests covering sandbox state-drift scenarios and updated dashboard allocation tests to use the injectable listing hook.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 51822a7a-3053-4e15-bb22-f72ab02f9eac

📥 Commits

Reviewing files that changed from the base of the PR and between 2dde4cd and bb0ca38.

📒 Files selected for processing (1)
  • src/lib/actions/sandbox/gateway-state-drift.test.ts

📝 Walkthrough

Walkthrough

Classify gRPC responses containing "sandbox has no spec" as a missing-ownership sandbox state (both query paths) and add tests validating detection and owning-gateway recovery; export and inject a listSandboxes dependency for dashboard onboarding and update tests to stub registry reads.

Changes

Sandbox Gateway State Drift Recovery

Layer / File(s) Summary
Missing sandbox state detection
src/lib/actions/sandbox/gateway-state.ts
getSandboxGatewayState and getSandboxGatewayStateForStatus now treat gRPC replies containing "sandbox has no spec" (case-insensitive / NotFound variants) as a missing-ownership classification, with comments describing multi-gateway drift.
Gateway state drift test cases
src/lib/actions/sandbox/gateway-state-drift.test.ts
Three Vitest cases: classify "sandbox has no spec" as missing (preserve output) on both paths and verify reconciler selects owning gateway with ignoreError: true, returning present with recoveredGateway: true and recoveryVia: "select".

Dashboard Onboarding Registry Dependency Injection

Layer / File(s) Summary
Type export and dependency injection
src/lib/onboard/dashboard-port.ts, src/lib/onboard/dashboard.ts
Export ListSandboxesFn, import it in dashboard.ts, add optional listSandboxes?: ListSandboxesFn to OnboardDashboardDeps, and pass deps.listSandboxes to getRegistryOccupiedDashboardPorts in ensureDashboardForward.
Test dependency stub injection
test/onboard-dashboard.test.ts
Five tests updated to inject listSandboxes: () => ({ sandboxes: [] }) into createOnboardDashboardHelpers so sandbox-registry reads are stubbed in tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5379: Also touches recognition of "sandbox has no spec" in sandbox gateway state handling and related allocation logic.
  • NVIDIA/NemoClaw#5091: Overlaps in changes to missing-sandbox handling and recovery paths in gateway-state reconciliation.

Suggested labels

area: sandbox, bug-fix, area: onboarding

Suggested reviewers

  • cv
  • ericksoa

Poem

🐰 I hopped where gateways drift and specs went thin,

I sniffed the message, "no spec", and found who'd been.
I nudged the owner gateway to take the helm,
Tests clap their paws — the sandbox finds its realm. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and clearly summarizes the main change: recovering when an active gateway lacks the sandbox spec, which aligns with the primary fix across gateway-state.ts and test updates.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sandbox-no-spec-recovery

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Add or identify targeted runtime/integration validation for the changed behavior; do not report external E2E job pass/fail here.. Runtime/sandbox/infrastructure paths need behavioral runtime validation: src/lib/actions/sandbox/gateway-state.ts, src/lib/onboard/dashboard-port.ts, src/lib/onboard/dashboard.ts.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@laitingsheng laitingsheng added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression v0.0.65 Release target labels Jun 13, 2026
@cv cv merged commit 52323e5 into main Jun 13, 2026
45 checks passed
@cv cv deleted the feat/sandbox-no-spec-recovery branch June 13, 2026 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression v0.0.65 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants