fix(sandbox): resolve gateway from persisted sandbox entry#5225
Conversation
…egregation Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughThis PR implements sandbox-scoped gateway routing and recovery: canonical gateway name/port binding and resolver, extracted gateway-recovery flow, sandbox-aware routing helpers, wired per-sandbox gateway targeting across actions (connect, doctor, rebuild, snapshot, destroy, policy-channel), tightened docker volume matching, and expanded tests/CLI/e2e diagnostics. ChangesSandbox gateway resolution and routing
Sequence DiagramsequenceDiagram
participant SandboxRegistry as Sandbox Registry
participant Action as Action (rebuild/snapshot/destroy)
participant Recover as recoverNamedGatewayRuntime
participant Recovery as startGatewayForRecovery
participant OpenShell as openshell CLI
Action->>SandboxRegistry: getSandbox(sandboxName)
SandboxRegistry-->>Action: entry with gatewayName/gatewayPort
Action->>Action: recordedGateway = resolveSandboxGatewayName(entry)
Action->>OpenShell: gateway select recordedGateway
alt Recovery needed
Action->>Recover: recoverNamedGatewayRuntime({gatewayName: recordedGateway})
Recover->>OpenShell: gateway info -g recordedGateway
Recover->>OpenShell: gateway select recordedGateway
Recover->>Recovery: startGatewayForRecovery({gatewayName: recordedGateway})
Recovery->>OpenShell: gateway start --name recordedGateway
Recovery->>OpenShell: gateway select recordedGateway
Recovery-->>Recover: recovered
Recover-->>Action: result
end
Action->>OpenShell: sandbox list
OpenShell-->>Action: live sandboxes on recordedGateway
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: None Full advisor summaryE2E Recommendation AdvisorFailed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt |
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: None Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorFailed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt |
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
docs/reference/commands.mdx (1)
1826-1826: ⚡ Quick winUse active voice for the follow-up scope sentence.
The final clause uses passive voice (“is tracked separately as follow-up”). Rewrite it in active voice to match the docs style guide.
As per coding guidelines, “Active voice required. Flag passive constructions.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands.mdx` at line 1826, Rewrite the final clause of the paragraph about NEMOCLAW_INSTANCE to use active voice: replace the passive phrase "is tracked separately as follow-up" with an active construction that names the actor and action (for example, "we track full per-instance segregation of registry, snapshots, and messaging separately as a follow-up" or "The team will track full per-instance segregation of registry, snapshots, and messaging in a separate follow-up"). Update the sentence that mentions NEMOCLAW_INSTANCE and NEMOCLAW_GATEWAY_PORT so the new clause clearly and actively states who will track the follow-up work.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@docs/reference/commands.mdx`:
- Line 1826: Rewrite the final clause of the paragraph about NEMOCLAW_INSTANCE
to use active voice: replace the passive phrase "is tracked separately as
follow-up" with an active construction that names the actor and action (for
example, "we track full per-instance segregation of registry, snapshots, and
messaging separately as a follow-up" or "The team will track full per-instance
segregation of registry, snapshots, and messaging in a separate follow-up").
Update the sentence that mentions NEMOCLAW_INSTANCE and NEMOCLAW_GATEWAY_PORT so
the new clause clearly and actively states who will track the follow-up work.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a45332e8-9d50-4c2a-a755-5251538ff5ae
📒 Files selected for processing (9)
docs/reference/commands.mdxsrc/lib/core/instance.test.tssrc/lib/core/instance.tssrc/lib/credentials/store.tssrc/lib/onboard/gateway-binding.test.tssrc/lib/onboard/gateway-binding.tssrc/lib/state/paths.test.tssrc/lib/state/paths.tssrc/lib/state/sandbox.ts
Selective E2E Results — ❌ Some jobs failedRun: 27349948619
|
…state, sync docs Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
🌿 Preview your docs: https://nvidia-preview-pr-5225.docs.buildwithfern.com/nemoclaw |
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard/gateway-binding.test.ts`:
- Around line 150-152: The test hardcodes POSIX paths causing Windows failures;
update the assertions that call resolveNemoclawHomeDir to build the expected
path with Node's path.join instead of string interpolation. Replace the two
occurrences (the expect around resolveNemoclawHomeDir("/tmp/fixture", resolved)
and the similar expect at lines ~160–162) to use path.join("/tmp/fixture",
BASE_NEMOCLAW_HOME_DIR_NAME) (or the appropriate fixture base) so the test uses
the OS-specific separator; import path at the top of the test file if not
already present.
- Around line 132-138: beforeEach/afterEach currently unconditionally delete
process.env[ENV_KEY]; instead save the prior value and restore it after the
test: introduce a scoped variable (e.g. let previousEnvValue) in the test file,
set previousEnvValue = process.env[ENV_KEY] in beforeEach then delete
process.env[ENV_KEY], and in afterEach restore by doing if (previousEnvValue ===
undefined) delete process.env[ENV_KEY]; else process.env[ENV_KEY] =
previousEnvValue; clear previousEnvValue afterwards. This uses the existing
ENV_KEY, beforeEach and afterEach identifiers to locate where to implement the
change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 30ec6adb-3507-473c-abf3-462db20798c5
📒 Files selected for processing (4)
docs/reference/commands-nemohermes.mdxdocs/reference/commands.mdxsrc/lib/core/instance.tssrc/lib/onboard/gateway-binding.test.ts
✅ Files skipped from review due to trivial changes (2)
- docs/reference/commands-nemohermes.mdx
- docs/reference/commands.mdx
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/core/instance.ts
Selective E2E Results — ❌ Some jobs failedRun: 27351861469
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27352263020
|
Selective E2E Results — ❌ Some jobs failedRun: 27353038603
|
…t env, tighten doc claims Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
docs/reference/commands-nemohermes.mdx (2)
1584-1584: ⚡ Quick winSplit sentences per line and reduce em-dash density in this paragraph.
Line 1584 contains multiple sentences on one source line, and Line 1586 uses multiple em dashes in one paragraph. LLM pattern detected.
As per coding guidelines, docs require one sentence per line and flag excessive em dashes per paragraph.
Also applies to: 1586-1586
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands-nemohermes.mdx` at line 1584, Split the long paragraph into one sentence per source line and reduce em-dash usage: break the combined sentence about the default instance identity into separate lines (mentioning NEMOCLAW_INSTANCE, NemoClaw, the state root path ~/.nemoclaw, and the OpenShell gateway name separately), and rephrase the sentence that uses multiple em dashes to use commas or parentheses instead (the part describing nemo‑claw gateway naming with NEMOCLAW_GATEWAY_PORT and the implicit port-suffix identity). Ensure each logical sentence (default behavior, behavior when NEMOCLAW_INSTANCE is unset/empty/default, gateway naming, and port-suffix fallback) is on its own line and em-dashes are replaced with simpler punctuation.Source: Coding guidelines
1117-1117: ⚡ Quick winUse active voice for the snapshot storage sentence.
Line 1117 uses passive voice (“Snapshots are stored ...”). Please rewrite it in active voice.
As per coding guidelines, documentation must use active voice.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands-nemohermes.mdx` at line 1117, Rewrite the passive sentence starting with "Snapshots are stored in `~/.nemoclaw/rebuild-backups/<name>/` ..." into active voice; for example, change it to something like "Nemoclaw stores snapshots in `~/.nemoclaw/rebuild-backups/<name>/` for the default instance, or in `~/.nemoclaw-<instance>/rebuild-backups/<name>/` when the NEMOCLAW_INSTANCE environment variable is set to a non-default value." Ensure the sentence replaces the passive phrasing and keeps the same paths and condition.Source: Coding guidelines
docs/reference/commands.mdx (2)
1820-1820: ⚡ Quick winSplit sentences in table cell and clarify default value.
The description cell contains two sentences on one line, which violates the style guide requirement for one sentence per line.
Additionally, "historical default" is unclear; consider using a concrete value or "nemoclaw" (as indicated by the implementation).Suggested revision
-| `NEMOCLAW_INSTANCE` | _historical default_ | NemoClaw instance identity for multi-instance hosts. Scopes the state root (`~/.nemoclaw` for the default instance, `~/.nemoclaw-<instance>` otherwise) and the OpenShell gateway name so two instances on the same host stay segregated. | +| `NEMOCLAW_INSTANCE` | nemoclaw | NemoClaw instance identity for multi-instance hosts. |Then add the detailed scoping behavior in a paragraph below the table where multi-sentence explanations fit naturally.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands.mdx` at line 1820, Split the table cell into two lines: first state the default value plainly (e.g., "Default: nemoclaw") and second give a short one-line purpose like "NemoClaw instance identity for multi-instance hosts." Then remove the multi-sentence scoping details from the cell and add a brief paragraph below the table that explains the scoping behavior (how the state root is ~/.nemoclaw for the default instance and ~/.nemoclaw-<instance> otherwise, and how it affects the OpenShell gateway name) so the table follows the one-sentence-per-line style; update the `NEMOCLAW_INSTANCE` cell accordingly.Source: Coding guidelines
1343-1343: ⚡ Quick winUse active voice instead of passive.
"Snapshots are stored" is passive voice.
Per coding guidelines, documentation should use active voice.Suggested revision
-Snapshots are stored in `~/.nemoclaw/rebuild-backups/<name>/` for the default instance, or `~/.nemoclaw-<instance>/rebuild-backups/<name>/` when `NEMOCLAW_INSTANCE` is set to a non-default value. +NemoClaw stores snapshots in `~/.nemoclaw/rebuild-backups/<name>/` for the default instance, or `~/.nemoclaw-<instance>/rebuild-backups/<name>/` when you set `NEMOCLAW_INSTANCE` to a non-default value.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands.mdx` at line 1343, Edit the sentence "Snapshots are stored in `~/.nemoclaw/rebuild-backups/<name>/` for the default instance, or `~/.nemoclaw-<instance>/rebuild-backups/<name>/` when `NEMOCLAW_INSTANCE` is set to a non-default value." to use active voice; for example change it to "Nemoclaw stores snapshots in ~/.nemoclaw/rebuild-backups/<name>/ for the default instance, or ~/.nemoclaw-<instance>/rebuild-backups/<name>/ when NEMOCLAW_INSTANCE is set to a non-default value." ensuring punctuation and backtick formatting remain consistent with surrounding docs.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/reference/commands.mdx`:
- Around line 1826-1831: Reflow the paragraph so each sentence is on its own
line, remove colons used to join clauses and replace them with full stops or
rephrased clauses, and eliminate excessive em dashes by using parentheses or
commas; specifically update the sentences that reference NEMOCLAW_INSTANCE, the
hyphen-segment rule (mentioning NEMOCLAW_GATEWAY_PORT collision), the paragraph
about the default instance identity and state root (~/.nemoclaw) and gateway
naming, and the paragraph describing how setting NEMOCLAW_INSTANCE before
`$$nemoclaw onboard` routes home/credentials/rebuild backups/local inference
adapter state (including Ollama auth-proxy token) and gateway binding—also move
the list of modules that still read `~/.nemoclaw` into parentheses or a separate
sentence and end with a sentence noting follow-up PRs will migrate those
surfaces.
---
Nitpick comments:
In `@docs/reference/commands-nemohermes.mdx`:
- Line 1584: Split the long paragraph into one sentence per source line and
reduce em-dash usage: break the combined sentence about the default instance
identity into separate lines (mentioning NEMOCLAW_INSTANCE, NemoClaw, the state
root path ~/.nemoclaw, and the OpenShell gateway name separately), and rephrase
the sentence that uses multiple em dashes to use commas or parentheses instead
(the part describing nemo‑claw gateway naming with NEMOCLAW_GATEWAY_PORT and the
implicit port-suffix identity). Ensure each logical sentence (default behavior,
behavior when NEMOCLAW_INSTANCE is unset/empty/default, gateway naming, and
port-suffix fallback) is on its own line and em-dashes are replaced with simpler
punctuation.
- Line 1117: Rewrite the passive sentence starting with "Snapshots are stored in
`~/.nemoclaw/rebuild-backups/<name>/` ..." into active voice; for example,
change it to something like "Nemoclaw stores snapshots in
`~/.nemoclaw/rebuild-backups/<name>/` for the default instance, or in
`~/.nemoclaw-<instance>/rebuild-backups/<name>/` when the NEMOCLAW_INSTANCE
environment variable is set to a non-default value." Ensure the sentence
replaces the passive phrasing and keeps the same paths and condition.
In `@docs/reference/commands.mdx`:
- Line 1820: Split the table cell into two lines: first state the default value
plainly (e.g., "Default: nemoclaw") and second give a short one-line purpose
like "NemoClaw instance identity for multi-instance hosts." Then remove the
multi-sentence scoping details from the cell and add a brief paragraph below the
table that explains the scoping behavior (how the state root is ~/.nemoclaw for
the default instance and ~/.nemoclaw-<instance> otherwise, and how it affects
the OpenShell gateway name) so the table follows the one-sentence-per-line
style; update the `NEMOCLAW_INSTANCE` cell accordingly.
- Line 1343: Edit the sentence "Snapshots are stored in
`~/.nemoclaw/rebuild-backups/<name>/` for the default instance, or
`~/.nemoclaw-<instance>/rebuild-backups/<name>/` when `NEMOCLAW_INSTANCE` is set
to a non-default value." to use active voice; for example change it to "Nemoclaw
stores snapshots in ~/.nemoclaw/rebuild-backups/<name>/ for the default
instance, or ~/.nemoclaw-<instance>/rebuild-backups/<name>/ when
NEMOCLAW_INSTANCE is set to a non-default value." ensuring punctuation and
backtick formatting remain consistent with surrounding docs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6971dc46-e1ee-4f96-a226-c822ab2bb9ce
📒 Files selected for processing (10)
docs/reference/commands-nemohermes.mdxdocs/reference/commands.mdxsrc/lib/core/instance.test.tssrc/lib/inference/local-adapter-lifecycle.tssrc/lib/inference/local.tssrc/lib/onboard/gateway-binding.test.tssrc/lib/onboard/gateway-binding.tssrc/lib/state/paths.test.tssrc/lib/state/paths.tssrc/lib/state/sandbox.ts
🚧 Files skipped from review as they are similar to previous changes (6)
- src/lib/state/paths.test.ts
- src/lib/state/paths.ts
- src/lib/state/sandbox.ts
- src/lib/onboard/gateway-binding.ts
- src/lib/core/instance.test.ts
- src/lib/onboard/gateway-binding.test.ts
Selective E2E Results — ❌ Some jobs failedRun: 27355266706
|
…per style guide Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27356811342
|
…; document shields Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/core/instance-runtime.test.ts`:
- Around line 97-114: The tests currently assert only helper composition
(resolveNemoclawHomeDir / resolveNemoclawStateDir) but must import the
downstream modules that capture those values at module load to validate
call-site capture; after setting process.env.HOME and
process.env.NEMOCLAW_INSTANCE in each test, import the actual modules that close
over the paths (e.g., import the module that exports REBUILD_BACKUPS_DIR from
state/sandbox.ts and import shields/index / shields/audit / shields/timer or
whichever shield modules export STATE_DIR/AUDIT_DIR) and assert those exported
constants equal the expected path (path.join(home, ".nemoclaw-agent-a",
"rebuild-backups") and path.join(home, ".nemoclaw-agent-a", "state")
respectively) so the tests verify runtime capture rather than only helper
composition.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a50e015d-a779-4372-92f2-1b11b50ff68b
📒 Files selected for processing (12)
docs/reference/commands-nemohermes.mdxdocs/reference/commands.mdxsrc/lib/core/instance-runtime.test.tssrc/lib/core/instance.test.tssrc/lib/core/instance.tssrc/lib/inference/local-adapter-lifecycle.tssrc/lib/inference/local.tssrc/lib/onboard/gateway-binding.test.tssrc/lib/onboard/gateway-binding.tssrc/lib/state/paths.test.tssrc/lib/state/paths.tssrc/lib/state/sandbox.ts
✅ Files skipped from review due to trivial changes (2)
- docs/reference/commands.mdx
- docs/reference/commands-nemohermes.mdx
🚧 Files skipped from review as they are similar to previous changes (9)
- src/lib/state/paths.test.ts
- src/lib/inference/local.ts
- src/lib/state/sandbox.ts
- src/lib/inference/local-adapter-lifecycle.ts
- src/lib/core/instance.test.ts
- src/lib/core/instance.ts
- src/lib/onboard/gateway-binding.ts
- src/lib/state/paths.ts
- src/lib/onboard/gateway-binding.test.ts
Selective E2E Results — ❌ Some jobs failedRun: 27358369617
|
…drop issue refs Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27396170990
|
Selective E2E Results — ✅ All requested jobs passedRun: 27396548980
|
…rd ports Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27403018750
|
Selective E2E Results — ❌ Some jobs failedRun: 27403627726
|
…artup Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27407973323
|
…ox-target Slack conflict Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27440031931
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27442417753
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27457295814
|
…sed on cross-port Docker-driver recovery Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Summary
Selected sandbox-scoped lifecycle and recovery paths now resolve the OpenShell gateway from the sandbox's persisted registry entry instead of the hardcoded
nemoclawliteral. A sandbox onboarded with a non-defaultNEMOCLAW_GATEWAY_PORTis registered againstnemoclaw-<port>; the prior behaviour talked to the wrong gateway and surfaced spurioussandbox has no specerrors.Destructive operations (
destroy,snapshot, gateway cleanup) and gateway-state probes now fail closed when a sandbox's persisted gateway binding is present but invalid, rather than silently rewriting the target to the default gateway. The gateway-name lifecycle regex is anchored on a boundary sonemoclaw-8081no longer matchesGateway: nemoclaw-80810, and the sandbox-list recovery pipeline accepts aconnected_otherlifecycle so a foreign-active gateway is selected back to the sandbox's recorded one before recovery gives up.Multi-instance dashboard-port allocation and the remaining
nemoclaw list/ parallel-gateway polish are out of scope for this change and tracked separately in follow-up work.Related Issue
Fixes #4985
Supersedes #4987
Refs #4865, #5359 — the multi-instance routing symptoms surfaced there benefit from the per-port resolution landed here, but their dashboard-port allocation and parallel-routing coverage will land in a follow-up PR.
Changes
src/lib/onboard/gateway-binding.ts—resolveSandboxGatewayNamenow fails closed when a persistedgatewayPortorgatewayNameis present but invalid (out-of-range, non-integer, out-of-namespace, malformed). The bareBASE_GATEWAY_NAMElegacy fallback still applies when both fields are absent. A tampered or corrupted registry row can no longer redirect destroy/snapshot/cleanup to the default gateway.src/lib/actions/sandbox/gateway-target.ts—gatewayNamePatternanchors the gateway-name match with a(?=\s|$)lookahead sonemoclaw-8081does not matchGateway: nemoclaw-80810. Prevents misclassification of a sibling per-port gateway as the sandbox's target.src/lib/openshell-sandbox-list.ts—captureSandboxListWithGatewayRecoveryaddsconnected_otherto its recoverable-states list so when a best-effortgateway selectdoes not stick,recoverNamedGatewayRuntimecan still re-select or start the sandbox's recorded gateway before returning the failed list.src/lib/actions/sandbox/{connect,destroy,doctor,snapshot,rebuild}.ts,src/lib/actions/sandbox/{sandbox-gateway-routing,gateway-state}.ts,src/lib/gateway-runtime-action.ts,src/lib/onboard/gateway-recovery.ts— sandbox-scoped command paths and gateway lifecycle/recovery APIs resolve the gateway throughresolveSandboxGatewayName/getNamedGatewayLifecycleState(gatewayName)/recoverNamedGatewayRuntime({ gatewayName })instead of the bare literal. The Docker-driver Linux recovery path now also receives the resolved per-port target so package-managed gateways get the runtime-marker, registration, and bridge-reachability handling.printWrongGatewayActiveGuidancealready surfaces the recorded-vs-active gateway diff with agateway select <recorded>retry hint when the active gateway differs from the sandbox's.src/lib/actions/sandbox/gateway-state.ts— afteropenshell gateway selectmoves the active gateway off, the missing-sandbox reconciler inspects the post-select lifecycle and returnsgateway_missing_after_restart/gateway_unreachable_after_restartso callers emit restart guidance rather than a stalewrong_gateway_activepointing at the now-irrelevant pre-select active gateway.src/lib/actions/sandbox/destroy-gateway.ts—cleanupGatewayAfterLastSandboxderives a per-gateway-name Docker-driver state directory and passes it (with the matching pid file) tostopHostGatewayProcesses, so destroying a sandbox bound tonemoclaw-<port>reads its own pid file rather than defaulting to the bare instance's.src/lib/actions/sandbox/policy-channel.ts—checkSlackSocketModeGatewayConflictnow targetsgetSandboxTargetGatewayName(sandboxName)(the same gateway the provider mutation runs against) so a second Slack sandbox on a non-default gateway is detected before provider upsert instead of slipping through as a false negative against the default name.src/lib/onboard.ts— recovery-target helper cluster extracted tosrc/lib/onboard/gateway-recovery.ts(startGatewayForRecovery,resolveGatewayRecoveryTarget, etc.) to keeponboard.tsunder the codebase-size budget.src/lib/onboard/gateway-binding.test.ts— drops the silent-fallback assertions and pins the new fail-closed contract for invalid persisted port/name values. The valid-port-with-invalid-name case still falls through to the port-derived name.src/lib/actions/sandbox/{rebuild-gateway-drift,gateway-drift-preflight}.test.ts— fixtures aligned with the four-state recoverable set (now includesconnected_other); a new case covers stale-recovery against a sandbox bound tonemoclaw-12345so the non-default routing path is regression-locked.src/lib/actions/sandbox/policy-channel-conflict.test.ts— adds a regression case where two Slack sandboxes sharenemoclaw-8090, blocking the second add before provider upsert (a different non-default gateway is still allowed).test/cli/destroy-gateway-cleanup.test.ts— adds a Linux-only regression fordestroy --cleanup-gatewayofnemoclaw-8081, asserting only the per-port pid file is cleared and the default instance's pid file is left untouched.test/cli/doctor-gateway-token.test.ts— asserts against the raw call array rather than a joined string so a trailing default-gateway probe is detected instead of false-passing.test/e2e/test-double-onboard.sh,test/e2e/test-onboard-repair.sh— phase diagnostics now surface the capturednemoclaw onboardstdout/stderr on failure so a CI regression no longer hides the actual onboard error.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Improvements
Tests