Skip to content

fix(hermes): install safety-net + ciao NODE_OPTIONS preloads so recover can relaunch#5416

Open
hizrianraz wants to merge 2 commits into
NVIDIA:mainfrom
hizrianraz:fix/hermes-recover-node-guards-2478
Open

fix(hermes): install safety-net + ciao NODE_OPTIONS preloads so recover can relaunch#5416
hizrianraz wants to merge 2 commits into
NVIDIA:mainfrom
hizrianraz:fix/hermes-recover-node-guards-2478

Conversation

@hizrianraz

@hizrianraz hizrianraz commented Jun 14, 2026

Copy link
Copy Markdown

Summary

The Hermes entrypoint (agents/hermes/start.sh) never installed the sandbox-safety-net and ciao-network-guard NODE_OPTIONS preloads that the OpenClaw entrypoint (scripts/nemoclaw-start.sh) installs. As a result the gateway-recovery path refuses to relaunch a stopped Hermes gateway ("proxy-env present but NODE_OPTIONS missing safety-net preload or ciao preload — refusing unguarded gateway relaunch"), and the @homebridge/ciao networkInterfaces() crash (#2478) can hit Hermes sandboxes. This ports the guard install to the Hermes entrypoint, hardened so it can never abort the entrypoint.

Related Issue

Relates to #2478 — the ciao-network-guard preload this ports to the Hermes entrypoint.

Changes

  • agents/hermes/start.sh: add install_nemoclaw_node_guards() — copies sandbox-safety-net.js + ciao-network-guard.js to /tmp/nemoclaw-*.js and adds them to NODE_OPTIONS (mirrors scripts/nemoclaw-start.sh, sourcing from the Hermes image's /opt/nemoclaw-blueprint/scripts).
  • Re-export both guards into the sourced proxy-env (/tmp/nemoclaw-proxy-env.sh) so connect/recovery sessions see them — this is what satisfies the recovery guard-check.
  • Hardened for the set -euo pipefail entrypoint: invoked … || true (errexit suppressed through the whole function), every fallible step inside an if, all vars ${x:-}-safe, and a guard added to NODE_OPTIONS only after its /tmp copy exists and is non-empty ([ -s ]). Worst case it logs a warning and the gateway still starts.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

The prek/npm boxes are left for CI — this environment lacked the npm/prek toolchain. Verified manually: bash -n clean; an offline harness extracted the real install_nemoclaw_node_guards and ran it under set -euo pipefail across happy / emit-fails / missing-dir / empty-write — the entrypoint continued in every mode (an unhardened version aborted on emit-fail), and the happy path put both guard substrings in NODE_OPTIONS; a throwaway container from the built Hermes sandbox image confirmed the real emit_sandbox_sourced_file writes both /tmp guards. In production on a DGX Spark, rebuilding a Hermes sandbox with the patched entrypoint came up healthy and recover then relaunched cleanly (no #2478 refusal). Happy to add a regression test in your preferred harness — the change is in the shell entrypoint, which doesn't fit the vitest / Brev-e2e suites cleanly.


Signed-off-by: Hizrian Raz hizrian@ainfera.ai

Summary by CodeRabbit

  • Chores
    • Added a best-effort “Node preload guards” setup for Hermes connection and recovery sessions to improve startup safety.
    • Updated the generated runtime environment so NODE_OPTIONS preloads are conditionally re-exported for both connect and recovery flows when guards are present.
    • Ensured Hermes recovery paths have consistent, detected guard availability without failing startup.

@copy-pr-bot

copy-pr-bot Bot commented Jun 14, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 07c081e7-3b89-453a-833d-3f9eed523dee

📥 Commits

Reviewing files that changed from the base of the PR and between 0ccc925 and 8c9a3f7.

📒 Files selected for processing (1)
  • agents/hermes/start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • agents/hermes/start.sh

📝 Walkthrough

Walkthrough

Adds install_nemoclaw_node_guards to agents/hermes/start.sh, which stages sandbox-safety-net.js and ciao-network-guard.js into /tmp and appends --require entries to NODE_OPTIONS. Extends write_runtime_shell_env to re-export those directives in /tmp/nemoclaw-proxy-env.sh for connect and recovery sessions; failures never abort startup.

Changes

Hermes Node Preload Guard Installation and Proxy Env Wiring

Layer / File(s) Summary
Guard staging, NODE_OPTIONS wiring, and proxy env emission
agents/hermes/start.sh
install_nemoclaw_node_guards locates and stages sandbox-safety-net.js and ciao-network-guard.js via emit_sandbox_sourced_file, appends --require entries to NODE_OPTIONS per successfully staged script, and is invoked with || true so failures only log warnings. write_runtime_shell_env is updated to conditionally emit export NODE_OPTIONS=...--require <staged guard> lines into /tmp/nemoclaw-proxy-env.sh, gated on whether _SANDBOX_SAFETY_NET and _CIAO_GUARD_SCRIPT were staged, ensuring recovery and connect sessions inherit the guard preloads.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/NemoClaw#5321: Implements Hermes startup/recovery logic that stages NemoClaw safety-net and ciao guard scripts and conditionally appends NODE_OPTIONS --require entries into /tmp/nemoclaw-proxy-env.sh, directly overlapping with this PR's implementation.
  • NVIDIA/NemoClaw#5342: Hardens the recovery/proxy-env sourcing logic that conditionally generates and sources a minimal recovery env when the Node preload guards staged by this PR are present.

Suggested labels

bug-fix, security, area: sandbox

Suggested reviewers

  • cv
  • sandl99

Poem

🐇 Hop, hop — the guards are staged in /tmp tonight,
--require stitched to NODE_OPTIONS just right,
A safety net spread with a || true grace,
The proxy env whispers each guard into place.
No crash shall escape this rabbit's warm embrace! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: installing NODE_OPTIONS preloads for safety-net and ciao guards in the Hermes entrypoint to enable recovery functionality.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hermes/start.sh`:
- Around line 767-770: Remove the sandbox path
`/sandbox/.nemoclaw/blueprints/*/scripts` from the default search list in the
NEMOCLAW_GUARD_DIRS parameter expansion. The for loop iterating over _gd should
only default to image-owned preload directories
(`/opt/nemoclaw-blueprint/scripts` and `/usr/local/lib/nemoclaw/preloads`) when
NEMOCLAW_GUARD_DIRS is not explicitly set, allowing users to add sandbox paths
only through explicit environment variable overrides for test/dev purposes
rather than by default.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fdeffcd8-d465-4534-8380-0d54923f18db

📥 Commits

Reviewing files that changed from the base of the PR and between 1f75d03 and 0ccc925.

📒 Files selected for processing (1)
  • agents/hermes/start.sh

Comment thread agents/hermes/start.sh Outdated
…er can relaunch

The OpenClaw entrypoint (scripts/nemoclaw-start.sh) installs the
sandbox-safety-net and ciao-network-guard NODE_OPTIONS preloads, but the
Hermes entrypoint (agents/hermes/start.sh) did not. The gateway-recovery path
refuses to relaunch unless NODE_OPTIONS advertises both guards ("proxy-env
present but NODE_OPTIONS missing safety-net preload or ciao preload - refusing
unguarded gateway relaunch"), so a stopped Hermes gateway can never be brought
back with `recover`. The ciao guard is also what prevents the @homebridge/ciao
networkInterfaces() crash (NVIDIA#2478) on Hermes sandboxes.

Port the guard install to the Hermes entrypoint. Hardened so a guard-install
failure can never abort the `set -euo pipefail` entrypoint (an unguarded version
left the gateway dead): the install runs as `... || true` (errexit suppressed
through the whole function), every fallible command sits inside an `if`, all
vars are `${x:-}`-safe, and a guard is only added to NODE_OPTIONS once its /tmp
copy exists and is non-empty. Worst case it warns and the gateway still starts.
The guards are also re-exported into the sourced proxy-env so connect/recovery
sessions see them.

Signed-off-by: Hizrian Raz <hizrian@ainfera.ai>
@hizrianraz hizrianraz force-pushed the fix/hermes-recover-node-guards-2478 branch from 0ccc925 to 8c9a3f7 Compare June 14, 2026 16:09
@hizrianraz

Copy link
Copy Markdown
Author

Thanks @coderabbitai — good catch. Removed the sandbox-writable /sandbox/.nemoclaw/blueprints/*/scripts from the default search list; the default is now image-owned dirs only (/opt/nemoclaw-blueprint/scripts, /usr/local/lib/nemoclaw/preloads), with NEMOCLAW_GUARD_DIRS reserved for explicit test/dev overrides. Force-pushed.

@coderabbitai

coderabbitai Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@hizrianraz: Perfect — restricting the defaults to image-owned, immutable directories (/opt/nemoclaw-blueprint/scripts, /usr/local/lib/nemoclaw/preloads) and making NEMOCLAW_GUARD_DIRS the only path to include sandbox-writable locations is exactly the right hardening posture. That way the trust boundary is explicit and opt-in rather than on by default. Thanks for the quick turnaround!

(ᵔᴥᵔ)🐇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants