fix(recovery): restore OpenClaw guard chain by jyaunches · Pull Request #5340 · NVIDIA/NemoClaw

jyaunches · 2026-06-12T16:22:35Z

Summary

Fixes #2701 with a narrow OpenClaw recovery change: when /tmp/nemoclaw-proxy-env.sh is missing after a pod/container recreate, the recovery script re-emits the guard preload files from the baked /usr/local/lib/nemoclaw/preloads/ directory and regenerates /tmp/nemoclaw-proxy-env.sh before launching the gateway.

This preserves the failing-test-first guard from #5049 without adding new framework, registry, workflow, or migration infrastructure.
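The recovery step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `sketchGuardRestoreCommand`, the `guardFiles` parameter, the `EXAMPLE_GATEWAY_PORT` variable, and the `444` file mode are all assumptions made for the example. Only the two paths come from the description.

```typescript
// Hypothetical sketch; names below are illustrative, not the PR's implementation.
const PRELOAD_SRC_DIR = "/usr/local/lib/nemoclaw/preloads"; // baked directory named in the PR
const PROXY_ENV_FILE = "/tmp/nemoclaw-proxy-env.sh"; // env file named in the PR

/** Single-quote a value so POSIX sh treats it as one literal word. */
function shQuote(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

/**
 * Build a shell fragment that, only when the proxy env file is missing,
 * copies each guard preload back into /tmp and rewrites the env file.
 * `guardFiles` holds bare file names (assumed; the real list is not shown in the PR text).
 */
function sketchGuardRestoreCommand(port: number, guardFiles: string[]): string {
  const nodeOptions = guardFiles.map((f) => `--require /tmp/${f}`).join(" ");
  const envLines = [
    `export EXAMPLE_GATEWAY_PORT=${port}`, // hypothetical variable name
    `export NODE_OPTIONS=${shQuote(nodeOptions)}`,
  ];
  return [
    `if [ ! -f ${PROXY_ENV_FILE} ]; then`,
    `  for f in ${guardFiles.map(shQuote).join(" ")}; do`,
    // Fail hard if a baked source is missing or unreadable.
    `    [ -r "${PRELOAD_SRC_DIR}/$f" ] || { echo "missing guard preload: $f" >&2; exit 1; }`,
    `    cp "${PRELOAD_SRC_DIR}/$f" "/tmp/$f" && chmod 444 "/tmp/$f" || exit 1`,
    `  done`,
    `  printf '%s\\n' ${envLines.map(shQuote).join(" ")} > ${PROXY_ENV_FILE} || exit 1`,
    `  chmod 444 ${PROXY_ENV_FILE}`,
    `fi`,
  ].join("\n");
}

// Example call with a made-up guard file name.
console.log(sketchGuardRestoreCommand(18789, ["nemoclaw-example-guard.js"]));
```

Passing the guard file list as a parameter keeps the sketch independent of the actual preload names, which are not listed here.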

Related Issues

Fixes #2701
Refs #2478
Refs #5049
Refs #5098

Simplicity / #5098 alignment

Test shape: focused existing Vitest unit coverage in src/lib/agent/runtime.test.ts.
New shared helpers/framework/registry/ledger: none.
Workflow changes: none.
Legacy bash changes: none.
Production change: one localized shell fragment inside buildOpenClawRecoveryScript().

Verification

npm run build:cli
sh -n against generated buildOpenClawRecoveryScript(18789) output
npx vitest run src/lib/agent/runtime.test.ts
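The `sh -n` step above could also be automated from Node. The following is a minimal sketch under assumptions: the helper name `passesShellSyntaxCheck` is hypothetical, and it presumes a POSIX `sh` on `PATH`. It feeds a generated script to `sh -n`, which parses the input without executing it.

```typescript
import { spawnSync } from "node:child_process";

/**
 * Hypothetical helper: returns true when `sh -n` parses the script
 * without reporting a syntax error. Nothing in the script is executed.
 */
function passesShellSyntaxCheck(script: string): boolean {
  // With no file operand, sh reads the script from stdin.
  const result = spawnSync("sh", ["-n"], { input: script, encoding: "utf8" });
  return result.status === 0;
}

// A complete `if` block parses; an unterminated one does not.
console.log(passesShellSyntaxCheck("if true; then echo ok; fi"));
console.log(passesShellSyntaxCheck("if true; then echo ok"));
```

In a test file this would be applied to the string returned by `buildOpenClawRecoveryScript(18789)`.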

Note: the full pre-push Test (CLI) run is still noisy locally because of pre-existing environment issues and timeouts (json5 fixture path, local OpenShell/gateway state, shell supervisor signal assertions); the targeted validation above passed.

Summary by CodeRabbit

Bug Fixes
- Enhanced OpenClaw gateway recovery with stricter validation and improved error handling for missing proxy environment files.

Tests
- Added regression tests for gateway recovery scenarios to ensure proper guard chain restoration and execution order.

coderabbitai · 2026-06-12T16:22:49Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8dfed37f-9bec-43ec-92ef-3a02945ddb53

📥 Commits

Reviewing files that changed from the base of the PR and between 7104567 and 9c7b497.

📒 Files selected for processing (2)

src/lib/agent/runtime.test.ts
src/lib/agent/runtime.ts

📝 Walkthrough

This PR fixes a crash loop on aarch64 systems after pod recreate by implementing automatic re-emission of OpenClaw guard preload chains in the recovery path. A new buildOpenClawGuardRestoreCommand() helper generates shell to restore guard files from bundled sources, integrated into buildOpenClawRecoveryScript() with stricter failure checks, and validated by updated and new tests.
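The stricter failure checks mentioned in the walkthrough might look something like the sketch below. It is an illustration only: `sketchPostRestoreChecks`, the log messages, and the `grep` pattern are assumptions, and both paths are passed in as parameters because the real gateway log location is not stated here.

```typescript
/** Single-quote a value so POSIX sh treats it as one literal word. */
function quoteForSh(value: string): string {
  return `'${value.replace(/'/g, `'\\''`)}'`;
}

/**
 * Hypothetical sketch of post-restore checks: abort the relaunch when the
 * env file is still missing, or when it exists without any NODE_OPTIONS
 * preload entry, and record the reason in the given log file.
 */
function sketchPostRestoreChecks(envFile: string, logFile: string): string {
  const env = quoteForSh(envFile);
  const log = quoteForSh(logFile);
  return [
    `if [ ! -f ${env} ]; then`,
    `  echo "recovery: proxy env file missing after guard restore" >> ${log}`,
    `  exit 1`,
    `fi`,
    // The pattern is an assumption about how the preload chain is written.
    `if ! grep -q 'NODE_OPTIONS=.*--require' ${env}; then`,
    `  echo "recovery: proxy env file lacks NODE_OPTIONS guard preloads" >> ${log}`,
    `  exit 1`,
    `fi`,
  ].join("\n");
}

// Example call; the log path is made up for illustration.
console.log(sketchPostRestoreChecks("/tmp/nemoclaw-proxy-env.sh", "/tmp/example-gateway.log"));
```

Exiting before the launch line is what prevents an unguarded gateway from starting when the guards could not be verified.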

Changes

Guard Chain Restoration for Crash Loop Fix

| Layer / File(s) | Summary |
|---|---|
| Guard restoration helper and preload restoration: `src/lib/agent/runtime.ts` | New `buildOpenClawGuardRestoreCommand(port)` generates shell to restore `/tmp/nemoclaw-*.js` guard preloads from `/usr/local/lib/nemoclaw/preloads/`, validates sources, re-creates `/tmp/nemoclaw-proxy-env.sh` with gateway environment and `NODE_OPTIONS` preload chain, enforces permissions, and hard-fails with gateway-log entries on validation errors. |
| Recovery script integration and stricter failure handling: `src/lib/agent/runtime.ts` | `buildOpenClawRecoveryScript(port)` calls the new guard-restore command and adds explicit post-restoration checks: missing proxy-env file becomes a hard error, and presence of the env file without required `NODE_OPTIONS` guards becomes a hard error (preventing unguarded gateway relaunch). |
| Test updates and regression test: `src/lib/agent/runtime.test.ts` | Updated existing test to assert guard-chain restoration marker emission and log tail. New regression test verifies correct execution order (log selection → guard-chain restore → sourcing → nohup) when `proxy-env.sh` is missing. |
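The execution-order check described for the regression test can be illustrated with a small string helper. This is a sketch under assumptions: `appearsInOrder` is a hypothetical name, and the sample script lines are placeholders rather than the generated recovery script.

```typescript
/**
 * Hypothetical helper: true when every marker occurs in `script`
 * and each one starts after the end of the previous match.
 */
function appearsInOrder(script: string, markers: string[]): boolean {
  let cursor = 0;
  for (const marker of markers) {
    const index = script.indexOf(marker, cursor);
    if (index === -1) return false;
    cursor = index + marker.length;
  }
  return true;
}

// Placeholder script: only the env file path comes from the PR text.
const sampleScript = [
  "LOG=/tmp/example-gateway.log", // stand-in for log selection
  "# restore guard chain", // stand-in for the restore marker
  ". /tmp/nemoclaw-proxy-env.sh", // sourcing the regenerated env file
  "nohup example-gateway &", // stand-in for the gateway launch
].join("\n");

const expectedOrder = ["LOG=", "restore guard chain", ". /tmp/nemoclaw-proxy-env.sh", "nohup"];
console.log(appearsInOrder(sampleScript, expectedOrder));
```

Advancing the cursor past each match means a marker that appears only before its predecessor is reported as out of order.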

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

nightly-e2e: gateway recovery does not regenerate proxy-env.sh guard chain (#2701) #5262: Directly addressed by implementing guard-chain regeneration inside buildOpenClawRecoveryScript and tightening failure handling to ensure NODE_OPTIONS preload guards are restored before gateway relaunch.

Possibly related PRs

NVIDIA/NemoClaw#5049: Adds E2E and recovery helper tests that validate guard-chain restoration and marker/log contracts modified in this PR.

Suggested labels

bug-fix, integration: openclaw, area: sandbox, platform: dgx-spark, v0.0.64

Suggested reviewers

cv

Poem

🐰 When /tmp guards are swept away by a pod's final breath,
Our recovery script now restores them before gateway's death.
No more crash loops on aarch64 shores—
We rebuild the chains and then relaunch floors! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix(recovery): restore OpenClaw guard chain' directly describes the main change: adding guard-chain restoration to the recovery script to fix issue `#2701`. |
| Linked Issues check | ✅ Passed | The PR implements the core recovery fix from `#2701` by adding guard-restore logic and stricter failure handling to buildOpenClawRecoveryScript, plus comprehensive unit tests validating re-emission behavior and execution order. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to test and recovery-script generation; no modifications to unrelated files, CLI flags, nemoclaw-start.sh, or permission models. |


fix(recovery): restore OpenClaw guard chain

9c7b497

jyaunches closed this Jun 12, 2026

jyaunches wants to merge 1 commit into NVIDIA:main from jyaunches:fix/2701-guard-reemit-simple
