fix(agent): regenerate proxy-env.sh guard chain during recovery (#2701) #5259
hunglp6d wants to merge 2 commits into
Commit 27ae4c3 extracted patchStagedDockerfile() from onboard.ts into sandbox-dockerfile-patch-flow.ts but dropped the design-intent comment that documents why darwinVmCompat=false for Docker builds. The openshell-gateway-upgrade-e2e test greps for this comment in onboard.ts as a design guard. Restore the comment in the new file and update the test to grep the correct path.

Signed-off-by: Hung Le <hple@nvidia.com>
When /tmp/nemoclaw-proxy-env.sh is missing at recovery time (e.g. after a pod recreate), the recovery script now scans for preload .js files that still exist on disk and regenerates a minimal proxy-env.sh with the corresponding NODE_OPTIONS --require entries. This prevents the @homebridge/ciao crash loop on aarch64 / DGX Spark where the gateway respawns without the ciao-network-guard preload and hits an unhandled os.networkInterfaces() exception. Both the OpenClaw and non-OpenClaw agent recovery paths are fixed.

Signed-off-by: Hung Le <hple@nvidia.com>
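The regeneration step described in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the code generated by `runtime.ts`: the function name `regenerate_proxy_env`, its directory argument, and the exact `export` line format are hypothetical. Only the `nemoclaw-*.js` glob, the `NODE_OPTIONS --require` entries, the `nemoclaw-proxy-env.sh` file name, and mode 444 come from the PR text.

```shell
# Hypothetical sketch: rebuild a minimal proxy-env.sh from the preload files
# that still exist in a directory (the PR text uses /tmp).
regenerate_proxy_env() {
  local dir="$1"
  local env_file="$dir/nemoclaw-proxy-env.sh"
  local opts="" f

  # Nothing to regenerate when the file is already present.
  [ -e "$env_file" ] && return 0

  # Add one --require entry per preload that is a readable regular file.
  # An unmatched glob stays literal and is skipped by the -f test.
  for f in "$dir"/nemoclaw-*.js; do
    [ -f "$f" ] && [ -r "$f" ] || continue
    opts="${opts:+$opts }--require $f"
  done

  # No preloads survived: report failure so the caller can decide what to do.
  [ -n "$opts" ] || return 1

  # Assumed file format: a single export line. Paths must not contain spaces.
  printf 'export NODE_OPTIONS="%s"\n' "$opts" > "$env_file"
  chmod 444 "$env_file"
}
```

A caller would invoke `regenerate_proxy_env /tmp` before relaunching the gateway. The sketch trusts whatever matches the glob and does no symlink or ownership checks.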
PR Review Advisor findings: 4 needs attention, 2 worth checking, 0 nice ideas.

This is an automated advisory review. A human maintainer must make the final merge decision.
Closing this in favor of #5321. The replacement keeps the missing
@cv Thanks. This is an AI-generated PR; I will close the associated issue.
## Summary

This PR addresses the shared gateway recovery-state failure from #2701 by restoring the guard chain from trusted packaged preload sources when `/tmp/nemoclaw-proxy-env.sh` is missing or incomplete. Recovery now recreates a minimal proxy-env file for the critical safety-net and ciao preloads, validates exact `--require` entries, and refuses an unguarded relaunch if trusted staging fails.

## Related Issue

Addresses the shared guard-chain recovery failure in #2701; does not close the broader hardware/provider/recreate-trigger validation matrix.

Refs #2701
Refs #2478
Supersedes #5259
Supersedes #5265

## Changes

- Added a shared recovery helper that stages safety-net and ciao preloads from `/usr/local/lib/nemoclaw/preloads/` into hardened `/tmp` files.
- Wired OpenClaw and agent-specific gateway recovery through the helper instead of the old warning-only missing-proxy-env path.
- Added shell-executed tests for missing proxy-env restoration, exact `--require` matching, symlink replacement, missing trusted sources, and Hermes recovery harness integration.
- Documented the live E2E coverage scope: production `connect --probe-only` / sandbox-exec recovery after a pod-recreate-equivalent guard-chain wipe, with DGX Spark / GB10 / aarch64, provider breadth, and destructive recreate triggers intentionally left for dedicated platform validation.
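As a rough illustration of the staging idea in the Changes list (copy a preload from a trusted source, replace a symlink at the destination instead of following it, and fail when the trusted source is missing), a sketch might look like this. The function name `stage_preload` and its arguments are hypothetical, and the real helper's checks (ownership, permissions, race handling) are not shown in this PR text, so this should not be read as its implementation.

```shell
# Hypothetical sketch: copy one preload from a trusted source to a destination
# path, replacing whatever is there (including a symlink). Not race-free.
stage_preload() {
  local src="$1" dest="$2" tmp

  # Refuse when the trusted source is missing or is itself a symlink.
  [ -f "$src" ] && [ ! -L "$src" ] || return 1

  # Drop a pre-existing symlink so it is replaced rather than followed.
  if [ -L "$dest" ]; then
    rm -f "$dest" || return 1
  fi
  # A directory at the destination is unexpected; refuse.
  [ ! -d "$dest" ] || return 1

  # Copy into a fresh temp file next to the destination, make it read-only,
  # then rename it into place.
  tmp="$(mktemp "$(dirname "$dest")/.stage.XXXXXX")" || return 1
  if cp "$src" "$tmp" && chmod 444 "$tmp" && mv -f "$tmp" "$dest"; then
    return 0
  fi
  rm -f "$tmp"
  return 1
}
```

A recovery script built on this would call `stage_preload` once per critical preload and abort the relaunch on the first non-zero status, which matches the "refuses an unguarded relaunch if trusted staging fails" behavior described in the Summary.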
## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

<!-- DCO sign-off required by CI. Run: git config user.name && git config user.email -->
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Recovery scripts now stage and validate trusted preload modules, restore guard chains early, and hard-fail if critical guards remain missing.
  * Added test harness helpers to simulate trusted preload sources and rewrite recovery scripts for E2E tests.
* **Bug Fixes**
  * Hardened proxy-env handling: reject symlinks, enforce strict permissions/ownership, avoid sourcing attacker-controlled content, and prevent duplicate/unsafe preload entries.
* **Tests**
  * Expanded E2E and unit coverage with ordering, permission/symlink scenarios, logging and PID-stability assertions.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
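The "exact `--require` matching" and "prevent duplicate preload entries" points above can be illustrated with a small sketch. Both function names (`has_exact_require`, `add_require_once`) are hypothetical, and the sketch assumes entries are separated by single spaces; the replacement PR's actual validation logic is not shown here.

```shell
# Hypothetical sketch: succeed only if a NODE_OPTIONS-style string contains
# the exact pair "--require <path>" as whole words (single-space separated).
has_exact_require() {
  local node_options="$1" want="$2"
  # Padding both sides with spaces turns a substring test into a whole-word
  # test; the quoted pattern text is matched literally, not as a glob.
  case " $node_options " in
    *" --require $want "*) return 0 ;;
  esac
  return 1
}

# Hypothetical sketch: print the options with the entry appended only if it
# is not already present, so repeated recovery runs do not duplicate it.
add_require_once() {
  local node_options="$1" path="$2"
  if has_exact_require "$node_options" "$path"; then
    printf '%s\n' "$node_options"
  else
    printf '%s\n' "${node_options:+$node_options }--require $path"
  fi
}
```

With this check, an entry such as `--require /tmp/a.js.bak` does not satisfy a lookup for `/tmp/a.js`, which a plain substring search would wrongly accept.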
Fixes #5262
✨ [AI-generated issue]
## Summary

When `/tmp/nemoclaw-proxy-env.sh` is missing at recovery time (e.g. after a pod recreate on DGX Spark), the recovery script now scans for preload `.js` files that still exist on disk and regenerates a minimal `proxy-env.sh` with the corresponding `NODE_OPTIONS --require` entries. Previously, the recovery script logged a WARNING and launched the gateway unguarded, which triggers the `@homebridge/ciao` crash loop on aarch64 / DGX Spark (#2701).

## Related Issue

## Changes

`src/lib/agent/runtime.ts` (`buildOpenClawRecoveryScript` + `buildRecoveryScript`): Replace the warn-and-proceed branch when `_PE_MISSING=1` with a regenerate-and-proceed branch that:

- Scans for preload files that still exist on disk (`/tmp/nemoclaw-*.js`)
- Builds the `NODE_OPTIONS` string from whichever files are present
- Writes `/tmp/nemoclaw-proxy-env.sh` with mode 444

## Validation

Custom-e2e validation was not run: the available token lacks the `workflow` scope required to push `.github/workflows/custom-e2e.yaml`. Manual validation: run `issue-2478-crash-loop-recovery-e2e` on this branch.

- Commit `a3ae21b6eed8399099fd390bd45ad43e78218258`: `issue-2478-crash-loop-recovery-e2e / run` (#80933852556)

## Type of Change

## Verification

- `npx prek run --all-files` passes
- `npm test` passes

## AI Disclosure
Signed-off-by: Hung Le <hple@nvidia.com>