
fix(openclaw): fail closed on config restore merge errors #5178

Merged
cv merged 12 commits into main from codex/5174-openclaw-restore-unit-tests on Jun 13, 2026

Conversation

@cv

@cv cv commented Jun 10, 2026

Collaborator

Summary

Follow-up to #5174 that pins the risky OpenClaw config restore fallback cases. The restore now fails closed when selective merging cannot read or parse the current rebuilt config, or cannot parse the backup, leaving the fresh runtime config intact instead of writing a stale sanitized backup wholesale.

Related Issue

Follow-up to #5174.

Changes

  • Treat OpenClaw config selective-merge failures as state-file restore failures rather than falling back to the backup contents.
  • Add focused fake-SSH coverage for missing/invalid current openclaw.json and invalid backed-up openclaw.json.
  • Document provider/plugin merge behavior when the rebuilt config lacks generated maps.
  • Update snapshot fake SSH fixtures to model absent openclaw.json state files instead of zero-byte successful reads.
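
The fail-closed rule described above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code from this PR: `mergeConfigFailClosed`, `parseJsonObject`, `RestoreOutcome`, and the `keysToRestore` parameter are hypothetical names, and which keys the real selective merge carries over is an assumption. The only point shown is that an unreadable or unparseable input produces a failure result and never the backup contents.

```typescript
// Hypothetical names throughout; this is a sketch, not the project's restore code.
type RestoreOutcome =
  | { ok: true; merged: Record<string, unknown> }
  | { ok: false; reason: string };

// Returns a plain JSON object, or undefined when the text is missing,
// is not valid JSON, or is not an object at the top level.
function parseJsonObject(text: string | undefined): Record<string, unknown> | undefined {
  if (text === undefined) return undefined;
  try {
    const value: unknown = JSON.parse(text);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
    return undefined;
  } catch {
    return undefined;
  }
}

// keysToRestore is an assumption: the top-level keys copied from the backup.
function mergeConfigFailClosed(
  currentText: string | undefined,
  backupText: string | undefined,
  keysToRestore: readonly string[],
): RestoreOutcome {
  const current = parseJsonObject(currentText);
  if (current === undefined) {
    // Fail closed: do not fall back to writing the backup wholesale.
    return { ok: false, reason: "current config missing or invalid" };
  }
  const backup = parseJsonObject(backupText);
  if (backup === undefined) {
    return { ok: false, reason: "backup config missing or invalid" };
  }
  // Start from the fresh runtime config and overlay only the selected keys.
  const merged: Record<string, unknown> = { ...current };
  for (const key of keysToRestore) {
    if (Object.prototype.hasOwnProperty.call(backup, key)) {
      merged[key] = backup[key];
    }
  }
  return { ok: true, merged };
}
```

A caller would write `merged` back only when `ok` is true and otherwise record the file as a failed restore, leaving the rebuilt config on disk unchanged.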

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Summary by CodeRabbit

New Features

  • OpenClaw config hash now automatically refreshes after sandbox restore operations, ensuring configuration consistency

Tests

  • Added comprehensive test coverage for config restore failure scenarios
  • Added tests validating config hash refresh functionality
  • Extended regression test coverage for configuration integrity validation

@cv cv self-assigned this Jun 10, 2026
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6995fc7a-4f4f-4e4e-8e85-956246e772e9

📥 Commits

Reviewing files that changed from the base of the PR and between e9ad4c8 and 21e6841.

📒 Files selected for processing (1)
  • scripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/nemoclaw-start.sh

📝 Walkthrough

Walkthrough

The PR adds OpenClaw .config-hash refresh validation: a post-restore step recomputes the SHA-256 digest of openclaw.json into .config-hash, gates rebuild success on verification, tests the refresh behavior and symlink refusal, adds fail-closed restore tests for invalid/missing configs, and provides test assertions to validate hash correctness after restore and guard execution.
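
As a rough illustration of the refresh step described in the walkthrough, the sketch below recomputes the digest in TypeScript and refuses symlinks. It is written under assumptions: the PR performs this step with a generated shell command, the function name `refreshConfigHash` and the `RefreshResult` shape are hypothetical, the `sha256sum`-style line format is assumed, and the root-ownership check mentioned in the walkthrough is not modelled.

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical result shape; the PR's shell command reports via exit status instead.
type RefreshResult = { refreshed: boolean; reason?: string };

function refreshConfigHash(configDir: string): RefreshResult {
  const configPath = path.join(configDir, "openclaw.json");
  const hashPath = path.join(configDir, ".config-hash");

  // lstat does not follow symlinks, so a symlinked config is detected here.
  const configStat = fs.lstatSync(configPath, { throwIfNoEntry: false });
  if (configStat === undefined) {
    return { refreshed: false, reason: "openclaw.json is absent" };
  }
  if (configStat.isSymbolicLink() || !configStat.isFile()) {
    return { refreshed: false, reason: "refusing non-regular openclaw.json" };
  }
  const hashStat = fs.lstatSync(hashPath, { throwIfNoEntry: false });
  if (hashStat !== undefined && hashStat.isSymbolicLink()) {
    return { refreshed: false, reason: "refusing symlinked .config-hash" };
  }

  // Assumed line format, mirroring sha256sum: "<hex digest>  <filename>".
  const digest = createHash("sha256").update(fs.readFileSync(configPath)).digest("hex");
  fs.writeFileSync(hashPath, `${digest}  openclaw.json\n`);
  return { refreshed: true };
}

// Demo in a throwaway directory.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "config-hash-demo-"));
fs.writeFileSync(path.join(demoDir, "openclaw.json"), '{"example":true}\n');
const demoResult = refreshConfigHash(demoDir);
console.log(demoResult.refreshed);
```

A caller that gates rebuild success on verification would treat `refreshed: false` for an existing config as unverified rather than as success.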

Changes

OpenClaw config hash refresh validation and integration

| Layer | File(s) | Summary |
| --- | --- | --- |
| Config hash refresh command definition | src/lib/actions/sandbox/rebuild.ts | buildRefreshMutableOpenClawConfigHashCommand generates a shell command that recomputes openclaw.json SHA-256 into .config-hash with symlink and root-ownership checks, and refreshMutableOpenClawConfigHashAfterPostRestoreWrites executes it, validates exit status, and logs warnings on failure. |
| Config hash refresh command test coverage | src/lib/actions/sandbox/rebuild-config-hash.test.ts | Linux-only integration tests verify .config-hash updates to match openclaw.json SHA-256, with exit status 0 and empty stderr, and that symlinked config files are refused (exit status 11, specific error message, hash unchanged). |
| Config hash refresh post-restore integration | src/lib/actions/sandbox/rebuild.ts | Post-restore sequence calls the refresh helper after potential openclaw.json rewrites; introduces mutableConfigHashRefreshUnverified flag to track verification failure; updates rebuild-success condition to require unverified flag to be false; adds completion warning when refresh fails. |
| Shell script inline config hash refresh | scripts/nemoclaw-start.sh | Config guard permission helper adds a conditional pre-step to recompute .config-hash from openclaw.json via sha256sum when the directory and files are regular (non-symlink) before continuing with existing permission repair logic. |
| Config restore failure mode test coverage | test/openclaw-config-restore.test.ts | New Vitest suite tests restoreSandboxState fail-closed behavior when current or backed-up openclaw.json is missing or invalid JSON, asserting no restored files, failedFiles containing openclaw.json, and current config remains parseable and untouched. |
| Config hash contract validation in restore tests | test/repro-4538-raw-doctor-perms.test.ts | Adds Node crypto SHA-256 helper functions to compute and verify .config-hash matches the digest of openclaw.json and contains the filename; updates restore and guard tests to assert hash contract correctness after permission repair; modifies simulated openclaw doctor --fix to rewrite config for hash assertions. |
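
The hash contract mentioned in the last entry (the stored digest matches openclaw.json and the line names the file) could be checked with helpers along these lines. This is a sketch only: `sha256Hex` and `hashFileMatchesConfig` are hypothetical names, not the helpers added in the test file, and the `sha256sum`-style line layout is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 of the given text.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Assumes .config-hash holds one sha256sum-style line: "<hex digest>  <filename>".
// Returns true only when the digest matches and the line names the expected file.
function hashFileMatchesConfig(
  hashFileText: string,
  configContent: string,
  fileName: string = "openclaw.json",
): boolean {
  const line = hashFileText.trim();
  const [digest] = line.split(/\s+/);
  return digest === sha256Hex(configContent) && line.endsWith(fileName);
}
```

In a test, the two arguments would typically come from reading `.config-hash` and `openclaw.json` after the restore or guard step has run.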

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5333: Updates the live Vitest "sandbox-rebuild" scenario to exercise behavior around nemoclaw <sandbox> rebuild, which the main PR changes by adding the mutable OpenClaw .config-hash refresh step and warning/failure semantics.
  • NVIDIA/NemoClaw#5101: The main PR's sandbox restore/rebuild flow now refreshes and validates the mutable openclaw.json config hash, while the retrieved PR ensures openclaw.json is preserved, restored, and repaired post-restore—both directly touch restore-time handling of openclaw.json in the rebuild pipeline.
  • NVIDIA/NemoClaw#5177: The main PR adds fail-closed OpenClaw openclaw.json restore failure-mode tests (missing/invalid current or backup) that directly validate the unsafe-fallback prevention introduced by the retrieved PR's new restore-input/merge logic.

Suggested labels

area: sandbox

Suggested reviewers

  • prekshivyas

Poem

A rabbit hops through config hashes bright,
SHA-256 dancing in the shell-script night,
Guard-lines refresh what doctors might distort,
While fail-closed tests keep data safe in port.
Symlinks refused, and .config-hash shines true! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly addresses the main change: implementing fail-closed behavior for config restore merge errors, which is the core objective across all modified files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/5174-openclaw-restore-unit-tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Add or identify targeted runtime/integration validation for the changed behavior; do not report external E2E job pass/fail here. Runtime/sandbox/infrastructure paths need behavioral runtime validation: scripts/nemoclaw-start.sh, src/lib/actions/sandbox/rebuild.ts.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 10, 2026

Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

@github-actions

Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 27307941381
Target ref: 73ab69ba93d69b158d0856ec63398afe9828830c
Workflow ref: hotfix/5101-openclaw-config-merge
Requested jobs: snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job Result
rebuild-openclaw-e2e ❌ failure
snapshot-commands-e2e ✅ success

Failed jobs: rebuild-openclaw-e2e. Check run artifacts for logs.

@cv cv requested a review from jyaunches June 10, 2026 23:52
@cv cv changed the base branch from hotfix/5101-openclaw-config-merge to main June 10, 2026 23:53
cv added 2 commits June 10, 2026 17:09
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@github-actions

Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 27315161743
Target ref: dfc2f5587be9391751b85d1e0e1bbfb3bdf14c10
Workflow ref: main
Requested jobs: snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job Result
rebuild-openclaw-e2e ❌ failure
snapshot-commands-e2e ✅ success

Failed jobs: rebuild-openclaw-e2e. Check run artifacts for logs.

@cv cv added the v0.0.64 Release target label Jun 11, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
src/lib/actions/sandbox/rebuild.ts (1)

100-102: ⚡ Quick win

Consider atomic file operations to prevent hash file corruption.

Lines 100-102 have a narrow time-of-check-time-of-use window where openclaw.json could be deleted or sha256sum could fail after the file existence check. The > .config-hash redirection truncates the hash file before sha256sum runs, so failure leaves .config-hash empty or corrupted. While the non-zero exit status is detected and the user is warned (line 1290), atomic operations would eliminate this risk entirely.

🔄 Refactor to use atomic file operations
 '[ -f "$config_file" ] || exit 0',
 'cd "$config_dir" || exit 13',
-"sha256sum openclaw.json > .config-hash",
+"sha256sum openclaw.json > .config-hash.tmp && mv .config-hash.tmp .config-hash",
 "chmod 660 .config-hash 2>/dev/null || true",

This ensures .config-hash is never left in a partially-written state — either the hash is fully computed and atomically moved into place, or the original .config-hash remains untouched on failure.
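
The same write-to-temp-then-rename idea can be sketched in TypeScript for readers more familiar with Node than with shell. This is an illustration under assumptions, not code from the PR: `writeFileAtomically` is a hypothetical helper, and the rename is only atomic when the temporary file lives on the same filesystem as the target, which is why it is created next to it.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: write content to a sibling temp file, then rename it
// over the target so readers never observe a truncated or partial file.
function writeFileAtomically(targetPath: string, content: string): void {
  const tmpPath = `${targetPath}.tmp-${process.pid}`;
  try {
    fs.writeFileSync(tmpPath, content, { mode: 0o660 });
    fs.renameSync(tmpPath, targetPath);
  } catch (error) {
    // On failure the original target is untouched; clean up the temp file.
    fs.rmSync(tmpPath, { force: true });
    throw error;
  }
}

// Demo in a throwaway directory: the second write replaces the first in one step.
const atomicDemoDir = fs.mkdtempSync(path.join(os.tmpdir(), "atomic-write-demo-"));
const atomicDemoPath = path.join(atomicDemoDir, ".config-hash");
writeFileAtomically(atomicDemoPath, "first\n");
writeFileAtomically(atomicDemoPath, "second\n");
console.log(fs.readFileSync(atomicDemoPath, "utf8").trim());
```

Compared with a direct redirect, the target file is only ever replaced by a fully written file, at the cost of one extra file creation per update.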

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/actions/sandbox/rebuild.ts` around lines 100 - 102, Replace the
direct redirection "sha256sum openclaw.json > .config-hash" with an atomic
write: compute the hash into a temporary file (use mktemp or similar) while
ensuring you are in the directory pointed to by config_dir and that config_file
exists, then move the temp file into place with mv (atomic rename) only on
successful sha256sum; also ensure the temp file is removed on failure (use a
trap/cleanup). Target the block that checks '[ -f "$config_file" ] || exit 0',
the cd "$config_dir" || exit 13 step, and the "sha256sum openclaw.json >
.config-hash" command when making this change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 80e9a7a3-b9b9-4267-8523-41e415817c19

📥 Commits

Reviewing files that changed from the base of the PR and between dfc2f55 and ad73487.

📒 Files selected for processing (4)
  • scripts/nemoclaw-start.sh
  • src/lib/actions/sandbox/rebuild-config-hash.test.ts
  • src/lib/actions/sandbox/rebuild.ts
  • test/repro-4538-raw-doctor-perms.test.ts

@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27389516622
Target ref: ad73487bc54fa9ce37aed614ab3157f3557747f0
Workflow ref: main
Requested jobs: snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 2 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
rebuild-openclaw-e2e ✅ success
snapshot-commands-e2e ✅ success

…estore-unit-tests

# Conflicts:
#	test/repro-4538-raw-doctor-perms.test.ts
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27390976297
Target ref: e9ad4c86ac2e3cd179bfbecadb5aefa9ae57e6ce
Workflow ref: main
Requested jobs: snapshot-commands-e2e,rebuild-openclaw-e2e
Summary: 2 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
rebuild-openclaw-e2e ✅ success
snapshot-commands-e2e ✅ success

@cv cv added v0.0.65 Release target and removed v0.0.64 Release target labels Jun 12, 2026
@github-actions

Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27399232492
Target ref: 1ae464011c3e69e3221fa96a2493f991c80c3cb0
Workflow ref: main
Requested jobs: rebuild-openclaw-e2e,openclaw-onboard-security-posture-e2e
Summary: 2 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
openclaw-onboard-security-posture-e2e ✅ success
rebuild-openclaw-e2e ✅ success

@wscurran wscurran added the bug-fix (PR fixes a bug or regression) and integration: openclaw (OpenClaw integration behavior) labels Jun 12, 2026
@wscurran wscurran added the v0.0.64 Release target label Jun 12, 2026

@cv cv removed the v0.0.64 Release target label Jun 12, 2026
@cv cv merged commit 55d26b7 into main Jun 13, 2026
35 checks passed
@cv cv deleted the codex/5174-openclaw-restore-unit-tests branch June 13, 2026 05:54
@miyoungc miyoungc mentioned this pull request Jun 16, 2026
13 tasks
cv pushed a commit that referenced this pull request Jun 17, 2026
## Summary
Refreshes release-prep documentation for NemoClaw v0.0.65.
Adds the v0.0.65 release-notes section and refreshes generated
`nemoclaw-user-*` skills from the Fern MDX source docs.

## Changes
- Added the v0.0.65 release notes to `docs/about/release-notes.mdx` with
links to the deeper docs pages for lifecycle, troubleshooting,
inference, CLI commands, messaging, credentials, network policy, Hermes,
and sub-agents.
- Regenerated the `nemoclaw-user-*` skills with
`scripts/docs-to-skills.py` so release-prep skill output matches the
merged source docs.
- Used the v0.0.65 announcement discussion as release context:
#5472.

## Source Summary
- #2492 -> `docs/about/release-notes.mdx`: Documents deadline-based
gateway wait reliability in the v0.0.65 recovery summary.
- #4958 -> `docs/about/release-notes.mdx`: Documents re-execed OpenClaw
gateway health check recovery in the sandbox recovery summary.
- #5163 -> `docs/about/release-notes.mdx`: Documents safer uninstall TTY
confirmation behavior in the day-two CLI summary.
- #5178 -> `docs/about/release-notes.mdx`: Documents fail-closed config
restore merge behavior in the rebuild and restore summary.
- #5179 -> `docs/about/release-notes.mdx`: Documents WeChat QR token
redaction in the messaging summary.
- #5182 -> `docs/about/release-notes.mdx`: Documents sustained gateway
serving checks in the recovery summary.
- #5194 -> `docs/about/release-notes.mdx`: Documents model-router
teardown during uninstall in the day-two CLI summary.
- #5195 -> `docs/about/release-notes.mdx`: Documents Shields
auto-restore lock reconfirmation in the rebuild and restore summary.
- #5198 -> `docs/about/release-notes.mdx`: Documents Docker Desktop WSL
CDI injection failure handling in the onboarding diagnostics summary.
- #5201 -> `docs/about/release-notes.mdx`: Documents sandbox
download/upload wrappers and sessions export in the day-two CLI summary.
- #5205 -> `docs/about/release-notes.mdx`: Documents reporter-owned
model metadata preservation in the rebuild and restore summary.
- #5214 -> `docs/about/release-notes.mdx`: Documents managed vLLM model
preflight before side effects in the inference setup summary.
- #5215 -> `docs/about/release-notes.mdx`: Documents managed vLLM extra
serve arguments in the inference setup summary.
- #5216 -> `docs/about/release-notes.mdx`: Documents silent OpenClaw
runtime fallback surfacing in the onboarding diagnostics summary.
- #5225 -> `docs/about/release-notes.mdx`: Documents persisted sandbox
gateway lookup in the gateway recovery summary.
- #5238 -> `docs/about/release-notes.mdx`: Documents sub-agent gateway
dial-back through the sandbox interface in the Hermes and sub-agent
summary.
- #5248 -> `docs/about/release-notes.mdx`: Documents Discord per-account
proxy resolution in the messaging summary.
- #5264 -> `docs/about/release-notes.mdx`: Documents reserved Hermes
port `8642` handling in the Hermes compatibility summary.
- #5267 -> `docs/about/release-notes.mdx`: Documents the narrower Hermes
baseline policy in the Hermes compatibility summary.
- #5321 -> `docs/about/release-notes.mdx`: Documents restored gateway
guard chains in the gateway recovery summary.
- #5328 -> `docs/about/release-notes.mdx`: Documents compact persisted
messaging plans in the messaging summary.
- #5338 -> `docs/about/release-notes.mdx`: Documents manifest channel
migration in the messaging summary.
- #5352 -> `docs/about/release-notes.mdx`: Documents persisted agent
preservation through registry recovery in the rebuild and restore
summary.
- #5371 ->
`.agents/skills/nemoclaw-user-reference/references/commands.md`:
Refreshes generated skill output for custom build cache and
layer-ordering source docs.
- #5379 -> `docs/about/release-notes.mdx`: Documents dashboard port
allocation across multiple NemoClaw gateways in the recovery summary.
- #5382 -> `docs/about/release-notes.mdx`: Documents recovery when an
active gateway has no sandbox spec in the recovery summary.
- #5389 ->
`.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`:
Refreshes generated skill output for declared agent `forward_ports`
recovery source docs.
- #5400 -> `docs/about/release-notes.mdx`: Documents bounded compatible
endpoint probes in the inference setup summary.
- #5410 -> `docs/about/release-notes.mdx`: Documents provider credential
hash removal from sandbox registry entries in the messaging summary.
- #5418 -> `docs/about/release-notes.mdx`: Documents summarized
inference validation failures in the onboarding diagnostics summary.
- #5457 -> `docs/about/release-notes.mdx`: Documents context-window
recomputation after runtime model switches in the inference setup
summary.
- #5463 -> `docs/about/release-notes.mdx`: Documents cleanup of
hard-coded messaging channel stragglers in the messaging summary.

## Skipped
- #5366 matched `docs/.docs-skip` entries through skipped experimental
paths, so this PR does not add new release-note text for that commit.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [x] Git hooks passed during commit and push, or `npx prek run
--from-ref main --to-ref HEAD` passes
- [ ] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification notes:
- `npm run docs` passed after rerunning outside the sandbox. Fern
reported 0 errors and 1 hidden warning.
- The first sandboxed `npm run docs` attempt failed before validation
because `tsx` could not create its local IPC pipe under sandbox
restrictions.
- `npm run build:cli` passed before push to refresh the local `dist/`
artifacts used by the CLI typecheck hook.
- `npm test` was not run because this is a docs-only release refresh.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Released NemoClaw v0.0.65 with improved gateway/sandbox recovery,
safer day-two workflows, and enhanced Hermes compatibility.
* Added managed vLLM extra-arguments configuration via
`NEMOCLAW_VLLM_EXTRA_ARGS_JSON`.
* Added Hermes troubleshooting guidance for port forwarding and health
checks.

* **Documentation**
* Updated NVIDIA Endpoints/NIM setup and examples to use
`NVIDIA_INFERENCE_API_KEY`.
* Refined NVIDIA network policy and Model Router API base configuration.
* Expanded CLI/environment variable documentation (including sub-agent
gateway connectivity) and plugin build performance tips.

* **Tests**
  * Expanded Vitest-backed E2E release validation coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Labels

bug-fix (PR fixes a bug or regression), integration: openclaw (OpenClaw integration behavior), v0.0.65 (Release target)


3 participants