Skip to content

fix(state): preserve reporter-owned model metadata on openclaw.json rebuild restore#5205

Merged
cv merged 2 commits into
NVIDIA:mainfrom
yimoj:fix/5202-openclaw-config-restore-metadata
Jun 13, 2026
Merged

fix(state): preserve reporter-owned model metadata on openclaw.json rebuild restore#5205
cv merged 2 commits into
NVIDIA:mainfrom
yimoj:fix/5202-openclaw-config-restore-metadata

Conversation

@yimoj

@yimoj yimoj commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

After a NemoClaw upgrade + nemoclaw <name> rebuild, an OpenClaw sandbox's reporter-tuned openclaw.json model metadata reset to regenerated defaults. Fixing this required two layers: (1) the rebuild restore merge replaced matching models.providers entries wholesale, and (2) even after a correct merge, OpenClaw's own config-integrity protection reverted the restored config to its .last-good baseline. This PR fixes both, verified end-to-end through the real nemoclaw CLI on a live sandbox.

Related Issue

Fixes #5202

Changes

  • src/lib/state/openclaw-config-merge.ts: reconcile matching models.providers entries by ownership instead of replacing them wholesale. The fresh rebuild keeps runtime-owned routing/credentials (baseUrl, api, apiKey) and a model's identity (id, name); the backup restores user-owned non-secret tuning (reasoning, cost, contextWindow, maxTokens, compat, input). Fresh rebuild stays authoritative for the provider/model set and order; backup-only/id-less stale models are not resurrected; backup-only providers are inherited. No raw secrets restored.
  • src/lib/state/sandbox.ts: buildStateFileRestoreCommand now refreshes OpenClaw's .last-good recovery anchor for the openclaw.json merge restore — staged through a temp + atomic rename and failing closed before the live config is swapped. Without this, OpenClaw archives the restored config as openclaw.json.clobbered.<ts> and reverts to the freshly generated baseline.
  • Tests: reporter-shaped unit regression (openclaw-config-merge.test.ts), a command-string unit test for the anchor refresh + ordering + fail-closed (state-file-restore-command.test.ts), and an extended backup/restore integration test through the real backupSandboxState()/restoreSandboxState() path asserting both the merged metadata, mcp.servers retention, and the .last-good anchor (openclaw-config-snapshot.test.ts).

Notes on mcp.servers

The reporter also saw mcp.servers become {}. The fresh generated config emits no mcp section, so the backed-up mcp survives the generic object merge — confirmed live (fresh in-sandbox config showed mcp: null) and locked in by the tests. The merge layer needed no change for it; the .last-good revert was what dropped it end-to-end, which layer 2 fixes.

Type of Change

  • Code change (feature, bug fix, or refactor)

Verification

  • Reporter-workflow E2E on a live OpenClaw sandbox via the worktree CLI (no mocks):
    • node ./bin/nemoclaw.js onboard --non-interactive --yes --no-gpu --name i5202e2e — created a real OpenClaw sandbox (cloud inference, no GPU).
    • node ./bin/nemoclaw.js i5202e2e exec -- ... — tuned the in-sandbox /sandbox/.openclaw/openclaw.json to the reporter state (reasoning:true, custom cost, maxTokens:32768, compat, mcp.servers.filesystem).
    • node ./bin/nemoclaw.js i5202e2e rebuild --yes — the user-facing operation under test (backup → recreate → restore-merge → .last-good refresh).
    • Post-rebuild inspection of /sandbox/.openclaw/openclaw.json: kept reasoning:true, maxTokens:32768, cost {0.5,1.5,0.1,0.2}, compat, input ["text","image"], mcp.servers.filesystem; runtime fields fresh (baseUrl https://inference.local/v1, apiKey unused, fresh id/name); zero openclaw.json.clobbered.* files. Survived a subsequent container restart (OpenClaw integrity check re-ran). Reproduced cleanly across two consecutive fixed rebuilds.
  • Pre-fix, the same workflow showed the bug: restored config reverted to reasoning:false/maxTokens:4096/no compat/no mcp, with the correctly-merged config archived as openclaw.json.clobbered.* — proving the merge worked but OpenClaw reverted it.
  • npx vitest run on the affected unit + integration tests — passing.
  • tsc -p tsconfig.cli.json typecheck passes.
  • npx @biomejs/biome check clean on changed files.
  • codex review --uncommitted clean.
  • Tests added for new/changed behavior.
  • No secrets, API keys, or credentials committed; only the already-sanitized backup is merged.

Signed-off-by: Yimo Jiang yimoj@nvidia.com

Summary by CodeRabbit

  • Bug Fixes
    • Improved restore behavior to keep user-provided model tuning/metadata and durable server settings across rebuilds while ensuring fresh runtime credentials and routing take precedence; backup/restore now strips secrets but preserves non-sensitive config.
  • Tests
    • Added coverage validating config merge, durable server preservation, reporter-owned metadata restoration, and atomic state-file restore behavior (.last-good anchor).
  • Chores
    • Exposed state-restore command builder for safer restore orchestration.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0333c18e-2f6d-4997-bf42-598aa067ef75

📥 Commits

Reviewing files that changed from the base of the PR and between df06f5a and b623586.

📒 Files selected for processing (5)
  • src/lib/state/openclaw-config-merge.test.ts
  • src/lib/state/openclaw-config-merge.ts
  • src/lib/state/sandbox.ts
  • test/openclaw-config-snapshot.test.ts
  • test/state-file-restore-command.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • test/state-file-restore-command.test.ts
  • src/lib/state/openclaw-config-merge.ts
  • test/openclaw-config-snapshot.test.ts

📝 Walkthrough

Walkthrough

Enforces fresh runtime-owned provider/model fields while restoring reporter-owned model tuning and durable MCP server settings, exports a safer state-file restore command with staged .last-good anchoring, and adds unit/integration tests plus a sandbox test helper for backup/restore flows.

Changes

OpenClaw Config Restore Ownership and Merge

Layer / File(s) Summary
Ownership contract and runtime field definitions
src/lib/state/openclaw-config-merge.ts
OPENCLAW_CONFIG_RESTORE_OWNERSHIP updated with narrowed generated maps, adjusted durable sections, and new providerRuntimeOwnedFields / modelRuntimeOwnedFields.
Provider and model reconciliation merge helpers
src/lib/state/openclaw-config-merge.ts
New helpers reconcile providers and models by id, enforce runtime-owned fields from the fresh config, avoid resurrecting backup-only id-less models, and mergeOpenClawModels now uses provider reconciliation.
Config merge unit tests
src/lib/state/openclaw-config-merge.test.ts
Vitest case verifying merged result preserves fresh provider routing/credentials, retains fresh model identity while restoring backup tuning/metadata, preserves fresh gateway token, and keeps durable mcp.servers settings.
State restore command adjustments
src/lib/state/sandbox.ts, test/state-file-restore-command.test.ts
Exported buildStateFileRestoreCommand and updated its OpenClaw path to stage and atomically install dst.last-good from a parent-dir temp file; added tests validating staging, symlink refusal, atomic install ordering, and hash refresh behavior.
Integration tests & sandbox helper
test/openclaw-config-snapshot.test.ts
Added writeFakeSandboxBins to emit fake openshell/ssh scripts; refactored existing test to use it and added #5202 integration test verifying backup sanitization and restore precedence across rebuild.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5177: Related changes to mergeOpenClawRestoredConfig addressing provider/model ownership semantics.
  • NVIDIA/NemoClaw#5174: Related ownership-aware merge updates and wiring into restore flows.

Suggested labels

integration: openclaw, area: sandbox, bug-fix

Suggested reviewers

  • cv
  • prekshivyas

Poem

🐰 I hopped through backups, with a curious nose,
Kept the fresh keys shining where the runtime wind blows,
Restored the quiet tuning that reporters compose,
Anchored the last-good safe where durable moss grows,
Rebuilds hum along while old wisdom still shows.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main fix: preserving reporter-owned model metadata during openclaw.json rebuild restore, matching the primary change objective.
Linked Issues check ✅ Passed The PR comprehensively addresses all coding requirements from #5202: reconciles matching providers by ownership instead of wholesale replacement, refreshes .last-good anchor to prevent reversion, restores user-owned tuning while keeping runtime-owned fields from fresh rebuilds, and includes unit/integration tests validating merged metadata and mcp.servers retention.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing #5202: config-merge logic updates, sandbox restore command refresh, and related test additions. The refactoring of test helpers (writeFakeSandboxBins) is necessary infrastructure for new integration tests and remains within scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@yimoj yimoj added v0.0.64 Release target and removed v0.0.64 Release target labels Jun 11, 2026
@prekshivyas prekshivyas self-assigned this Jun 11, 2026

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed (code + 9-cat security), verified against source at the head SHA.

Approve — correct, well-scoped fix for #5202. Reconciles models.providers by ownership during rebuild restore (fresh rebuild keeps routing/credentials + model id/name; sanitized backup restores non-secret tuning like reasoning/cost/maxTokens/compat), and refreshes OpenClaw's .last-good recovery anchor before the live config swap so the integrity watcher doesn't revert the merge.

Verified: mergeOpenClawProviderMap/...Entry/...ModelArray exist and replace the prior wholesale-replace behavior that was the real cause of the bug; restoreRuntimeOwnedFields deletes a field when absent from current, so a stale backup apiKey/baseUrl can't resurrect; the backup is sanitized at backup time (sanitizeConfigFile) so adding mcp to durable sections inherits already-scrubbed content (the snapshot test asserts GITHUB_TOKEN === [STRIPPED_BY_MIGRATION]); and the .last-good anchor is staged via temp + atomic same-dir rename, symlink-rejected, fail-closed before the swap.

Security: all 9 pass (symlink rejection fail-closed; no secret leakage; anchor-before-swap closes the integrity-watcher race). Nits: .last-good chmod 660 is looser than the config's 640 (single-tenant, negligible — align for least-privilege); a redundant models merge before the explicit array overwrite (harmless). Tests adequate (ownership-merge unit, command-ordering + symlink-rejection + non-OpenClaw negative, full backup→restore integration).

yimoj added 2 commits June 11, 2026 17:17
…ebuild restore

OpenClaw's openclaw.json is backed up and selectively merged on rebuild
(NVIDIA#5027), but matching `models.providers` entries were replaced wholesale by
the freshly generated provider block. After a version upgrade + rebuild the
user's tuned model metadata reset to regenerated defaults: `reasoning` flipped
to false, `cost` zeroed, `maxTokens` fell back to 4096, and `compat`
disappeared.

Reconcile matching provider entries by ownership instead: the fresh rebuild
keeps runtime-owned routing/credential fields (baseUrl, api, apiKey) and a
model's routing identity (id/name), while backed-up non-secret model tuning
(reasoning, cost, contextWindow, maxTokens, compat, input, …) is restored.
Provider/model sets and order stay authoritative from the fresh rebuild;
backup-only and id-less stale models are not resurrected. No raw secrets are
restored — only the already-sanitized backup is merged.

Adds a reporter-shaped regression at the merge-helper level and through the
real backupSandboxState/restoreSandboxState path (the functions
`nemoclaw <name> rebuild` invokes), covering provider model metadata and
top-level `mcp.servers`. The regression confirms `mcp.servers` is already
preserved by the generic object merge (the fresh config emits no mcp section),
so no restore/sanitization change was needed for it.

Fixes NVIDIA#5202

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
…ld restore

Real-CLI E2E (onboard -> tune in-sandbox config -> `nemoclaw <name> rebuild`)
revealed a second layer of NVIDIA#5202 that the merge fix alone did not cover: even
when the restore writes the correctly merged config, OpenClaw's own
config-integrity protection reverts it.

OpenClaw guards openclaw.json with a `.last-good` recovery anchor. On its
integrity check it archives any live config that differs from `.last-good` as
`openclaw.json.clobbered.<ts>` and reverts to `.last-good`. The rebuild restore
wrote openclaw.json (and `.config-hash`) but never refreshed `.last-good`, so
OpenClaw reverted the restored user config to the freshly generated baseline
captured at first boot. The archived `.clobbered` file held the correct merged
config, confirming the merge worked but was undone downstream.

`buildStateFileRestoreCommand` now refreshes the `.last-good` anchor for the
openclaw.json merge restore, staged through a temp + atomic rename and failing
closed BEFORE the live config is swapped, so OpenClaw's integrity watcher never
observes a config that disagrees with its recovery anchor and a partial/failed
anchor write can never leave a stale revert target.

Verified through the real worktree CLI on a live OpenClaw sandbox: after
`node ./bin/nemoclaw.js <name> rebuild`, /sandbox/.openclaw/openclaw.json keeps
the reporter's reasoning/cost/maxTokens/compat/input and mcp.servers with no
`.clobbered` revert, and survives a subsequent container restart. Adds a
command-string unit test (anchor refresh + ordering + fail-closed) and extends
the backup/restore integration test to assert the `.last-good` anchor.

Fixes NVIDIA#5202

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
@yimoj yimoj force-pushed the fix/5202-openclaw-config-restore-metadata branch from df06f5a to b623586 Compare June 11, 2026 17:28
@cv cv added v0.0.65 Release target and removed v0.0.64 Release target labels Jun 12, 2026
@wscurran wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression integration: openclaw OpenClaw integration behavior v0.0.64 Release target labels Jun 12, 2026
@wscurran

Copy link
Copy Markdown
Contributor

@cv cv assigned cv and unassigned prekshivyas Jun 12, 2026
@cv cv merged commit c9415e6 into NVIDIA:main Jun 13, 2026
34 checks passed
@miyoungc miyoungc mentioned this pull request Jun 16, 2026
13 tasks
cv pushed a commit that referenced this pull request Jun 17, 2026
## Summary
Refreshes release-prep documentation for NemoClaw v0.0.65.
Adds the v0.0.65 release-notes section and refreshes generated
`nemoclaw-user-*` skills from the Fern MDX source docs.

## Changes
- Added the v0.0.65 release notes to `docs/about/release-notes.mdx` with
links to the deeper docs pages for lifecycle, troubleshooting,
inference, CLI commands, messaging, credentials, network policy, Hermes,
and sub-agents.
- Regenerated the `nemoclaw-user-*` skills with
`scripts/docs-to-skills.py` so release-prep skill output matches the
merged source docs.
- Used the v0.0.65 announcement discussion as release context:
#5472.

## Source Summary
- #2492 -> `docs/about/release-notes.mdx`: Documents deadline-based
gateway wait reliability in the v0.0.65 recovery summary.
- #4958 -> `docs/about/release-notes.mdx`: Documents re-execed OpenClaw
gateway health check recovery in the sandbox recovery summary.
- #5163 -> `docs/about/release-notes.mdx`: Documents safer uninstall TTY
confirmation behavior in the day-two CLI summary.
- #5178 -> `docs/about/release-notes.mdx`: Documents fail-closed config
restore merge behavior in the rebuild and restore summary.
- #5179 -> `docs/about/release-notes.mdx`: Documents WeChat QR token
redaction in the messaging summary.
- #5182 -> `docs/about/release-notes.mdx`: Documents sustained gateway
serving checks in the recovery summary.
- #5194 -> `docs/about/release-notes.mdx`: Documents model-router
teardown during uninstall in the day-two CLI summary.
- #5195 -> `docs/about/release-notes.mdx`: Documents Shields
auto-restore lock reconfirmation in the rebuild and restore summary.
- #5198 -> `docs/about/release-notes.mdx`: Documents Docker Desktop WSL
CDI injection failure handling in the onboarding diagnostics summary.
- #5201 -> `docs/about/release-notes.mdx`: Documents sandbox
download/upload wrappers and sessions export in the day-two CLI summary.
- #5205 -> `docs/about/release-notes.mdx`: Documents reporter-owned
model metadata preservation in the rebuild and restore summary.
- #5214 -> `docs/about/release-notes.mdx`: Documents managed vLLM model
preflight before side effects in the inference setup summary.
- #5215 -> `docs/about/release-notes.mdx`: Documents managed vLLM extra
serve arguments in the inference setup summary.
- #5216 -> `docs/about/release-notes.mdx`: Documents silent OpenClaw
runtime fallback surfacing in the onboarding diagnostics summary.
- #5225 -> `docs/about/release-notes.mdx`: Documents persisted sandbox
gateway lookup in the gateway recovery summary.
- #5238 -> `docs/about/release-notes.mdx`: Documents sub-agent gateway
dial-back through the sandbox interface in the Hermes and sub-agent
summary.
- #5248 -> `docs/about/release-notes.mdx`: Documents Discord per-account
proxy resolution in the messaging summary.
- #5264 -> `docs/about/release-notes.mdx`: Documents reserved Hermes
port `8642` handling in the Hermes compatibility summary.
- #5267 -> `docs/about/release-notes.mdx`: Documents the narrower Hermes
baseline policy in the Hermes compatibility summary.
- #5321 -> `docs/about/release-notes.mdx`: Documents restored gateway
guard chains in the gateway recovery summary.
- #5328 -> `docs/about/release-notes.mdx`: Documents compact persisted
messaging plans in the messaging summary.
- #5338 -> `docs/about/release-notes.mdx`: Documents manifest channel
migration in the messaging summary.
- #5352 -> `docs/about/release-notes.mdx`: Documents persisted agent
preservation through registry recovery in the rebuild and restore
summary.
- #5371 ->
`.agents/skills/nemoclaw-user-reference/references/commands.md`:
Refreshes generated skill output for custom build cache and
layer-ordering source docs.
- #5379 -> `docs/about/release-notes.mdx`: Documents dashboard port
allocation across multiple NemoClaw gateways in the recovery summary.
- #5382 -> `docs/about/release-notes.mdx`: Documents recovery when an
active gateway has no sandbox spec in the recovery summary.
- #5389 ->
`.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`:
Refreshes generated skill output for declared agent `forward_ports`
recovery source docs.
- #5400 -> `docs/about/release-notes.mdx`: Documents bounded compatible
endpoint probes in the inference setup summary.
- #5410 -> `docs/about/release-notes.mdx`: Documents provider credential
hash removal from sandbox registry entries in the messaging summary.
- #5418 -> `docs/about/release-notes.mdx`: Documents summarized
inference validation failures in the onboarding diagnostics summary.
- #5457 -> `docs/about/release-notes.mdx`: Documents context-window
recomputation after runtime model switches in the inference setup
summary.
- #5463 -> `docs/about/release-notes.mdx`: Documents cleanup of
hard-coded messaging channel stragglers in the messaging summary.

## Skipped
- #5366 matched `docs/.docs-skip` entries through skipped experimental
paths, so this PR does not add new release-note text for that commit.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [x] Git hooks passed during commit and push, or `npx prek run
--from-ref main --to-ref HEAD` passes
- [ ] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification notes:
- `npm run docs` passed after rerunning outside the sandbox. Fern
reported 0 errors and 1 hidden warning.
- The first sandboxed `npm run docs` attempt failed before validation
because `tsx` could not create its local IPC pipe under sandbox
restrictions.
- `npm run build:cli` passed before push to refresh the local `dist/`
artifacts used by the CLI typecheck hook.
- `npm test` was not run because this is a docs-only release refresh.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Released NemoClaw v0.0.65 with improved gateway/sandbox recovery,
safer day-two workflows, and enhanced Hermes compatibility.
* Added managed vLLM extra-arguments configuration via
`NEMOCLAW_VLLM_EXTRA_ARGS_JSON`.
* Added Hermes troubleshooting guidance for port forwarding and health
checks.

* **Documentation**
* Updated NVIDIA Endpoints/NIM setup and examples to use
`NVIDIA_INFERENCE_API_KEY`.
* Refined NVIDIA network policy and Model Router API base configuration.
* Expanded CLI/environment variable documentation (including sub-agent
gateway connectivity) and plugin build performance tips.

* **Tests**
  * Expanded Vitest-backed E2E release validation coverage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression integration: openclaw OpenClaw integration behavior v0.0.64 Release target v0.0.65 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Openclaw config partially restored after nemoclaw <name> rebuild after upgrading nemoclaw version

4 participants