fix(state): preserve reporter-owned model metadata on openclaw.json rebuild restore#5205
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughEnforces fresh runtime-owned provider/model fields while restoring reporter-owned model tuning and durable MCP server settings, exports a safer state-file restore command with staged ChangesOpenClaw Config Restore Ownership and Merge
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
prekshivyas
left a comment
There was a problem hiding this comment.
Reviewed (code + 9-cat security), verified against source at the head SHA.
✅ Approve — correct, well-scoped fix for #5202. Reconciles models.providers by ownership during rebuild restore (fresh rebuild keeps routing/credentials + model id/name; sanitized backup restores non-secret tuning like reasoning/cost/maxTokens/compat), and refreshes OpenClaw's .last-good recovery anchor before the live config swap so the integrity watcher doesn't revert the merge.
Verified: mergeOpenClawProviderMap/...Entry/...ModelArray exist and replace the prior wholesale-replace behavior that was the real cause of the bug; restoreRuntimeOwnedFields deletes a field when absent from current, so a stale backup apiKey/baseUrl can't resurrect; the backup is sanitized at backup time (sanitizeConfigFile) so adding mcp to durable sections inherits already-scrubbed content (the snapshot test asserts GITHUB_TOKEN === [STRIPPED_BY_MIGRATION]); and the .last-good anchor is staged via temp + atomic same-dir rename, symlink-rejected, fail-closed before the swap.
Security: all 9 pass (symlink rejection fail-closed; no secret leakage; anchor-before-swap closes the integrity-watcher race). Nits: .last-good chmod 660 is looser than the config's 640 (single-tenant, negligible — align for least-privilege); a redundant models merge before the explicit array overwrite (harmless). Tests adequate (ownership-merge unit, command-ordering + symlink-rejection + non-OpenClaw negative, full backup→restore integration).
…ebuild restore OpenClaw's openclaw.json is backed up and selectively merged on rebuild (NVIDIA#5027), but matching `models.providers` entries were replaced wholesale by the freshly generated provider block. After a version upgrade + rebuild the user's tuned model metadata reset to regenerated defaults: `reasoning` flipped to false, `cost` zeroed, `maxTokens` fell back to 4096, and `compat` disappeared. Reconcile matching provider entries by ownership instead: the fresh rebuild keeps runtime-owned routing/credential fields (baseUrl, api, apiKey) and a model's routing identity (id/name), while backed-up non-secret model tuning (reasoning, cost, contextWindow, maxTokens, compat, input, …) is restored. Provider/model sets and order stay authoritative from the fresh rebuild; backup-only and id-less stale models are not resurrected. No raw secrets are restored — only the already-sanitized backup is merged. Adds a reporter-shaped regression at the merge-helper level and through the real backupSandboxState/restoreSandboxState path (the functions `nemoclaw <name> rebuild` invokes), covering provider model metadata and top-level `mcp.servers`. The regression confirms `mcp.servers` is already preserved by the generic object merge (the fresh config emits no mcp section), so no restore/sanitization change was needed for it. Fixes NVIDIA#5202 Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
…ld restore Real-CLI E2E (onboard -> tune in-sandbox config -> `nemoclaw <name> rebuild`) revealed a second layer of NVIDIA#5202 that the merge fix alone did not cover: even when the restore writes the correctly merged config, OpenClaw's own config-integrity protection reverts it. OpenClaw guards openclaw.json with a `.last-good` recovery anchor. On its integrity check it archives any live config that differs from `.last-good` as `openclaw.json.clobbered.<ts>` and reverts to `.last-good`. The rebuild restore wrote openclaw.json (and `.config-hash`) but never refreshed `.last-good`, so OpenClaw reverted the restored user config to the freshly generated baseline captured at first boot. The archived `.clobbered` file held the correct merged config, confirming the merge worked but was undone downstream. `buildStateFileRestoreCommand` now refreshes the `.last-good` anchor for the openclaw.json merge restore, staged through a temp + atomic rename and failing closed BEFORE the live config is swapped, so OpenClaw's integrity watcher never observes a config that disagrees with its recovery anchor and a partial/failed anchor write can never leave a stale revert target. Verified through the real worktree CLI on a live OpenClaw sandbox: after `node ./bin/nemoclaw.js <name> rebuild`, /sandbox/.openclaw/openclaw.json keeps the reporter's reasoning/cost/maxTokens/compat/input and mcp.servers with no `.clobbered` revert, and survives a subsequent container restart. Adds a command-string unit test (anchor refresh + ordering + fail-closed) and extends the backup/restore integration test to assert the `.last-good` anchor. Fixes NVIDIA#5202 Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
df06f5a to
b623586
Compare
## Summary Refreshes release-prep documentation for NemoClaw v0.0.65. Adds the v0.0.65 release-notes section and refreshes generated `nemoclaw-user-*` skills from the Fern MDX source docs. ## Changes - Added the v0.0.65 release notes to `docs/about/release-notes.mdx` with links to the deeper docs pages for lifecycle, troubleshooting, inference, CLI commands, messaging, credentials, network policy, Hermes, and sub-agents. - Regenerated the `nemoclaw-user-*` skills with `scripts/docs-to-skills.py` so release-prep skill output matches the merged source docs. - Used the v0.0.65 announcement discussion as release context: #5472. ## Source Summary - #2492 -> `docs/about/release-notes.mdx`: Documents deadline-based gateway wait reliability in the v0.0.65 recovery summary. - #4958 -> `docs/about/release-notes.mdx`: Documents re-execed OpenClaw gateway health check recovery in the sandbox recovery summary. - #5163 -> `docs/about/release-notes.mdx`: Documents safer uninstall TTY confirmation behavior in the day-two CLI summary. - #5178 -> `docs/about/release-notes.mdx`: Documents fail-closed config restore merge behavior in the rebuild and restore summary. - #5179 -> `docs/about/release-notes.mdx`: Documents WeChat QR token redaction in the messaging summary. - #5182 -> `docs/about/release-notes.mdx`: Documents sustained gateway serving checks in the recovery summary. - #5194 -> `docs/about/release-notes.mdx`: Documents model-router teardown during uninstall in the day-two CLI summary. - #5195 -> `docs/about/release-notes.mdx`: Documents Shields auto-restore lock reconfirmation in the rebuild and restore summary. - #5198 -> `docs/about/release-notes.mdx`: Documents Docker Desktop WSL CDI injection failure handling in the onboarding diagnostics summary. - #5201 -> `docs/about/release-notes.mdx`: Documents sandbox download/upload wrappers and sessions export in the day-two CLI summary. - #5205 -> `docs/about/release-notes.mdx`: Documents reporter-owned model metadata preservation in the rebuild and restore summary. - #5214 -> `docs/about/release-notes.mdx`: Documents managed vLLM model preflight before side effects in the inference setup summary. - #5215 -> `docs/about/release-notes.mdx`: Documents managed vLLM extra serve arguments in the inference setup summary. - #5216 -> `docs/about/release-notes.mdx`: Documents silent OpenClaw runtime fallback surfacing in the onboarding diagnostics summary. - #5225 -> `docs/about/release-notes.mdx`: Documents persisted sandbox gateway lookup in the gateway recovery summary. - #5238 -> `docs/about/release-notes.mdx`: Documents sub-agent gateway dial-back through the sandbox interface in the Hermes and sub-agent summary. - #5248 -> `docs/about/release-notes.mdx`: Documents Discord per-account proxy resolution in the messaging summary. - #5264 -> `docs/about/release-notes.mdx`: Documents reserved Hermes port `8642` handling in the Hermes compatibility summary. - #5267 -> `docs/about/release-notes.mdx`: Documents the narrower Hermes baseline policy in the Hermes compatibility summary. - #5321 -> `docs/about/release-notes.mdx`: Documents restored gateway guard chains in the gateway recovery summary. - #5328 -> `docs/about/release-notes.mdx`: Documents compact persisted messaging plans in the messaging summary. - #5338 -> `docs/about/release-notes.mdx`: Documents manifest channel migration in the messaging summary. - #5352 -> `docs/about/release-notes.mdx`: Documents persisted agent preservation through registry recovery in the rebuild and restore summary. - #5371 -> `.agents/skills/nemoclaw-user-reference/references/commands.md`: Refreshes generated skill output for custom build cache and layer-ordering source docs. - #5379 -> `docs/about/release-notes.mdx`: Documents dashboard port allocation across multiple NemoClaw gateways in the recovery summary. - #5382 -> `docs/about/release-notes.mdx`: Documents recovery when an active gateway has no sandbox spec in the recovery summary. - #5389 -> `.agents/skills/nemoclaw-user-reference/references/troubleshooting.md`: Refreshes generated skill output for declared agent `forward_ports` recovery source docs. - #5400 -> `docs/about/release-notes.mdx`: Documents bounded compatible endpoint probes in the inference setup summary. - #5410 -> `docs/about/release-notes.mdx`: Documents provider credential hash removal from sandbox registry entries in the messaging summary. - #5418 -> `docs/about/release-notes.mdx`: Documents summarized inference validation failures in the onboarding diagnostics summary. - #5457 -> `docs/about/release-notes.mdx`: Documents context-window recomputation after runtime model switches in the inference setup summary. - #5463 -> `docs/about/release-notes.mdx`: Documents cleanup of hard-coded messaging channel stragglers in the messaging summary. ## Skipped - #5366 matched `docs/.docs-skip` entries through skipped experimental paths, so this PR does not add new release-note text for that commit. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes - [ ] Targeted tests pass for changed behavior - [ ] Full `npm test` passes (broad runtime changes only) - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Verification notes: - `npm run docs` passed after rerunning outside the sandbox. Fern reported 0 errors and 1 hidden warning. - The first sandboxed `npm run docs` attempt failed before validation because `tsx` could not create its local IPC pipe under sandbox restrictions. - `npm run build:cli` passed before push to refresh the local `dist/` artifacts used by the CLI typecheck hook. - `npm test` was not run because this is a docs-only release refresh. --- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Released NemoClaw v0.0.65 with improved gateway/sandbox recovery, safer day-two workflows, and enhanced Hermes compatibility. * Added managed vLLM extra-arguments configuration via `NEMOCLAW_VLLM_EXTRA_ARGS_JSON`. * Added Hermes troubleshooting guidance for port forwarding and health checks. * **Documentation** * Updated NVIDIA Endpoints/NIM setup and examples to use `NVIDIA_INFERENCE_API_KEY`. * Refined NVIDIA network policy and Model Router API base configuration. * Expanded CLI/environment variable documentation (including sub-agent gateway connectivity) and plugin build performance tips. * **Tests** * Expanded Vitest-backed E2E release validation coverage. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
After a NemoClaw upgrade +
nemoclaw <name> rebuild, an OpenClaw sandbox's reporter-tunedopenclaw.jsonmodel metadata reset to regenerated defaults. Fixing this required two layers: (1) the rebuild restore merge replaced matchingmodels.providersentries wholesale, and (2) even after a correct merge, OpenClaw's own config-integrity protection reverted the restored config to its.last-goodbaseline. This PR fixes both, verified end-to-end through the realnemoclawCLI on a live sandbox.Related Issue
Fixes #5202
Changes
src/lib/state/openclaw-config-merge.ts: reconcile matchingmodels.providersentries by ownership instead of replacing them wholesale. The fresh rebuild keeps runtime-owned routing/credentials (baseUrl,api,apiKey) and a model's identity (id,name); the backup restores user-owned non-secret tuning (reasoning,cost,contextWindow,maxTokens,compat,input). Fresh rebuild stays authoritative for the provider/model set and order; backup-only/id-less stale models are not resurrected; backup-only providers are inherited. No raw secrets restored.src/lib/state/sandbox.ts:buildStateFileRestoreCommandnow refreshes OpenClaw's.last-goodrecovery anchor for theopenclaw.jsonmerge restore — staged through a temp + atomic rename and failing closed before the live config is swapped. Without this, OpenClaw archives the restored config asopenclaw.json.clobbered.<ts>and reverts to the freshly generated baseline.openclaw-config-merge.test.ts), a command-string unit test for the anchor refresh + ordering + fail-closed (state-file-restore-command.test.ts), and an extended backup/restore integration test through the realbackupSandboxState()/restoreSandboxState()path asserting both the merged metadata,mcp.serversretention, and the.last-goodanchor (openclaw-config-snapshot.test.ts).Notes on
mcp.serversThe reporter also saw
mcp.serversbecome{}. The fresh generated config emits nomcpsection, so the backed-upmcpsurvives the generic object merge — confirmed live (fresh in-sandbox config showedmcp: null) and locked in by the tests. The merge layer needed no change for it; the.last-goodrevert was what dropped it end-to-end, which layer 2 fixes.Type of Change
Verification
node ./bin/nemoclaw.js onboard --non-interactive --yes --no-gpu --name i5202e2e— created a real OpenClaw sandbox (cloud inference, no GPU).node ./bin/nemoclaw.js i5202e2e exec -- ...— tuned the in-sandbox/sandbox/.openclaw/openclaw.jsonto the reporter state (reasoning:true, customcost,maxTokens:32768,compat,mcp.servers.filesystem).node ./bin/nemoclaw.js i5202e2e rebuild --yes— the user-facing operation under test (backup → recreate → restore-merge →.last-goodrefresh)./sandbox/.openclaw/openclaw.json: keptreasoning:true,maxTokens:32768,cost {0.5,1.5,0.1,0.2},compat,input ["text","image"],mcp.servers.filesystem; runtime fields fresh (baseUrl https://inference.local/v1,apiKey unused, freshid/name); zeroopenclaw.json.clobbered.*files. Survived a subsequent container restart (OpenClaw integrity check re-ran). Reproduced cleanly across two consecutive fixed rebuilds.reasoning:false/maxTokens:4096/nocompat/nomcp, with the correctly-merged config archived asopenclaw.json.clobbered.*— proving the merge worked but OpenClaw reverted it.npx vitest runon the affected unit + integration tests — passing.tsc -p tsconfig.cli.jsontypecheck passes.npx @biomejs/biome checkclean on changed files.codex review --uncommittedclean.Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit