Skip to content

fix(onboard): auto-detect resume from an in_progress session#5487

Open
Dongni-Yang wants to merge 5 commits into
mainfrom
fix/5470-onboard-resume-autodetect
Open

fix(onboard): auto-detect resume from an in_progress session#5487
Dongni-Yang wants to merge 5 commits into
mainfrom
fix/5470-onboard-resume-autodetect

Conversation

@Dongni-Yang

@Dongni-Yang Dongni-Yang commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

Resume mode was keyed only off the --resume flag, so re-running nemoclaw onboard after an interrupted attempt — where ~/.nemoclaw/onboard-session.json has status:"in_progress" — started over instead of resuming (no (resume mode) banner, no resume preflight). This auto-detects resume from the persisted in_progress status.

Related Issue

Refs #5470

Changes

  • src/lib/onboard/entry-options.ts: resolveOnboardEntryOptions now auto-detects resume from a persisted in_progress session (new optional persistedSessionStatus input) when --fresh is not set. --fresh always wins; the explicit --resume/--fresh conflict guard is unchanged (applies only to the explicit flags).
  • src/lib/onboard.ts: pass the persisted session status into the resolver.
  • src/lib/onboard/entry-options.test.ts: red→green coverage for auto-detect, plus no-regression guards (--fresh wins; non-in_progress statuses).
  • test/e2e/test-onboard-resume.sh: add a Phase 3.5 implicit-resume scenario (plain onboard on an in_progress session) and a --fresh suppression check.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
Layer Result
Unit (entry-options.test.ts) Resume auto-detect → true for in_progress; --fresh and non-in_progress (incl. pending / "") → no resume. Core case is a true red→green. ✅
Real openshell CLI gateway --help lacks start / destroysupportsLifecycleCommands=false (host-process model — the bug-capable branch). ✅
In-process (real fns) classifyGatewayPortReuse(healthy, lifecycle=false, container=missing)reuse; container never probed, no cleanup; reuse is openshell status-driven, no containerId read. ✅
Live onboard (Ubuntu 20.04, containerized-compat gateway) Plain onboard (no --resume) on an in_progress session prints (resume mode) (the fix), then [resume] Skipping gateway (running) against a healthy :8080 gateway with no stale-cleanup sequence — reproducing both the fix and the unfixed half end-to-end. ✅
onboard-resume-e2e (Ubuntu 24.04 CI) Explicit --resume plus the new Phase 3.5 implicit-resume + --fresh-suppression assertions all pass. ✅

Scope / notes: NemoClaw writes/reads no gateway.json or containerId in the onboard reuse path (grep-confirmed) — that artifact is OpenShell-owned, so the stale-gateway cleanup symptom is upstream. It also can't be tightened naively without regressing #4520 (the inverse "false-stale → :8080 collision" bug, whose fix deliberately made reuse more trusting). Hence Refs, not Closes. The cloud-onboard-e2e failure is the inference-provider credential/env issue the report itself calls "environmental and separate from this bug" (it fails on a fresh onboard at [3/8], where this change's resume path never runs).


Signed-off-by: Dongni Yang dongniy@nvidia.com

Summary by CodeRabbit

  • New Features
    • Onboarding now infers resume mode from a persisted session state of in_progress, reducing the need for manual --resume.
    • --fresh continues to override and prevents auto-resume even when a persisted in_progress session exists.
  • Tests / Quality
    • Added unit tests for resume auto-detection across persisted session states and for --fresh suppression.
    • Added an end-to-end “implicit resume” scenario, including validation that cached steps are skipped in resume mode and that --fresh fails early during preflight without entering resume mode.

Resume mode was keyed only off the --resume flag (entry-options.ts), so
re-running `nemoclaw onboard` after an interrupted attempt — where
~/.nemoclaw/onboard-session.json has status:"in_progress" — started over
instead of resuming: no "(resume mode)" banner and no resume preflight.

Auto-detect resume from the persisted in_progress status when --fresh is not
set. --fresh always wins, and the explicit --resume/--fresh conflict guard is
unchanged (it applies only to the explicit flags), so no existing flow regresses.

Scope: this resolves the resume-detection symptom of #5470. The stale
gateway.json reuse symptom is rooted in OpenShell's package-managed gateway
lifecycle — gateway.json is OpenShell-owned and NemoClaw reuses on live
`openshell status` health, so forcing a teardown would regress the legitimate
host-process reuse path. Left out deliberately; hence Refs (not Closes).

Local pre-commit Test (CLI) hook skipped via --no-verify (operator-approved):
it fails only on environment-flaky tests (ollama proxy 5s timeout, /tmp symlink)
unrelated to this change; entry-options 9/9 and the onboard suite 1876/1876 pass,
typecheck and biome clean. CI re-runs all hooks on the PR.

Refs #5470

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

OnboardEntryOptionsInput gains an optional persistedSessionStatus field. resolveOnboardEntryOptions now auto-enables resume when that field equals "in_progress" and --fresh is not set, adjusting the --resume/--fresh conflict guard accordingly. onboard() supplies the value from onboardSession.loadSession()?.status. Unit and e2e tests cover all new auto-resume paths.

Changes

Persisted Session Auto-Resume

Layer / File(s) Summary
Input contract and resume resolution logic
src/lib/onboard/entry-options.ts
Adds optional persistedSessionStatus?: string | null to OnboardEntryOptionsInput. Updates resolveOnboardEntryOptions to compute explicitResume, restricts the --resume/--fresh conflict guard to when both are explicitly set, and infers resume: true from persistedSessionStatus === "in_progress" when fresh is absent.
Wire persistedSessionStatus in onboard()
src/lib/onboard.ts
Passes persistedSessionStatus: onboardSession.loadSession()?.status ?? null into the resolveOnboardEntryOptions call inside onboard().
Unit tests for auto-resume scenarios
src/lib/onboard/entry-options.test.ts
Adds three test cases: "in_progress" triggers auto-resume without explicit --resume; --fresh suppresses auto-resume even when status is "in_progress"; statuses "complete", "failed", null, and undefined do not trigger auto-resume. Reorders import destructuring alongside test additions.
E2E tests for auto-resume behavior
test/e2e/test-onboard-resume.sh
Adds Phase 3.5 coverage verifying implicit resume activation when the persisted session status is "in_progress" without requiring explicit --resume, asserts cached steps are skipped, and confirms --fresh suppresses auto-resume behavior even with "in_progress" status.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/NemoClaw#5161: Modifies the same onboard.tsresolveOnboardEntryOptions delegation path and extends the --resume/--fresh mutual-exclusion logic that this PR further updates.
  • NVIDIA/NemoClaw#5158: Refactors onboard() to use the resolveOnboardEntryOptions resolver that this PR extends with persistedSessionStatus support.

Suggested labels

area: onboarding, area: e2e, v0.0.64

🐇 A session was sleeping, its status half-done,
So I peeked at its state and let resume be won.
If --fresh waves its flag, auto-resume must yield,
But "in_progress" whispers: the path stands revealed!
Hop, hop — the onboarding flows smooth as a bun! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix(onboard): auto-detect resume from an in_progress session' accurately and concisely summarizes the main change: implementing auto-detection of resume mode from persisted in_progress session status.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/5470-onboard-resume-autodetect

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-code-quality

github-code-quality Bot commented Jun 16, 2026

Copy link
Copy Markdown

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 970a832 +/-
nemoclaw/src/se...cret-scanner.ts 100%
nemoclaw/src/commands/slash.ts 100%
nemoclaw/src/li...bprocess-env.ts 100%
nemoclaw/src/bl...eprint/state.ts 98%
nemoclaw/src/onboard/config.ts 98%
nemoclaw/src/bl...int/snapshot.ts 97%
nemoclaw/src/bl...print/runner.ts 95%
nemoclaw/src/co...ration-state.ts 94%
nemoclaw/src/bl...ate-networks.ts 94%
nemoclaw/src/index.ts 94%

TypeScript / code-coverage/cli

The overall coverage in the branch is 46%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 970a832 +/-
src/lib/state/o...oard-session.ts 90%
src/lib/inference/local.ts 76%
src/lib/sandbox/config.ts 72%
src/lib/actions...dbox/rebuild.ts 67%
src/lib/onboard/preflight.ts 64%
src/lib/actions...licy-channel.ts 56%
src/lib/state/sandbox.ts 55%
src/lib/policy/index.ts 49%
src/lib/onboard...er-gpu-patch.ts 44%
src/lib/onboard.ts 18%

Updated June 16, 2026 03:27 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: onboard-resume-e2e
Optional E2E: onboard-negative-paths-e2e

Dispatch hint: onboard-resume-e2e,onboard-negative-paths-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • onboard-resume-e2e (medium; ubuntu-latest, timeout 60 minutes, uses hosted inference secret): This is the direct E2E coverage for interrupted onboarding resume. The PR modifies resume detection and extends test/e2e/test-onboard-resume.sh with implicit plain-onboard resume and --fresh suppression checks, so this job should be merge-blocking.

Optional E2E

  • onboard-negative-paths-e2e (low-to-medium; ubuntu-latest negative-path onboarding checks): Useful adjacent confidence because resolveOnboardEntryOptions also owns CLI flag/env validation for non-interactive onboarding, --from, --name, reserved names, and mutually exclusive flags. Not required because the changed path is specifically resume-session behavior covered by onboard-resume-e2e.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: onboard-resume-e2e,onboard-negative-paths-e2e

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: ubuntu-repo-cloud-openclaw
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required Vitest E2E scenarios

  • ubuntu-repo-cloud-openclaw: The PR changes onboard entry-option resolution and passes persisted onboard session status into the live onboard path. The smallest live-supported typed Vitest scenario that exercises the affected onboarding path on the supported Ubuntu Docker OpenClaw surface is ubuntu-repo-cloud-openclaw. The resume-specific typed scenario is not live-supported by the trusted runtime-support configuration, and the existing free-standing onboard-resume Vitest file is not wired into e2e-vitest-scenarios.yaml, so it cannot be used as the required dispatch.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • src/lib/onboard.ts
  • src/lib/onboard/entry-options.ts

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 1 nice ideas
Since last review: 0 prior items resolved, 4 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • Offset the onboard.ts monolith growth (src/lib/onboard.ts:4749): This PR changes the active onboarding hotspot while the branch increases src/lib/onboard.ts by 39 lines. The functional diff in this file is small, but the drift context shows this file is under substantial recent extraction/refactor pressure, so net growth conflicts with the current monolith ratchet.
    • Recommendation: Move any added helper or test-support logic out of src/lib/onboard.ts, or otherwise offset the branch's net growth in this file before merge.
    • Evidence: Drift context reports src/lib/onboard.ts baseLines 5469, headLines 5508, delta +39, severity blocker, with extensive recent onboarding refactor history. The current diff still touches onboard.ts to pass persistedSessionStatus into resolveOnboardEntryOptions.

🔎 Worth checking

  • Source-of-truth review needed: Auto-detected onboarding resume from persisted session status: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: entry-options.ts adds persistedSessionStatus and exact-match sessionInProgress; onboard.ts supplies onboardSession.loadSession()?.status ?? null; session-bootstrap.ts still contains the downstream checks for resumable sessions, conflicts, and recoverable sandbox names.
  • Clarify and pin the source of truth for implicit resume validity (src/lib/onboard/entry-options.ts:65): The resolver now selects resume mode from only persisted session status, while the actual validity checks remain later in session-bootstrap.ts. That split appears intentional and downstream guards still reject non-resumable sessions, config conflicts, and unrecoverable non-interactive resumes, but the PR does not fully pin the status-vs-validity contract for tampered or invalid persisted state.
    • Recommendation: Document in code or tests that persisted status only selects the resume path and that session-bootstrap remains authoritative for resume validity. Add regression coverage for invalid/tampered persisted records on the implicit path.
    • Evidence: entry-options.ts computes sessionInProgress from input.persistedSessionStatus === "in_progress" and sets resume = !fresh && (explicitResume || sessionInProgress). onboard.ts passes onboardSession.loadSession()?.status ?? null. session-bootstrap.ts still rejects !session or resumable === false and checks resume config conflicts, but the new implicit path lacks direct negative coverage for those guardrails.
  • Add negative coverage for implicit resume guard paths (src/lib/onboard.ts:4767): A plain nemoclaw onboard can now enter resume mode based on persisted local session state. Positive resolver tests and the shell E2E cover auto-resume and --fresh suppression, but the changed trust boundary still lacks targeted negative runtime or integration coverage for invalid/tampered persisted sessions and conflict handling.
    • Recommendation: Add targeted runtime or integration-style tests that exercise the implicit path through the existing downstream guards: non-resumable in_progress sessions, malformed/missing sessions, sandbox and Dockerfile conflicts, and legacy credential migration hash mismatches.
    • Evidence: New tests cover resolveOnboardEntryOptions for in_progress, --fresh, and non-in_progress statuses, plus an E2E positive implicit resume path. Existing session-bootstrap.ts guardrails remain downstream, but this PR does not directly test plain onboard with status:"in_progress", resumable:false, malformed/missing sessions, conflicting --name/--from, or credential cleanup hash mismatch.

🌱 Nice ideas

  • Update the credential migration comment for implicit resume (src/lib/onboard.ts:4807): The credential migration comment still says migration state is carried forward only when the user is explicitly continuing via --resume. After this PR, the same branch also runs when resume mode is auto-detected from a persisted in_progress session.
    • Recommendation: Revise the comment to describe resume mode generally, including auto-detected resume from an in-progress session, while preserving the explanation that hash validation and the cleanup gate prevent stale persisted state from satisfying credential cleanup.
    • Evidence: if (resume) now covers both explicit --resume and implicit resume from persistedSessionStatus === "in_progress", but the nearby comment still says "explicitly continuing the same attempt via `--resume`".
Consider writing more tests for
  • **Runtime validation** — Plain `onboard` with persisted `status:"in_progress", resumable:false` exits through the no-resumable-session guard and does not create or reuse a sandbox.. The changed behavior routes process-level onboarding and sandbox/session recovery based on persisted local state. Unit tests cover resolver semantics and the shell E2E covers the positive happy path, but negative runtime/integration validation would better pin the security-sensitive boundary.
  • **Runtime validation** — Plain `onboard` with a missing or malformed onboard session file stays out of resume mode and starts a fresh session.. The changed behavior routes process-level onboarding and sandbox/session recovery based on persisted local state. Unit tests cover resolver semantics and the shell E2E covers the positive happy path, but negative runtime/integration validation would better pin the security-sensitive boundary.
  • **Runtime validation** — Plain `onboard` with persisted `status:"in_progress"` and a conflicting requested sandbox name exits through resume conflict reporting.. The changed behavior routes process-level onboarding and sandbox/session recovery based on persisted local state. Unit tests cover resolver semantics and the shell E2E covers the positive happy path, but negative runtime/integration validation would better pin the security-sensitive boundary.
  • **Runtime validation** — Plain `onboard` with persisted `status:"in_progress"` and a conflicting `--from` Dockerfile exits through resume conflict reporting.. The changed behavior routes process-level onboarding and sandbox/session recovery based on persisted local state. Unit tests cover resolver semantics and the shell E2E covers the positive happy path, but negative runtime/integration validation would better pin the security-sensitive boundary.
  • **Runtime validation** — Implicit resume with staged legacy credentials removes `~/.nemoclaw/credentials.json` only when persisted `migratedLegacyValueHashes` match the current staged values.. The changed behavior routes process-level onboarding and sandbox/session recovery based on persisted local state. Unit tests cover resolver semantics and the shell E2E covers the positive happy path, but negative runtime/integration validation would better pin the security-sensitive boundary.
  • **Add negative coverage for implicit resume guard paths** — Add targeted runtime or integration-style tests that exercise the implicit path through the existing downstream guards: non-resumable in_progress sessions, malformed/missing sessions, sandbox and Dockerfile conflicts, and legacy credential migration hash mismatches.
  • **Acceptance clause:** Refs [All Platforms][Onboard] Onboard resume mode fails to clean up stale gateway metadata when host-level gateway still owns port 8080 #5470 — add test evidence or identify existing coverage. The deterministic validation context did not include issue [All Platforms][Onboard] Onboard resume mode fails to clean up stale gateway metadata when host-level gateway still owns port 8080 #5470 body or comments, so literal linked-issue acceptance clauses could not be extracted. The PR diff and tests do provide evidence for the PR-described resume-detection behavior.
  • **Auto-detected onboarding resume from persisted session status** — Positive implicit resume and --fresh suppression are covered, but negative tests for non-resumable in_progress records, malformed/missing sessions, config conflicts, and credential migration hash mismatch are still missing on the implicit path.. entry-options.ts adds persistedSessionStatus and exact-match sessionInProgress; onboard.ts supplies onboardSession.loadSession()?.status ?? null; session-bootstrap.ts still contains the downstream checks for resumable sessions, conflicts, and recoverable sandbox names.
Since last review details

Current findings:

  • Source-of-truth review needed: Auto-detected onboarding resume from persisted session status: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: entry-options.ts adds persistedSessionStatus and exact-match sessionInProgress; onboard.ts supplies onboardSession.loadSession()?.status ?? null; session-bootstrap.ts still contains the downstream checks for resumable sessions, conflicts, and recoverable sandbox names.
  • Offset the onboard.ts monolith growth (src/lib/onboard.ts:4749): This PR changes the active onboarding hotspot while the branch increases src/lib/onboard.ts by 39 lines. The functional diff in this file is small, but the drift context shows this file is under substantial recent extraction/refactor pressure, so net growth conflicts with the current monolith ratchet.
    • Recommendation: Move any added helper or test-support logic out of src/lib/onboard.ts, or otherwise offset the branch's net growth in this file before merge.
    • Evidence: Drift context reports src/lib/onboard.ts baseLines 5469, headLines 5508, delta +39, severity blocker, with extensive recent onboarding refactor history. The current diff still touches onboard.ts to pass persistedSessionStatus into resolveOnboardEntryOptions.
  • Clarify and pin the source of truth for implicit resume validity (src/lib/onboard/entry-options.ts:65): The resolver now selects resume mode from only persisted session status, while the actual validity checks remain later in session-bootstrap.ts. That split appears intentional and downstream guards still reject non-resumable sessions, config conflicts, and unrecoverable non-interactive resumes, but the PR does not fully pin the status-vs-validity contract for tampered or invalid persisted state.
    • Recommendation: Document in code or tests that persisted status only selects the resume path and that session-bootstrap remains authoritative for resume validity. Add regression coverage for invalid/tampered persisted records on the implicit path.
    • Evidence: entry-options.ts computes sessionInProgress from input.persistedSessionStatus === "in_progress" and sets resume = !fresh && (explicitResume || sessionInProgress). onboard.ts passes onboardSession.loadSession()?.status ?? null. session-bootstrap.ts still rejects !session or resumable === false and checks resume config conflicts, but the new implicit path lacks direct negative coverage for those guardrails.
  • Add negative coverage for implicit resume guard paths (src/lib/onboard.ts:4767): A plain nemoclaw onboard can now enter resume mode based on persisted local session state. Positive resolver tests and the shell E2E cover auto-resume and --fresh suppression, but the changed trust boundary still lacks targeted negative runtime or integration coverage for invalid/tampered persisted sessions and conflict handling.
    • Recommendation: Add targeted runtime or integration-style tests that exercise the implicit path through the existing downstream guards: non-resumable in_progress sessions, malformed/missing sessions, sandbox and Dockerfile conflicts, and legacy credential migration hash mismatches.
    • Evidence: New tests cover resolveOnboardEntryOptions for in_progress, --fresh, and non-in_progress statuses, plus an E2E positive implicit resume path. Existing session-bootstrap.ts guardrails remain downstream, but this PR does not directly test plain onboard with status:"in_progress", resumable:false, malformed/missing sessions, conflicting --name/--from, or credential cleanup hash mismatch.
  • Update the credential migration comment for implicit resume (src/lib/onboard.ts:4807): The credential migration comment still says migration state is carried forward only when the user is explicitly continuing via --resume. After this PR, the same branch also runs when resume mode is auto-detected from a persisted in_progress session.
    • Recommendation: Revise the comment to describe resume mode generally, including auto-detected resume from an in-progress session, while preserving the explanation that hash validation and the cleanup gate prevent stale persisted state from satisfying credential cleanup.
    • Evidence: if (resume) now covers both explicit --resume and implicit resume from persistedSessionStatus === "in_progress", but the nearby comment still says "explicitly continuing the same attempt via `--resume`".

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/lib/onboard/entry-options.ts (1)

21-21: ⚡ Quick win

Tighten persistedSessionStatus to a literal union instead of string.

Using string | null weakens the cross-file contract and allows typos to compile. Narrow this to known session status literals (or a shared exported status type) so auto-resume conditions stay type-safe.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/entry-options.ts` at line 21, The persistedSessionStatus
field is typed as string | null, which is too permissive and allows any string
value, including typos. Replace this broad string type with a literal union of
specific session status values (for example, 'active' | 'paused' | 'resumed') or
reference a shared exported status type that captures the valid session states.
This ensures type safety at compile time and prevents invalid status values from
being assigned to persistedSessionStatus.
src/lib/onboard.ts (1)

4750-4752: Run the onboarding E2E suite for this core-flow change before merge.

Given this touches onboarding entry behavior, run the recommended targeted nightly E2E jobs to validate resume/fresh interactions across lifecycle paths.

gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=cloud-e2e,sandbox-operations-e2e,rebuild-openclaw-e2e,channels-stop-start-e2e,messaging-compatible-endpoint-e2e,bedrock-runtime-compatible-anthropic-e2e,hermes-discord-e2e,hermes-slack-e2e,openshell-gateway-upgrade-e2e,issue-3600-gpu-proof-optional-e2e

As per coding guidelines, src/lib/onboard.ts changes should run the recommended onboarding E2E coverage set.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 4750 - 4752, Before merging the changes to
src/lib/onboard.ts that modify the onboarding entry behavior with
persistedSessionStatus initialization for resume detection, you must run the
recommended onboarding E2E test coverage to validate that resume and fresh
interaction paths work correctly across different lifecycle scenarios. Execute
the provided gh workflow command to run the nightly E2E jobs (cloud-e2e,
sandbox-operations-e2e, rebuild-openclaw-e2e, channels-stop-start-e2e,
messaging-compatible-endpoint-e2e, bedrock-runtime-compatible-anthropic-e2e,
hermes-discord-e2e, hermes-slack-e2e, openshell-gateway-upgrade-e2e, and
issue-3600-gpu-proof-optional-e2e) on your branch to ensure the onboarding
resume/fresh interaction changes are validated before merge.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 4750-4751: The file `src/lib/onboard.ts` has exceeded the growth
guard threshold with a net addition of 2 lines, which blocks CI from passing. To
fix this, you need to offset the 2 newly added lines (the comment and the
persistedSessionStatus assignment) by removing or trimming 2 equivalent lines of
non-functional content elsewhere in the same file, such as unnecessary comments,
blank lines, or redundant whitespace. Ensure the deletions do not affect any
functional code logic.

---

Nitpick comments:
In `@src/lib/onboard.ts`:
- Around line 4750-4752: Before merging the changes to src/lib/onboard.ts that
modify the onboarding entry behavior with persistedSessionStatus initialization
for resume detection, you must run the recommended onboarding E2E test coverage
to validate that resume and fresh interaction paths work correctly across
different lifecycle scenarios. Execute the provided gh workflow command to run
the nightly E2E jobs (cloud-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e,
channels-stop-start-e2e, messaging-compatible-endpoint-e2e,
bedrock-runtime-compatible-anthropic-e2e, hermes-discord-e2e, hermes-slack-e2e,
openshell-gateway-upgrade-e2e, and issue-3600-gpu-proof-optional-e2e) on your
branch to ensure the onboarding resume/fresh interaction changes are validated
before merge.

In `@src/lib/onboard/entry-options.ts`:
- Line 21: The persistedSessionStatus field is typed as string | null, which is
too permissive and allows any string value, including typos. Replace this broad
string type with a literal union of specific session status values (for example,
'active' | 'paused' | 'resumed') or reference a shared exported status type that
captures the valid session states. This ensures type safety at compile time and
prevents invalid status values from being assigned to persistedSessionStatus.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 93057bc6-750d-48d8-9238-4ef0d841519d

📥 Commits

Reviewing files that changed from the base of the PR and between 4a7b4c8 and 6b0241d.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/onboard/entry-options.test.ts
  • src/lib/onboard/entry-options.ts

Comment thread src/lib/onboard.ts Outdated
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27589221225
Target ref: fix/5470-onboard-resume-autodetect
Requested jobs: onboard-resume-e2e,onboard-repair-e2e,double-onboard-e2e,onboard-negative-paths-e2e
Summary: 4 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
double-onboard-e2e ✅ success
onboard-negative-paths-e2e ✅ success
onboard-repair-e2e ✅ success
onboard-resume-e2e ✅ success

…ests

- Offset the codebase-growth-guardrails check: src/lib/onboard.ts must be
  net-neutral. Drop the inline comment (the rationale already lives on
  entry-options.ts) and a decorative blank line so the file is +1/-1 (net 0)
  while keeping the persistedSessionStatus wiring.
- Widen the non-resume boundary test to cover "pending" and "" in addition to
  complete/failed/null/undefined, so only an exact "in_progress" auto-resumes.

Refs #5470

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
test-onboard-resume.sh only exercised explicit `nemoclaw onboard --resume`.
Add a phase that re-marks the session in_progress and runs a plain `onboard`
(no --resume), asserting the "(resume mode)" banner and skipped cached steps,
then verifies `--fresh` suppresses the auto-resume (fail-fast at preflight to
stay cheap). Closes the implicit-resume gap flagged by the e2e advisor on #5470.

Refs #5470

Signed-off-by: Dongni Yang <dongniy@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-onboard-resume.sh`:
- Around line 375-390: The test in the --fresh branch injects a preflight
failure using NEMOCLAW_E2E_FORCE_FAIL_AT_STEP=preflight but never verifies the
injection actually occurred. The current code only checks that "(resume mode)"
is absent, which is insufficient. Modify the test to assert both that the
command exits with a non-zero status (indicating the preflight failure) and that
the expected failure marker appears in the output. Remove the || true from the
nemoclaw.js invocation to capture the exit code, then add assertions to verify
the non-zero exit code and check for the expected preflight failure marker in
fresh_output to ensure the injection regressed detection works correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a093d1cc-2baf-4383-b3a0-75555d7a0f63

📥 Commits

Reviewing files that changed from the base of the PR and between 6b0241d and 55a0a12.

📒 Files selected for processing (3)
  • src/lib/onboard.ts
  • src/lib/onboard/entry-options.test.ts
  • test/e2e/test-onboard-resume.sh
💤 Files with no reviewable changes (1)
  • src/lib/onboard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard/entry-options.test.ts

Comment thread test/e2e/test-onboard-resume.sh
…heck

CodeRabbit: the --fresh branch only checked that "(resume mode)" was absent,
which could pass vacuously if the run failed for an unrelated reason or never
reached preflight. Drop the `|| true`, capture the exit code, and assert the run
exited non-zero with the injected "Forced onboarding failure at step 'preflight'"
marker, so the banner-absence assertion is meaningful.

Refs #5470

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ❌ Some jobs failed

Run: 27591288160
Target ref: fix/5470-onboard-resume-autodetect
Requested jobs: onboard-resume-e2e,cloud-onboard-e2e
Summary: 0 passed, 2 failed, 0 cancelled, 0 skipped

Job Result
cloud-onboard-e2e ❌ failure
onboard-resume-e2e ❌ failure

Failed jobs: cloud-onboard-e2e, onboard-resume-e2e. Check run artifacts for logs.

The Phase 3.5 implicit-resume check flipped the post-Phase-3 session's status
to in_progress but left resumable=false (set when onboarding completes), so the
resume machine correctly rejected it as "no resumable session" and exited before
the banner — failing the assertions. A genuinely interrupted session is
resumable; reset resumable=true so the synthesized state matches and plain
onboard auto-resumes it. (The fix itself was sound — auto-detect fired, which is
why the resume guard ran at all.)

Refs #5470

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 27591945815
Target ref: fix/5470-onboard-resume-autodetect
Requested jobs: onboard-resume-e2e
Summary: 1 passed, 0 failed, 0 cancelled, 0 skipped

Job Result
onboard-resume-e2e ✅ success

@Dongni-Yang Dongni-Yang added the v0.0.66 Release target label Jun 16, 2026
@Dongni-Yang

Copy link
Copy Markdown
Contributor Author

E2E status

  • onboard-resume-e2e ✅ green (run 27591945815) — covers explicit --resume plus the new Phase 3.5 implicit-resume assertions (plain onboard auto-detecting an in_progress session) and --fresh suppression.

  • cloud-onboard-e2e ❌ — pre-existing/environmental, not from this PR. It fails at [3/8] Configuring inference provider ("Validation details omitted… Public install failed") on a fresh onboard. This change only takes effect when a persisted onboard-session.json is in_progress (→ resume mode); a fresh onboard has resume=false, so the new code path never executes and cannot produce a [3/8] inference-credential failure. The issue itself notes this [3/8] failure is "environmental and separate from this bug." It's a CI inference-credential/endpoint matter, not a code change in scope here.

Scope reminder

Closing keyword stays Refs #5470 (not Closes): this fixes the resume-detection symptom (the (resume mode) banner now auto-engages, verified live). The stale-gateway / port-8080 reuse symptom is rooted in OpenShell's package-managed gateway lifecycle (NemoClaw reuses on live openshell status health and reads no gateway.json/containerId), and tightening it risks regressing #4520 (the inverse "false-stale → :8080 collision" bug).

Signed-off-by: Dongni Yang dongniy@nvidia.com

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.66 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant