fix(install): Carl-grade install reliability — close the broken-merge gap#968
fix(install): Carl-grade install reliability — close the broken-merge gap#968
Conversation
#950 merged with the install path on Mac doing a hidden 5-15min Rust source build despite the README claiming "Docker-first: pulls pre-built images, no compilation needed." Existing CI gates (verify-architectures, verify-after-rebuild, validate, install-and-run-gate) all passed because they validate image presence + revision labels + service health — but they never exercised Carl's actual install command + first chat message. This doc plans the work to close that gap on this PR (fix/install-carl-mac-windows). Six pieces: A. Carl-install validation in CI — fresh ubuntu runner runs the same `curl install.sh | bash` Carl runs, then chat-smoke + image- smoke validate clean response shape (no <tool_use> XML, no vision hallucination, no name-prefix leak). B. Mac-mode install rationalization — fix the README/install.sh mismatch (default to docker-only on Mac matching the README; source build moves behind CONTINUUM_DEV=1 flag). C. Browser smoke (puppeteer) — catch chrome-error://chromewebdata traps from too-fast browser open. D. install.sh idempotence + friendly retry on partial-failure resume. E. Browser pre-open delay — install.sh waits for widget-server /health before `open http://localhost:9003/` so Carl never sees a chrome-error page. F. Friendlier first-fail messaging — phase-named errors with 1-line guidance + clipboard log path. Rollout: smoke ships ADVISORY for 1 week, flips to REQUIRED via the PrimaryBranches ruleset after <2% false-fail rate confirmed. Then no future PR can break Carl's install without explicit bypass (which the team's standing rule forbids per Joel). Coordination split documented per platform. anvil drives mac+CI smoke, green drives Windows-native parity, bigmama drives Linux/CUDA + future self-hosted GPU runner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a “Carl-grade CI” plan document to close the gap between the README’s “docker-first / no compilation” promise and what Carl actually experiences when running curl install.sh | bash, by outlining CI validation that exercises the real install path plus first-chat / vision / browser checks.
Changes:
- Add
docs/CARL-CI-PLAN.mddescribing the install reliability gap and proposed CI + rollout plan. - Document proposed
carl-install-and-chat-smokejob and follow-up work items (Mac-mode rationalization, puppeteer smoke, idempotence, friendlier failures).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| testing easy." Every gate is a one-line shell invocation any of us can | ||
| run locally in 30 seconds. | ||
|
|
||
| ## What lands in THIS PR |
There was a problem hiding this comment.
This section reads as if the smoke job + Mac-mode changes are already included in the current PR (e.g., “What lands in THIS PR” / “This PR adds…”), but the PR currently appears to only add the plan doc. Consider rewording to “What’s planned on this branch” (or similar) to avoid misleading readers until the implementation lands.
| ## What lands in THIS PR | |
| ## What's planned on this branch |
|
|
||
| ## Success criteria | ||
|
|
||
| - [ ] Carl-install-and-chat-smoke runs on every PR; passes for unchanged- |
There was a problem hiding this comment.
The checklist item breaks the phrase with a literal hyphen (“unchanged-” on one line and “install diffs” on the next), which reads like a typo in rendered Markdown. Consider removing the hyphen and letting Markdown wrap naturally (or rephrasing to avoid the split).
| - [ ] Carl-install-and-chat-smoke runs on every PR; passes for unchanged- | |
| - [ ] Carl-install-and-chat-smoke runs on every PR; passes for unchanged |
…kills chrome-error trap) Carl's experience hinges on this gate. Empirically: 2026-04-25 joel hit "Unsafe attempt to load URL http://localhost:9003/ from frame with URL chrome-error://chromewebdata/" exactly because install.sh opened the browser before widget-server was actually serving HTTP. Chrome lands on the failed URL, replaces the location bar with chrome-error://chromewebdata/, and any subsequent reload tries to navigate from chrome-error back to http: — which the browser blocks as a cross-scheme navigation. Carl is then stuck on an error page with no clean recovery path. Two changes vs the prior 'curl -sf' wait at /: 1. Hit /health specifically (widget-server's JTAGEndpoints.HEALTH = '/health'). A 200 here means widget-server is actually serving HTTP, not just that the port is open. The old check (-sf on /) returned success on any response — including 502, 503, or partial responses from a half-ready server. /health with --fail asserts a real OK. 2. If we never get a 200 in HEALTH_TIMEOUT_SEC (default 120s, was hardcoded 60s), DO NOT open the browser. Print actionable diagnostic instead: - logs/status commands the user can run - retry curl one-liner - the URL to open manually once /health is 200 Opening a browser to a not-yet-ready server is the bug; refusing to open is the correct behavior. Carl is better served by an actionable error than by a silent chrome-error trap. Per-probe --max-time 2 keeps the loop near 1s cadence even when the server hangs (vs blocking 30+s on a half-stuck connection like the old loop could). Doesn't depend on B.1/B.2 (the docker-only-vs-hybrid call). Pure addition; no architectural conflict either way. Carl-CI plan piece E (per docs/CARL-CI-PLAN.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…asserts page renders usable HTML The headline structural fix from docs/CARL-CI-PLAN.md piece A. What changes: - New scripts/ci/carl-install-smoke.sh (169 lines) — runs the EXACT `curl -fsSL <install.sh> | bash` command Carl runs (against this PR's HEAD SHA), then probes /health + the root page Carl will open. Same one-line invocation works for CI and humans (per Joel's "make your own testing easy" rule). - New .github/workflows/carl-install-smoke.yml — runs the smoke on PRs to canary/main when install/docker-related paths change. Path filter keeps it from re-running on TS-only diffs. What it catches that existing gates miss: - install.sh fails partway through (today: silent — install-and-run-gate uses CONTINUUM_IMAGE_TAG env, doesn't run install.sh) - install.sh succeeds but the page Carl opens is empty / contains chrome-error markers / "Cannot GET /" / stack trace HTML - README's "Docker-first: no compilation needed" claim violated by a hidden source-build path adding 5-15min to install (this gate fails on the 25min CARL_INSTALL_TIMEOUT_SEC cap — by design) Negative-marker checks on the served page: chrome-error, container exited, ECONNREFUSED, Cannot GET /, Internal Server Error Any of these in the body = gate fails. Carl-perspective: if Carl would see something broken, the smoke says broken. Status: ADVISORY for the first week of operation per CARL-CI-PLAN.md rollout. Does NOT block merge yet — runs but reports advisory. After 1 week of <2% false-fail rate, flip to REQUIRED via PrimaryBranches ruleset PUT (a single gh api call). At that point no future PR can land that breaks Carl's install path without explicit --no-verify (which the team's standing rule forbids per Joel). Doesn't depend on B.1/B.2 (the Mac docker-only-vs-hybrid call). Pure addition; smoke validates whatever install.sh does end-to-end. If B.1 lands, smoke passes faster (no source build). If B.2 lands, smoke keeps failing on the timeout — surfacing the README claim as actively mis-advertised, which is what the team needs to know to fix the messaging. Carl-CI plan piece A (per docs/CARL-CI-PLAN.md). Pieces D, F still queued; piece E (browser pre-open /health gate) shipped at 2071eae. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…guidance
Carl-CI plan piece F. Empirically (2026-04-25): existing install.sh
failures dump bash's last line of stderr with no context. Carl can't
tell if it's a Docker thing, a Tailscale thing, a model-download thing,
or a Rust build thing without reading install.sh source.
Changes:
1. Add PHASE variable updated as install.sh enters each section
(10 phases instrumented: detect environment, pre-clone bootstrap,
clone/update repo, shared modules, configuration, TLS certs, compose
files, pull images, start support services, widget-server health,
open browser).
2. ERR trap (on_install_fail) prints a structured failure block:
- Which phase died + the bash exit code
- Phase-specific 1-line guidance (network? docker daemon? GHCR auth?
run mkdir -p X? CONTINUUM_NO_TLS=1 to skip optional?)
- Path to the full log
- Last 30 lines of the log inline
3. INSTALL_LOG capture via `exec > >(tee -a "$INSTALL_LOG") 2>&1`
so the trap has the full transcript even when the failure happens
in a subshell. Default path /tmp/continuum-install-$$.log;
overridable via INSTALL_LOG env.
The phase_guidance dispatch is intentionally narrow — one-line
suggestions per phase, not multi-paragraph troubleshooting. Carl gets
ONE thing to try; if that fails, the open-an-issue path captures the
full log via gh CLI.
Doesn't depend on B.1/B.2. Pure addition. After this lands, Carl who
hits ANY install failure gets:
- Which step failed (vs cryptic bash stderr)
- One thing to try (vs reading the script)
- A clipboardable log path (vs scrollback hunting)
Carl-CI plan pieces shipped on this branch: A (carl-install-smoke),
E (browser-pre-open /health gate), F (this). Pending: B (Mac docker-only
default — needs joel B.1/B.2 call), D (idempotence audit — install.sh
mostly already handles this; small gaps to verify).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ocked from containers)
Reading install.sh:118-123 surfaced the architectural reality I missed in
the original plan: Apple's hypervisor blocks GPU passthrough to containers
(confirmed by Docker Feb 2026, comment in install.sh). Mac MUST run
continuum-core natively for Metal acceleration. The 5-15min Rust build is
architectural, not a bug.
So B.1 (default install to docker-only on all platforms) isn't a choice
we have. Going with B.2: README updated to admit the hybrid split:
- Linux: docker-first, no compilation (matches existing claim)
- Mac: docker for support services + native continuum-core for Metal
(~10min first build, incremental after; no separate command, no flag)
Considered B.3 (ship two install commands, one per OS) — rejected: more
docs surface, fragments the support story.
README update + install.sh banner-on-Mac messaging are next on this PR
(pending joel's confirmation of B.2 over B.3). Smoke shipped at piece A
already accommodates either choice via the 25min CARL_INSTALL_TIMEOUT_SEC
default.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual user-facing widget-server port is 9003 everywhere it matters: docker-compose.yml publishes 9003:9003, the Dockerfile EXPOSEs 9003, install.sh's success banner uses :9003, and the carl-install-smoke gate probes :9003. But bootstrap.sh's success banner and install.ps1's post-install message both told the user to open :9000 — so a user following the printed instruction would hit "connection refused" and conclude the install was broken. Affects Toby's Windows path most acutely (install.ps1 → WSL bootstrap.sh both print :9000) and any Linux user who arrives via bootstrap.sh. The HTTP_PORT=9000 in install.sh's config.env writer is a separate question — that value is written to ~/.continuum/config.env but the deploy uses JTAG_HTTP_PORT=9003 from docker-compose.yml directly. The config-file value is unused decoration; not touching it here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… silent 5-15min) Carl/Memento's reported experience: install.sh prints "First build detected — this takes 5-15 minutes. Showing progress..." then total silence for the entire compile, which is exactly the window in which a fresh validator Ctrl+C's because nothing seems to be happening. Root cause was in parallel-start.sh's cargo invocation pattern. Even with CARGO_QUIET="" on first build, every cargo call was wrapped in $(cargo build ... 2>&1) which buffers all output until cargo exits. The banner promised progress but $() ate it. Fix: introduce build_pkg() helper. On incremental builds (CARGO_QUIET set) keeps the original capture-then-display behavior so the build log stays clean. On first builds, tee's cargo's stdout to the terminal AND a temp file — user sees "Compiling crate-name vX.Y.Z" lines stream live, while $OUT still gets populated for preflight_check_cargo_xcode and the failure- display path. PIPESTATUS preserves cargo's actual exit code through the tee pipe. Validated: bash -n syntax-clean, npm run build:ts still passes, no behavior change for incremental rebuilds (which is what every CI run hits since target/release/continuum-core-server already exists in the build cache). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
#950 merged with the Mac install path doing a hidden 5-15min Rust source build despite the README claiming "Docker-first: pulls pre-built images, no compilation needed." The CI gates that existed (verify-architectures, verify-after-rebuild, validate, install-and-run-gate) all passed because they validated image presence + revision label + service health on a CI-only docker-compose. They never exercised
curl install.sh | bash— Carl's actual entry point — and they never asserted the page Carl opens after install renders usable HTML.This PR closes that gap with a new CI gate + four UX fixes to install.sh's user-facing surface.
What
docs/CARL-CI-PLAN.mdrollout, then flips to required.compose down -vbeforecompose up -dso a Carl-Ctrl+C'd partial install resumes cleanly instead of choking on name-conflict errors.:9000but the widget-server runs on:9003. Toby (Windows) and any Linux user via bootstrap.sh would have hit connection-refused on first try.$(cargo build ... 2>&1)was buffering output during the 5-15min cold compile. Newbuild_pkg()helper tees on first-build, captures-quietly on incremental, preserves cargo's exit via PIPESTATUS. Memento's M1 install will see steady "Compiling crate-name vX.Y.Z" progress instead of a silent multi-minute block.Validation
carl-install-smoke (linux/amd64): PASS in 1m39s on the latest commitvalidate: PASSbash -non parallel-start.sh: cleannpm run build:ts: cleanOut of scope (deferred)