Skip to content

fix(install): Carl-grade install reliability — close the broken-merge gap#968

Open
joelteply wants to merge 7 commits intocanaryfrom
fix/install-carl-mac-windows
Open

fix(install): Carl-grade install reliability — close the broken-merge gap#968
joelteply wants to merge 7 commits intocanaryfrom
fix/install-carl-mac-windows

Conversation

@joelteply
Copy link
Copy Markdown
Contributor

@joelteply joelteply commented Apr 25, 2026

Why

#950 merged with the Mac install path doing a hidden 5-15min Rust source build despite the README claiming "Docker-first: pulls pre-built images, no compilation needed." The CI gates that existed (verify-architectures, verify-after-rebuild, validate, install-and-run-gate) all passed because they validated image presence + revision label + service health on a CI-only docker-compose. They never exercised curl install.sh | bash — Carl's actual entry point — and they never asserted the page Carl opens after install renders usable HTML.

This PR closes that gap with a new CI gate + four UX fixes to install.sh's user-facing surface.

What

  • A. carl-install-smoke (CI) — runs the EXACT install command Carl runs, then asserts /health 200 + root page returns >100 bytes HTML with no failure markers (chrome-error, container exited, ECONNREFUSED, etc). Validated linux/amd64 PASSES in 1m39s on this PR. Workflow advisory for first week per docs/CARL-CI-PLAN.md rollout, then flips to required.
  • B. Mac-mode rationalization (docs) — README admits the architectural hybrid (Apple's hypervisor blocks GPU passthrough → Mac MUST run continuum-core natively for Metal). Per-platform install-time table replaces the misleading "no compilation needed" universal claim.
  • E. /health gate before browser-open — install.sh now polls widget-server /health for up to 120s before opening the browser. Refuses to open if /health doesn't come up; prints actionable diagnostic + log dump command instead. Kills the chrome-error trap class of bug (the empirical case that prompted this PR).
  • F. Friendlier failures — install.sh names the failing PHASE in the error output, prints the actual failing command + 1-line guidance for that specific failure, captures full log to a clipboardable path.
  • D. Idempotent reset — install.sh does compose down -v before compose up -d so a Carl-Ctrl+C'd partial install resumes cleanly instead of choking on name-conflict errors.
  • G. UI URL fix — bootstrap.sh + install.ps1 told users to open :9000 but the widget-server runs on :9003. Toby (Windows) and any Linux user via bootstrap.sh would have hit connection-refused on first try.
  • G. Cargo build streams during first-build — parallel-start.sh's $(cargo build ... 2>&1) was buffering output during the 5-15min cold compile. New build_pkg() helper tees on first-build, captures-quietly on incremental, preserves cargo's exit via PIPESTATUS. Memento's M1 install will see steady "Compiling crate-name vX.Y.Z" progress instead of a silent multi-minute block.

Validation

  • carl-install-smoke (linux/amd64): PASS in 1m39s on the latest commit
  • validate: PASS
  • bash -n on parallel-start.sh: clean
  • npm run build:ts: clean
  • Mac validation: requires Memento's M1 fresh-install (in-flight via airc — vhsm-claude unblocking the airc-connect side via Build(deps-dev): Bump @typescript-eslint/eslint-plugin from 8.29.1 to 8.35.0 #78)
  • Windows validation: requires Toby's PS port (in-flight)

Out of scope (deferred)

  • Piece C: puppeteer browser smoke — bigger lift, would catch UI rendering bugs my A piece misses; queued
  • Self-hosted GPU runner at bigmama's box — covers Linux+CUDA carl path; queued
  • install.sh AMD/Intel Vulkan detection (install.sh: detect AMD/Intel Vulkan GPUs (currently silently CPU-only on non-Nvidia) #951) — owned by bigmama-wsl per issue body; coordinating
  • install.ps1 hardcoded-path scrub — audited clean (no hardcoded user paths or drive letters); no work needed

#950 merged with the install path on Mac doing a hidden 5-15min Rust
source build despite the README claiming "Docker-first: pulls pre-built
images, no compilation needed." Existing CI gates (verify-architectures,
verify-after-rebuild, validate, install-and-run-gate) all passed because
they validate image presence + revision labels + service health — but
they never exercised Carl's actual install command + first chat message.

This doc plans the work to close that gap on this PR
(fix/install-carl-mac-windows). Six pieces:

  A. Carl-install validation in CI — fresh ubuntu runner runs the
     same `curl install.sh | bash` Carl runs, then chat-smoke + image-
     smoke validate clean response shape (no <tool_use> XML, no vision
     hallucination, no name-prefix leak).
  B. Mac-mode install rationalization — fix the README/install.sh
     mismatch (default to docker-only on Mac matching the README;
     source build moves behind CONTINUUM_DEV=1 flag).
  C. Browser smoke (puppeteer) — catch chrome-error://chromewebdata
     traps from too-fast browser open.
  D. install.sh idempotence + friendly retry on partial-failure resume.
  E. Browser pre-open delay — install.sh waits for widget-server
     /health before `open http://localhost:9003/` so Carl never sees
     a chrome-error page.
  F. Friendlier first-fail messaging — phase-named errors with
     1-line guidance + clipboard log path.

Rollout: smoke ships ADVISORY for 1 week, flips to REQUIRED via the
PrimaryBranches ruleset after <2% false-fail rate confirmed. Then no
future PR can break Carl's install without explicit bypass (which the
team's standing rule forbids per Joel).

Coordination split documented per platform. anvil drives mac+CI smoke,
green drives Windows-native parity, bigmama drives Linux/CUDA + future
self-hosted GPU runner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 25, 2026 15:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a “Carl-grade CI” plan document to close the gap between the README’s “docker-first / no compilation” promise and what Carl actually experiences when running curl install.sh | bash, by outlining CI validation that exercises the real install path plus first-chat / vision / browser checks.

Changes:

  • Add docs/CARL-CI-PLAN.md describing the install reliability gap and proposed CI + rollout plan.
  • Document proposed carl-install-and-chat-smoke job and follow-up work items (Mac-mode rationalization, puppeteer smoke, idempotence, friendlier failures).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/CARL-CI-PLAN.md
testing easy." Every gate is a one-line shell invocation any of us can
run locally in 30 seconds.

## What lands in THIS PR
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section reads as if the smoke job + Mac-mode changes are already included in the current PR (e.g., “What lands in THIS PR” / “This PR adds…”), but the PR currently appears to only add the plan doc. Consider rewording to “What’s planned on this branch” (or similar) to avoid misleading readers until the implementation lands.

Suggested change
## What lands in THIS PR
## What's planned on this branch

Copilot uses AI. Check for mistakes.
Comment thread docs/CARL-CI-PLAN.md

## Success criteria

- [ ] Carl-install-and-chat-smoke runs on every PR; passes for unchanged-
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checklist item breaks the phrase with a literal hyphen (“unchanged-” on one line and “install diffs” on the next), which reads like a typo in rendered Markdown. Consider removing the hyphen and letting Markdown wrap naturally (or rephrasing to avoid the split).

Suggested change
- [ ] Carl-install-and-chat-smoke runs on every PR; passes for unchanged-
- [ ] Carl-install-and-chat-smoke runs on every PR; passes for unchanged

Copilot uses AI. Check for mistakes.
…kills chrome-error trap)

Carl's experience hinges on this gate. Empirically: 2026-04-25 joel hit
"Unsafe attempt to load URL http://localhost:9003/ from frame with URL
chrome-error://chromewebdata/" exactly because install.sh opened the
browser before widget-server was actually serving HTTP. Chrome lands on
the failed URL, replaces the location bar with chrome-error://chromewebdata/,
and any subsequent reload tries to navigate from chrome-error back to
http: — which the browser blocks as a cross-scheme navigation. Carl is
then stuck on an error page with no clean recovery path.

Two changes vs the prior 'curl -sf' wait at /:

1. Hit /health specifically (widget-server's JTAGEndpoints.HEALTH = '/health').
   A 200 here means widget-server is actually serving HTTP, not just that
   the port is open. The old check (-sf on /) returned success on any
   response — including 502, 503, or partial responses from a half-ready
   server. /health with --fail asserts a real OK.

2. If we never get a 200 in HEALTH_TIMEOUT_SEC (default 120s, was hardcoded
   60s), DO NOT open the browser. Print actionable diagnostic instead:
   - logs/status commands the user can run
   - retry curl one-liner
   - the URL to open manually once /health is 200

Opening a browser to a not-yet-ready server is the bug; refusing to open
is the correct behavior. Carl is better served by an actionable error
than by a silent chrome-error trap.

Per-probe --max-time 2 keeps the loop near 1s cadence even when the
server hangs (vs blocking 30+s on a half-stuck connection like the old
loop could).

Doesn't depend on B.1/B.2 (the docker-only-vs-hybrid call). Pure addition;
no architectural conflict either way.

Carl-CI plan piece E (per docs/CARL-CI-PLAN.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size: L and removed size: M labels Apr 25, 2026
…asserts page renders usable HTML

The headline structural fix from docs/CARL-CI-PLAN.md piece A.

What changes:
- New scripts/ci/carl-install-smoke.sh (169 lines) — runs the EXACT
  `curl -fsSL <install.sh> | bash` command Carl runs (against this PR's
  HEAD SHA), then probes /health + the root page Carl will open.
  Same one-line invocation works for CI and humans (per Joel's "make
  your own testing easy" rule).
- New .github/workflows/carl-install-smoke.yml — runs the smoke on PRs
  to canary/main when install/docker-related paths change. Path filter
  keeps it from re-running on TS-only diffs.

What it catches that existing gates miss:
- install.sh fails partway through (today: silent — install-and-run-gate
  uses CONTINUUM_IMAGE_TAG env, doesn't run install.sh)
- install.sh succeeds but the page Carl opens is empty / contains
  chrome-error markers / "Cannot GET /" / stack trace HTML
- README's "Docker-first: no compilation needed" claim violated by a
  hidden source-build path adding 5-15min to install (this gate fails
  on the 25min CARL_INSTALL_TIMEOUT_SEC cap — by design)

Negative-marker checks on the served page:
  chrome-error, container exited, ECONNREFUSED, Cannot GET /,
  Internal Server Error
Any of these in the body = gate fails. Carl-perspective: if Carl would
see something broken, the smoke says broken.

Status: ADVISORY for the first week of operation per CARL-CI-PLAN.md
rollout. Does NOT block merge yet — runs but reports advisory. After
1 week of <2% false-fail rate, flip to REQUIRED via PrimaryBranches
ruleset PUT (a single gh api call). At that point no future PR can land
that breaks Carl's install path without explicit --no-verify (which
the team's standing rule forbids per Joel).

Doesn't depend on B.1/B.2 (the Mac docker-only-vs-hybrid call). Pure
addition; smoke validates whatever install.sh does end-to-end. If B.1
lands, smoke passes faster (no source build). If B.2 lands, smoke
keeps failing on the timeout — surfacing the README claim as actively
mis-advertised, which is what the team needs to know to fix the
messaging.

Carl-CI plan piece A (per docs/CARL-CI-PLAN.md). Pieces D, F still
queued; piece E (browser pre-open /health gate) shipped at 2071eae.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added size: XL and removed size: L labels Apr 25, 2026
Test and others added 4 commits April 25, 2026 10:34
…guidance

Carl-CI plan piece F. Empirically (2026-04-25): existing install.sh
failures dump bash's last line of stderr with no context. Carl can't
tell if it's a Docker thing, a Tailscale thing, a model-download thing,
or a Rust build thing without reading install.sh source.

Changes:

1. Add PHASE variable updated as install.sh enters each section
   (10 phases instrumented: detect environment, pre-clone bootstrap,
   clone/update repo, shared modules, configuration, TLS certs, compose
   files, pull images, start support services, widget-server health,
   open browser).

2. ERR trap (on_install_fail) prints a structured failure block:
   - Which phase died + the bash exit code
   - Phase-specific 1-line guidance (network? docker daemon? GHCR auth?
     run mkdir -p X? CONTINUUM_NO_TLS=1 to skip optional?)
   - Path to the full log
   - Last 30 lines of the log inline

3. INSTALL_LOG capture via `exec > >(tee -a "$INSTALL_LOG") 2>&1`
   so the trap has the full transcript even when the failure happens
   in a subshell. Default path /tmp/continuum-install-$$.log;
   overridable via INSTALL_LOG env.

The phase_guidance dispatch is intentionally narrow — one-line
suggestions per phase, not multi-paragraph troubleshooting. Carl gets
ONE thing to try; if that fails, the open-an-issue path captures the
full log via gh CLI.

Doesn't depend on B.1/B.2. Pure addition. After this lands, Carl who
hits ANY install failure gets:
  - Which step failed (vs cryptic bash stderr)
  - One thing to try (vs reading the script)
  - A clipboardable log path (vs scrollback hunting)

Carl-CI plan pieces shipped on this branch: A (carl-install-smoke),
E (browser-pre-open /health gate), F (this). Pending: B (Mac docker-only
default — needs joel B.1/B.2 call), D (idempotence audit — install.sh
mostly already handles this; small gaps to verify).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ocked from containers)

Reading install.sh:118-123 surfaced the architectural reality I missed in
the original plan: Apple's hypervisor blocks GPU passthrough to containers
(confirmed by Docker Feb 2026, comment in install.sh). Mac MUST run
continuum-core natively for Metal acceleration. The 5-15min Rust build is
architectural, not a bug.

So B.1 (default install to docker-only on all platforms) isn't a choice
we have. Going with B.2: README updated to admit the hybrid split:
  - Linux: docker-first, no compilation (matches existing claim)
  - Mac: docker for support services + native continuum-core for Metal
    (~10min first build, incremental after; no separate command, no flag)

Considered B.3 (ship two install commands, one per OS) — rejected: more
docs surface, fragments the support story.

README update + install.sh banner-on-Mac messaging are next on this PR
(pending joel's confirmation of B.2 over B.3). Smoke shipped at piece A
already accommodates either choice via the 25min CARL_INSTALL_TIMEOUT_SEC
default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The actual user-facing widget-server port is 9003 everywhere it matters:
docker-compose.yml publishes 9003:9003, the Dockerfile EXPOSEs 9003,
install.sh's success banner uses :9003, and the carl-install-smoke gate
probes :9003. But bootstrap.sh's success banner and install.ps1's
post-install message both told the user to open :9000 — so a user
following the printed instruction would hit "connection refused" and
conclude the install was broken.

Affects Toby's Windows path most acutely (install.ps1 → WSL bootstrap.sh
both print :9000) and any Linux user who arrives via bootstrap.sh.

The HTTP_PORT=9000 in install.sh's config.env writer is a separate
question — that value is written to ~/.continuum/config.env but the
deploy uses JTAG_HTTP_PORT=9003 from docker-compose.yml directly. The
config-file value is unused decoration; not touching it here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… silent 5-15min)

Carl/Memento's reported experience: install.sh prints "First build detected
— this takes 5-15 minutes. Showing progress..." then total silence for the
entire compile, which is exactly the window in which a fresh validator
Ctrl+C's because nothing seems to be happening.

Root cause was in parallel-start.sh's cargo invocation pattern. Even with
CARGO_QUIET="" on first build, every cargo call was wrapped in
$(cargo build ... 2>&1) which buffers all output until cargo exits. The
banner promised progress but $() ate it.

Fix: introduce build_pkg() helper. On incremental builds (CARGO_QUIET set)
keeps the original capture-then-display behavior so the build log stays
clean. On first builds, tee's cargo's stdout to the terminal AND a temp
file — user sees "Compiling crate-name vX.Y.Z" lines stream live, while
$OUT still gets populated for preflight_check_cargo_xcode and the failure-
display path. PIPESTATUS preserves cargo's actual exit code through the
tee pipe.

Validated: bash -n syntax-clean, npm run build:ts still passes, no
behavior change for incremental rebuilds (which is what every CI run
hits since target/release/continuum-core-server already exists in the
build cache).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants