feat: install reliability — skills + verify-personas + GPU/IPC/seed hardening#913
Conversation
Existing cmd_update always ran `docker compose build --parallel`, which takes 30+ minutes per variant (the CUDA image was 1h42m on GHA). That's the wrong default for Carl: they installed via prebuilt `:latest` images from ghcr and just want to pull the newer ones, not rebuild from source.

New behavior:

- `continuum update` (default, Carl path): git pull + docker compose pull + up -d. ~30s on a warm cache. Also refreshes the forged Qwen3.5 in DMR via `docker model pull` (idempotent, no-op if the DMR CLI isn't present). This is the HF-card-to-working-chat user's natural update.
- `continuum update --dev` (or `--build`): current behavior — git pull + docker compose build + up -d. Needed when touching Rust/TS source.

`continuum update --help` prints the distinction. The header usage comment is updated to reflect both paths. The fall-through failure on the Carl pull prints the exact dev-path command as remediation, so a dev who ran it by mistake has an immediate next step.

Branch: test/install-e2e-mac (post-PR891 e2e dogfood + surfaced fixes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
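The pull-vs-build dispatch described above can be sketched roughly like this. The helper names and exact flag handling are illustrative assumptions, not the actual `bin/continuum` code:

```shell
# Hypothetical sketch of the continuum update dispatch — helper names are
# illustrative, not the actual bin/continuum implementation.
parse_update_mode() {
  local mode="pull"            # default: Carl path, prebuilt images
  local arg
  for arg in "$@"; do
    case "$arg" in
      --dev|--build) mode="build" ;;   # dev path: rebuild from source
    esac
  done
  printf '%s' "$mode"
}

cmd_update() {
  local mode
  mode=$(parse_update_mode "$@")
  git pull --ff-only
  if [ "$mode" = "build" ]; then
    docker compose build --parallel    # 30+ min: only when touching source
  else
    docker compose pull                # ~30s on a warm cache
  fi
  docker compose up -d
  # Refresh the DMR model when the CLI exists; silent no-op otherwise.
  if docker model --help >/dev/null 2>&1; then
    docker model pull "hf.co/continuum-ai/qwen3.5-4b-code-forged-GGUF" || true
  fi
}
```

The two-function split keeps the flag parsing testable on its own, independent of the docker/git side effects.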
Ship the first continuum Claude Code skill alongside the CLI update change, so AI-in-IDE users can invoke `/continuum:update` from any project without context-switching to a terminal.

What's new:

1. skills/continuum-update/SKILL.md — the slash-command skill. Name `continuum:update`, user-invocable, allowed-tools: Bash. The body tells the AI to just run `continuum update` (or pass --dev) and report outcomes in user-facing prose. Includes guidance on when NOT to suggest --dev (a 30+ min rebuild Carl doesn't need), when to defer the update (live persona sessions in progress), and pointers to related skills (doctor, status, airc:connect).
2. install.sh § 3c — opt-in skill installer. If `~/.claude/skills/` exists (the user has Claude Code), copy everything in `$INSTALL_DIR/skills/*/` into it. Silent no-op if Claude Code isn't installed — continuum core functionality doesn't require it. Mirrors the airc pattern.

The "skills as additive, optional" framing from Joel: continuum works standalone; skills are the bridge for users who also run Claude Code and want to invoke continuum operations from their IDE.

More skills to come in this branch: continuum:status, continuum:doctor, continuum:chat (send to a persona). Each is a thin wrapper around the existing CLI with AI-facing prose explaining when to use / when not to.

Branch: test/install-e2e-mac (post-PR891 e2e work + surfaced features).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three more Claude-Code-invokable skills that wrap existing CLI commands with AI-facing prose. All sit on top of `bin/continuum` (single source of truth); the skills are the interaction layer that tells an IDE Claude when to call them and how to interpret the output for the user.

- /continuum:status — runs `continuum status`, translates "all healthy" / "some unhealthy" / "nothing running" into specific next-step recommendations instead of dumping raw output.
- /continuum:doctor — runs `continuum doctor`, narrows the diagnostic output to THE root cause the user cares about. Includes failure-mode prose for the exact symptoms users hit tonight: DMR latest-cpu vs latest-metal, host-side TCP closed, Qwen missing from the catalog, submodules uninitialized, disk low, AIProviderDaemon stuck (a false positive when chats work).
- /continuum:chat — send a message to a persona without leaving the IDE. Wraps `continuum cli collaboration/chat/send` + /export to fetch the reply. The dev's IDE Claude talks to the user's continuum persona over the shared chat log. Bridge pattern: obsolete once continuum's own persona layer replaces Claude Code workflows, but the on-ramp for users migrating today.

Each SKILL.md includes:

- When to use
- When NOT to use
- Failure-mode prose for common states
- Links to sibling skills and docs/SETUP.md
- A note that the skill wraps the CLI (not re-implementing it) — SSOT is `bin/continuum`

Install.sh § 3c already wires these — anyone running `curl install.sh` with `~/.claude/skills/` present gets them installed automatically. Silent no-op for Carl without Claude Code.

Steady-state vision: continuum's OWN persona layer replaces the need for these skills. Today they're the retention lever for devs whose IDE is Claude Code — the difference between "bounced" and "stayed."

Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The skill layer I added yesterday (/continuum:update, :status, :doctor, :chat) was invisible to anyone reading the repo's entry docs. Fixing.

README §Getting Started — added a "Claude Code users — bonus skills" collapsed details block below the existing Development details block: a table of the 4 skills + what each does, with an explicit note that continuum does NOT require Claude Code — skills are opt-in additive for the dev audience. Carl (end-user) uses the widget; skills are invisible to them.

docs/SETUP.md §Skills+helpers — replaced the thin "airc intro + doctor alias" section with a proper Continuum-skills table (same 4 entries), keeping the airc mesh section intact.

Framing: skills are the transition-period bridge for devs on Claude Code. Steady state is continuum's own persona layer replacing that workflow, so the skills become redundant naturally; they aren't the primary product surface.

Install.sh already hooks the skill drop (§3c opt-in copy into ~/.claude/skills/). These doc updates just make sure a user landing on the README or SETUP.md knows the skill layer exists.

Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+ post-pull catalog check

Install reliability hardening — the previous setup.sh completed "successfully" in cases where the user would still land on a broken first-chat experience:

1. The Docker Desktop AI toggle "Enable host-side TCP support" wasn't enabled → continuum-core containers couldn't reach DMR → silent fallback to Candle CPU inference (10 tok/s).
2. The Docker Desktop AI toggle "Enable GPU-backed inference" wasn't enabled → DMR ran llama.cpp latest-cpu on a machine with a perfectly good GPU → 5-10× slower than users expect from a "local GPU" install.
3. `docker model pull` reported success but the model didn't show in `docker model ls` (rare, but caught on BigMama during pr-891 work).

Three fixes applied, all pre-compose-up so failures surface immediately instead of during the first chat:

- **TCP endpoint enable via CLI** (mirrors root install.sh behavior): `docker desktop enable model-runner --tcp=12434 --cors=all`. Falls through to a loud GUI-instruction prompt if the CLI command isn't available on this Docker Desktop version.
- **Post-pull catalog verification**: after `docker model pull` reports success, verify the model is actually listed in `docker model ls`. If not, tell the user exactly what to retry.
- **GPU backend detection**: parse `docker model status` and detect whether the backend is `latest-cpu` on a machine that should have a GPU. Yell loudly (❗) with the specific Settings → AI toggle to flip + the exact expected outcome after flipping (swap to latest-metal / latest-cuda). If the backend is one of the GPU variants, report it positively so the user sees their install is on the fast path.

Branch: test/install-e2e-mac (obsessed-with-install-reliability lane per Joel's split — memento on livekit + voice, m5-test on install bulletproofing).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
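The post-pull catalog verification described above can be sketched like this. The model ID comes from the commit; the grep pattern and function name are illustrative assumptions, not the actual setup.sh code:

```shell
# Hedged sketch of post-pull catalog verification: `docker model pull` can
# report success while the model still doesn't appear in the catalog, so we
# re-check `docker model ls` instead of trusting the pull's exit status.
QWEN_MODEL="hf.co/continuum-ai/qwen3.5-4b-code-forged-GGUF"

verify_model_in_catalog() {
  local model="$1"
  if docker model ls 2>/dev/null | grep -qi "qwen3.5-4b-code-forged"; then
    echo "✅ Model present in catalog: $model"
    return 0
  fi
  echo "❌ '$model' pulled but missing from 'docker model ls'."
  echo "   Retry: docker model pull $model"
  return 1
}
```

The key design point is trusting the catalog listing, not the pull command's own success report.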
"All containers healthy" isn't the same as "user can actually chat."
The previous setup.sh claimed success when compose-healthy, but users
could still land on a broken first-chat (Candle CPU fallback, DMR
wrong model, GPU toggle off silently). This adds a real inference
probe AFTER services come up — before the success message + browser
open — so the user learns of any remaining issue with specific
remediation instead of discovering it mid-chat.
What the probe does:
1. POSTs a 1-word-reply inference request to DMR's OpenAI-compat
endpoint (localhost:12434)
2. Parses the response's usage.completion_tokens and
timings.predicted_per_second
3. Classifies by tok/s:
- 0 tokens → model failed to load / DMR broken. Prints debug cmds.
- <15 tok/s → CPU-tier speed. Loud ❗ telling user the exact
Settings → AI toggle to flip. Chat works but will be SLOW.
- 15-79 tok/s → Mac Metal GPU tier (acceptable for M1-M5).
- 80+ tok/s → Nvidia CUDA GPU tier.
Subtle bug caught during local testing + fixed: `echo "$JSON"` in
bash interprets literal \\n sequences inside the JSON content (the
model's <think>\\n... reasoning output) as real newlines, breaking
json.load. Switched to `printf '%s'`, which preserves the string
verbatim. The reasoning is kept in a comment so a future editor doesn't
undo the fix thinking `echo` is fine.
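The bug above reproduces in a few lines. Inside single quotes, `\n` is two literal characters — exactly the shape of a JSON escape inside the model's `<think>` content:

```shell
# Minimal reproduction of the echo-vs-printf pitfall described above.
JSON='{"content":"<think>\nreasoning"}'   # \n stays two literal characters

SAFE=$(printf '%s' "$JSON")    # byte-for-byte: still valid one-line JSON
BROKEN=$(echo -e "$JSON")      # \n expanded to a real newline (plain echo
                               # does this too in some shells, e.g. dash)
```

After expansion, the payload contains a raw newline inside a JSON string literal, which is exactly what breaks `json.load` downstream.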
Joel's split: memento on voice/livekit + response-quality gate,
m5-test obsessed with install reliability. This commit is in that
lane — closes the "install succeeded but first chat is slow/broken"
class of failure that users can't self-diagnose from the widget.
Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Produces a pass/fail JSON transcript proving Helper AI and Teacher AI actually respond with coherent output via the local DMR path on a given system. Runs against a live install. The artifact attaches to PRs as per-system proof (M5 from my machine, BigMama from memento's, etc).

What it does:

1. Sends a @persona probe with a unique marker to the General room
2. Polls `chat/export` every 2s for up to --timeout= seconds
3. Finds replies that (a) match the target persona in the header, (b) don't contain the probe marker (excludes echoes of our own send), (c) have a body > 30 chars (filters ultra-short non-replies)
4. Writes a JSON transcript: environment (OS/arch/git sha/DMR backend/GPU tier) + per-persona result (replied/timeout + excerpt)
5. Exits 0 if all requested personas replied, 1 if any timed out

3 bugs found + fixed during development on my own M5 install:

- The jtag CLI requires CWD=src/ (internal relative path to cli.ts). The script cd's before invocation.
- `echo "$JSON"` interprets backslash-n inside model <think>\n... content as real newlines, breaking json.load. Using `printf '%s'`.
- Markdown blocks start with a leading empty line, so `split('\n')[0]` returned '' instead of the `## #<id> - <name>` header. Walk lines looking for the first `## ` prefix.

Also adds scripts/lib/repo-root.sh — a shared helper for any script that needs the repo root regardless of CWD. Walks up from the script's own location looking for docker-compose.yml + src/. Other scripts can `source` it instead of reimplementing the find-root logic each time.

docs/SETUP.md — Windows troubleshooting: added the WSL2 credsStore=desktop.exe gotcha that blocks `docker push` even after a successful `docker login` (known from memento's BigMama session earlier today).

Verified on M5 Metal — both Helper + Teacher replied with coherent unit-test descriptions. The transcript shows environment metadata + per-persona excerpts. Exit code 0.

Branch: test/install-e2e-mac (install-reliability lane).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
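The walk-up logic that repo-root.sh is described as implementing (climb from the script's location until both markers are found) can be sketched as follows; the function name is hypothetical:

```shell
# Hedged sketch of a repo-root finder in the shape the commit describes —
# the repo root is the first ancestor with both docker-compose.yml and src/.
find_repo_root() {
  local dir="$1"
  while [ "$dir" != "/" ]; do
    if [ -f "$dir/docker-compose.yml" ] && [ -d "$dir/src" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
    dir=$(dirname "$dir")
  done
  echo "repo root not found (no docker-compose.yml + src/ above $1)" >&2
  return 1
}

# A sourcing script would typically start from its own location, not CWD:
#   REPO_ROOT=$(find_repo_root "$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)")
```

Starting from `BASH_SOURCE` rather than `$PWD` is what makes the helper CWD-independent.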
Audit pass on my own recent commits for error-suppression smells. Three real fixes; some `2>/dev/null` legitimately suppress expected noise (sysctl on non-Mac, nvidia-smi on non-Nvidia, openssl missing on bare images) and stay.

Fixed:

1. setup.sh inference probe — was `python3 ... 2>/dev/null || echo "0"`, which silently returned 0 on any parse error and then incorrectly triggered the "❗ CPU speed" warning. Now: no suppression. If the JSON from DMR is malformed for an unexpected reason, Python's traceback prints to stderr where the user (and the install-AI helping them) sees the actual problem. The .get() chains became required dict accesses for the same reason — predicted_per_second IS required for the GPU-tier classification we're about to print, so it should crash if missing rather than fall to "0".
2. setup.sh probe curl — removed `2>/dev/null || echo ""` on the curl itself. We already gated on `/v1/models` being reachable; if the /chat/completions call THEN errors, that's a real failure we want visible, not papered over with an empty PROBE_RESPONSE leading to a misleading "couldn't reach DMR" message.
3. scripts/verify-personas.sh — replaced `try: ... except: pass` with honest dict access. If jtag's chat/send returns malformed JSON, Python crashes with a real traceback; MSG_ID stays empty; the caller's "send_failed" branch then prints SEND_RESULT for diagnosis. No silent `2>/dev/null`. Same in the export-poll detection block.

Also added: scripts/push-image.sh --no-cache flag (gap #6 from memento's PR891 followup list). Position-independent arg parsing so order doesn't matter. Threaded into both Phase 1 (local build) and Phase 3 (multi-platform push) buildx invocations. A NO CACHE indicator in the phase header so the user sees what's happening.

Caught while Joel was actively pointing out that we (the AIs) hide errors at 5× the normal rate. Errors save time. Runtime failures suck. Required >>> optional.

Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
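The difference between the suppressing shape and the honest shape in fix #1 can be shown side by side on a deliberately malformed payload. Variable and file names here are illustrative:

```shell
# Contrast of the two parsing shapes discussed above.
BAD_JSON='not json at all'

# Suppressing shape: any failure silently becomes "0", which downstream
# logic then misreads as a real measurement (the false CPU-speed warning).
TPS_SUPPRESSED=$(printf '%s' "$BAD_JSON" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["timings"]["predicted_per_second"])' \
    2>/dev/null || echo "0")

# Honest shape: let the traceback reach a place the user can see, and keep
# the failure distinguishable from a real measurement of zero.
if ! TPS_HONEST=$(printf '%s' "$BAD_JSON" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["timings"]["predicted_per_second"])' \
    2>probe-parse-error.log); then
  TPS_HONEST=""   # empty = "could not measure", distinct from a real 0
fi
```

The suppressed path produces a plausible-looking number; the honest path produces an empty value plus a traceback the caller can surface.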
…rcement

Memento caught (and fixed by hand in VoiceStartTypes.ts) the pattern: the result interface declares `roomId: string` (required), but the factory data param emits `roomId?: string` with a `roomId: data.roomId ?? ''` default. The compile guarantee is thrown away — a caller can omit roomId and get a runtime "" instead of a compile error. 452 instances across the command-types codebase, all generator-emitted from the same template.

Root cause: TokenBuilder.buildResultFactoryDataType + buildResultFactoryDefaults unconditionally emitted `?:` and `?? <default>` for every result field, based on the assumption that "all other result fields are typically optional (for error cases)." That assumption violates the type-safety contract — required fields in the result interface should be required in the factory.

Fix:

1. ResultSpec.required?: boolean — defaults to TRUE. JSDoc explains when to set it false (cursor on the last page, warning on partial success) and why required-by-default is the safer convention per Joel's "required > optional" rule.
2. buildResultFactoryDataType: emit `?:` only when `result.required === false`. Required fields stay required.
3. buildResultFactoryDefaults: emit `?? <default>` only for optional fields. Required fields pass through `data.<field>` directly — the compile error fires in the data param type if the caller forgot.

Note on rollout: the existing 452 generated files still have the bad pattern. Re-running the generator on each spec will tighten the emitted types, and CALLERS (handlers) that previously got away with not setting required fields will start failing to compile. That's the goal — those failures point at real "we said we'd return X but we don't actually" bugs that hide today as runtime "" / [] / null. Memento's manual fix to VoiceStartTypes shows the right shape; the generator now emits that shape on the next regen.

Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linux-only install friction that memento flagged in the PR891 followup gap list: Docker containers run as root by default, so files they write into the bind-mounted ~/.continuum/ end up root-owned and unreadable by the host user. Symptom users hit: ./jtag ping returns EACCES even though all services are healthy.

Doc prose for the Linux+Nvidia "If it breaks" section: shows the chown reclaim + the PUID/PGID env vars to set in config.env so future container writes use the host UID/GID instead of root. Notes that Mac and Windows don't hit this (Docker Desktop's VM handles UID translation).

The code-side fix (run the container as the host UID by default) is tracked for follow-up. The doc is the immediate-relief patch.

Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
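The remediation the doc describes could look roughly like this. The PUID/PGID names and config.env location come from the commit text; the env-var name for overriding the path is purely illustrative:

```shell
# Hedged sketch of the Linux UID remediation described above.

# One-time reclaim of files a root container already wrote into the bind
# mount (run on the affected Linux host):
#   sudo chown -R "$(id -u):$(id -g)" ~/.continuum/

# Persist the host UID/GID so future container writes match the host user.
CONFIG_ENV="${CONTINUUM_CONFIG_ENV:-$HOME/.continuum/config.env}"
mkdir -p "$(dirname "$CONFIG_ENV")"
{
  printf 'PUID=%s\n' "$(id -u)"
  printf 'PGID=%s\n' "$(id -g)"
} >> "$CONFIG_ENV"
```

The chown fixes the damage already done; the env vars prevent it recurring, assuming the compose stack reads them.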
…(gap #3)

Old shape: setTimeout(seedDatabase, 3000), then console.warn on failure. If IPC wasn't connected by t+3000ms (which happens on slow boots, on docker startup ordering quirks, or when continuum-core is rebuilding ports), the seed silently failed and the server continued running with NO personas, NO rooms, NO recipes. Carl opens the widget, types hello, nobody replies. "All containers healthy" but the install is broken in exactly the way the user can't self-diagnose.

New shape: retry up to 30 × 1s = 30s total budget. Each iteration naturally exercises the IPC connection (Commands.execute throws if the daemon isn't reachable; the catch sleeps and retries). When it succeeds, we move on. When 30s elapses without a successful seed, we log .error (not .warn) with a structured diagnostic + remediation:

- is jtag ai/status responsive
- is ~/.continuum/database/ writable
- npm run data:reseed once the root cause is fixed

Memento hit the symptom on stuck-IPC restarts on his M1 today — continuum-core's PID was alive at 2.6GB but IPC times out, so any seed-on-boot fired in that window would silently fail and we'd be debugging "why no personas" hours later.

Branch: test/install-e2e-mac. From memento's PR891-followup gap #3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Memento spent hours on PR891 chasing "why isn't my candle fix in the running binary?" before realizing the container was an April 6 image and his April 17 source was never built into it. Stale-image is the single most expensive class of debugging — symptoms look like real bugs but the actual cause is "you're not running the code you think you are."

Detection: every image published via the docker/metadata-action gets an `org.opencontainers.image.revision` label with the git SHA it was built from. continuum doctor now reads that label off the running continuum-core container and compares it to the local repo HEAD. Three states:

- match → green check, "matches repo HEAD"
- mismatch → yellow warning with concrete remediation: continuum update (pull the latest published image) or continuum update --dev (rebuild from THIS commit's source)
- no label → dim note that the image was built without metadata-action (e.g., a bare `docker build` from a dev box) — can't verify freshness, but at least the user knows we tried

This doesn't replace verify-personas.sh (that proves the chat path actually works); it's the FIRST thing to check when a fix doesn't seem to land. "Are we even running the new code?" answered in one command.

Branch: test/install-e2e-mac. Closes the visible side of memento's PR891-followup gap #1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
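The three-state check described above can be sketched with standard docker and git plumbing. The container name is an assumption; the label key is the standard OCI revision annotation the commit names:

```shell
# Hedged sketch of the doctor stale-image check — not the actual
# bin/continuum code, just the shape of the comparison it describes.
check_image_freshness() {
  local container="${1:-continuum-core}"
  local image_sha repo_sha
  image_sha=$(docker inspect \
    --format '{{ index .Config.Labels "org.opencontainers.image.revision" }}' \
    "$container" 2>/dev/null) || { echo "container not running"; return 1; }
  repo_sha=$(git rev-parse HEAD)

  if [ -z "$image_sha" ]; then
    echo "· image has no revision label — built without metadata-action; can't verify freshness"
  elif [ "$image_sha" = "$repo_sha" ]; then
    echo "✅ image revision matches repo HEAD"
  else
    echo "⚠️  image built from $image_sha but repo HEAD is $repo_sha"
    echo "   continuum update        # pull latest published image"
    echo "   continuum update --dev  # rebuild from this commit's source"
  fi
}
```

The `{{ index .Config.Labels ... }}` template is the usual way to read a single label off a container without parsing the full inspect JSON.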
…nt (gap #2)

Both ORMRustClient and AIProviderRustClient had:

if (wasPreviouslyConnected) { this.scheduleReconnect(); }

Boot-time race: the TS daemon initializes the Rust IPC client at t=0; the Rust core hasn't bound its socket yet → connect rejects → wasConnected stays false → reconnect is NEVER scheduled. The pool sits permanently disconnected unless the calling code knows to retry connect() itself.

Symptom on slow-boot Macs / WSL2 / cold containers: "all services healthy" but the data daemon and AI provider can't reach core. continuum doctor shows the IPC socket present, but every Commands.execute() goes to ground.

Fix: drop the `if (wasPreviouslyConnected)` guard. Always call scheduleReconnect() on close. The reconnect loop already has a maxAttempts cap (10 in ORM, 20 in AIProvider) with exponential backoff, then a loud console.error on final give-up — so this can't infinite-spin and won't hide a permanent failure.

Closes memento's PR891-followup gap #2. Companion to the seed-on-boot retry fix (commit ef8d182) — same class of bug, same shape of fix (retry the connection naturally instead of betting on a single timing window).

Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…view)

Copilot's PR891 review flagged 5 [SUB-DEBUG], [MSG-DEBUG], [LOOP-DEBUG], [EVENT], [ADAPTER-DEBUG] console.log lines that were left in from mid-investigation work. Each fires on hot paths (every persona init, every chat message, every loop tick, every adapter selection, every ORM.store) and floods stdout in production with no env-flag gating.

Removed:

- src/daemons/ai-provider-daemon/shared/AIProviderDaemon.ts:597 — [ADAPTER-DEBUG] selectAdapter trace
- src/system/user/server/PersonaUser.ts:845, :847 — [SUB-DEBUG] subscribe-path trace (kept the structured this.log.debug call right below — that's the right shape for keeping this signal)
- src/system/user/server/PersonaUser.ts:1302 — [MSG-DEBUG] handleChatMessage every-message trace
- src/system/user/server/modules/PersonaAutonomousLoop.ts:160, :162 — [LOOP-DEBUG] every-tick serviceCycleFull trace
- src/daemons/data-daemon/server/ORM.ts:152 — [EVENT] success-path emit trace

Kept (with the [EVENT] prefix dropped, prose tightened):

- src/daemons/data-daemon/server/ORM.ts:156 — console.warn for null jtagContext; that's a real "events are being silently dropped" signal worth keeping loud.

If the breadcrumbs end up needed again for a specific debug session, add them through the structured logger (this.log.debug, gated by category) instead of unconditional console.log. That's the right tool for "always-available, off by default" tracing.

Branch: test/install-e2e-mac. Closes 5 of Copilot's 12 PR891 line comments. The real correctness bugs (compute_router overflow, sysctlbyname unchecked return, concurrency.rs docstring drift, ai_provider.rs priority comment vs code mismatch, CodebaseIndexer cache race) are separate commits — they need careful thought, not blanket cleanup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Improves Continuum install reliability and developer ergonomics by adding Claude Code “skills”, hardening setup/verification flows, and tightening generator/runtime behavior around seeding, IPC reconnects, and CLI diagnostics.
Changes:
- Add Continuum Claude Code skills and installer/docs hooks to surface `/continuum:*` commands for devs.
- Harden install/setup validation (DMR model presence, backend checks, real inference probe) and add a `scripts/verify-personas.sh` acceptance artifact.
- Close PR891 follow-up gaps: generator result requiredness, IPC reconnect behavior, auto-seed retry logic, and `continuum doctor` stale-image detection.
Reviewed changes
Copilot reviewed 20 out of 21 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/system/orchestration/SystemOrchestrator.ts | Replace single delayed seed attempt with retry loop and louder failure reporting. |
| src/generator/shared/specs/CommandSpec.ts | Add ResultSpec.required?: boolean documentation/shape for required-by-default result fields. |
| src/generator/TokenBuilder.ts | Change generator output for result factory param types/defaults to respect required. |
| src/daemons/data-daemon/server/ORMRustClient.ts | Remove “wasConnected” gate so reconnect is scheduled even on first connect failures. |
| src/daemons/ai-provider-daemon/server/AIProviderRustClient.ts | Mirror reconnect scheduling behavior change for AI provider IPC client. |
| skills/continuum-update/SKILL.md | Add Claude Code skill to run continuum update (Carl vs dev path). |
| skills/continuum-status/SKILL.md | Add Claude Code skill to summarize continuum status. |
| skills/continuum-doctor/SKILL.md | Add Claude Code skill to interpret continuum doctor. |
| skills/continuum-chat/SKILL.md | Add Claude Code skill to send a persona chat via CLI and fetch replies. |
| setup.sh | Add DMR host TCP enable attempt, model catalog verification, backend detection, and end-to-end inference probe. |
| scripts/verify-personas.sh | Add persona-level acceptance test that sends @persona probes and emits a JSON transcript. |
| scripts/push-image.sh | Add --no-cache flag parsing and propagation to buildx builds. |
| scripts/lib/repo-root.sh | Add reusable helper to locate repo root from arbitrary CWD. |
| install.sh | Optionally install skills into ~/.claude/skills/ when Claude Code is detected. |
| docs/SETUP.md | Document WSL2 GHCR auth gotcha and Linux UID/bind-mount permission remediation; document skills. |
| bin/continuum | Split continuum update into pull-vs-build (--dev) modes; add stale image revision check to doctor. |
| README.md | Surface Claude Code skills feature in README. |
```shell
# but DMR has no models on a fresh install. Carl from HF expects to chat
# with the model whose card brought them here — so we pull it here, idempotent.
QWEN_MODEL="hf.co/continuum-ai/qwen3.5-4b-code-forged-GGUF"
QWEN_MODEL_LC="huggingface.co/continuum-ai/qwen3.5-4b-code-forged-gguf:latest"
```
QWEN_MODEL_LC is defined but never used. With set -e enabled, keeping unused variables around also increases the chance of model-name drift (the probe payload later hard-codes the LC name). Either remove it or use it consistently in the probe/request payload to keep a single source of truth for the model ID.
```shell
echo "📡 Enabling Docker Model Runner host-side TCP endpoint..."
if docker desktop enable model-runner --tcp=12434 --cors=all 2>&1 | tail -3; then
  echo "   ✅ DMR TCP endpoint enabled on localhost:12434"
else
```
This if docker desktop enable ... | tail -3; then check is unreliable because the script only has set -e (no pipefail). In Bash, the pipeline exit status is tail's status, so a failing docker desktop enable ... can still be treated as success and print "DMR TCP endpoint enabled". Either enable set -o pipefail (or locally capture/check ${PIPESTATUS[0]}) so failures are detected correctly.
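The pipeline-status pitfall flagged above reproduces in a few lines — without pipefail, a pipeline reports the LAST command's exit status, so a failing first command hides behind a successful `tail`:

```shell
# Demonstration of the Copilot comment above.
false | tail -1
NAIVE=$?                 # 0: tail succeeded, the first failure vanished

false | tail -1
FIRST=${PIPESTATUS[0]}   # 1: the real status of the first command

set -o pipefail          # now the pipeline fails if any stage fails
false | tail -1 && PIPEFAIL_RC=0 || PIPEFAIL_RC=$?
set +o pipefail
```

Either `set -o pipefail` script-wide or a local `${PIPESTATUS[0]}` check would make the `docker desktop enable` failure detectable.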
```shell
else
  # printf '%s' — DO NOT use echo. The JSON response contains literal
  # backslash-n sequences inside the model's <think>\n... content, and
  # bash's echo will interpret them as real newlines, breaking json.load.
  # Don't suppress python errors — if json.load fails, the traceback prints
  # to stderr where the user sees it. Empty result triggers a loud message
  # below; silent "0" would falsely trip the CPU-speed warning.
  PROBE_TPS=$(printf '%s' "$PROBE_RESPONSE" | python3 -c "
import sys, json
d = json.load(sys.stdin)
t = d['timings']  # required: GPU-tier classification depends on it
print(f'{t[\"predicted_per_second\"]:.0f}')
")
  PROBE_TOKENS=$(printf '%s' "$PROBE_RESPONSE" | python3 -c "
import sys, json
d = json.load(sys.stdin)
print(d['usage']['completion_tokens'])  # required, not optional
")

  if [ "$PROBE_TOKENS" -eq 0 ]; then
    echo " ⚠️ Probe returned zero tokens. Model may have failed to load or DMR routing is broken."
    echo " Debug:"
    echo "   docker model status"
    echo "   docker model ls | grep qwen"
  elif [ "$PROBE_TPS" -lt 15 ]; then
    echo " ❗ Probe got $PROBE_TOKENS tokens at $PROBE_TPS tok/s — that's CPU speed."
    echo "   The inference probe SUCCEEDED but GPU acceleration isn't engaged."
    echo "   This is the Docker Desktop 'Enable GPU-backed inference' toggle (Settings → AI)."
    echo "   Chat will work but will be SLOW (5-10× slower than expected) until you flip it."
  elif [ "$PROBE_TPS" -lt 80 ]; then
    echo " ✅ Probe: $PROBE_TOKENS tokens at $PROBE_TPS tok/s (Metal GPU, Mac-tier speed)"
  else
    echo " ✅ Probe: $PROBE_TOKENS tokens at $PROBE_TPS tok/s (CUDA GPU, Nvidia-tier speed)"
```
The post-start inference probe runs under set -e but assumes python3 is present and that the response JSON always has timings.predicted_per_second and usage.completion_tokens. If python3 is missing or JSON parsing/field access fails, the setup script will exit early; if the values are empty/non-numeric, the -eq/-lt tests will error and also abort. Guard the probe with command -v python3, and treat parse/field failures as a probe failure (print a remediation message + continue install) instead of crashing setup.
| else | |
| # printf '%s' — DO NOT use echo. The JSON response contains literal | |
| # backslash-n sequences inside the model's <think>\n... content, and | |
| # bash's echo will interpret them as real newlines, breaking json.load. | |
| # Don't suppress python errors — if json.load fails, the traceback prints | |
| # to stderr where the user sees it. Empty result triggers a loud message | |
| # below; silent "0" would falsely trip the CPU-speed warning. | |
| PROBE_TPS=$(printf '%s' "$PROBE_RESPONSE" | python3 -c " | |
| import sys, json | |
| d = json.load(sys.stdin) | |
| t = d['timings'] # required: GPU-tier classification depends on it | |
| print(f'{t[\"predicted_per_second\"]:.0f}') | |
| ") | |
| PROBE_TOKENS=$(printf '%s' "$PROBE_RESPONSE" | python3 -c " | |
| import sys, json | |
| d = json.load(sys.stdin) | |
| print(d['usage']['completion_tokens']) # required, not optional | |
| ") | |
| if [ "$PROBE_TOKENS" -eq 0 ]; then | |
| echo " ⚠️ Probe returned zero tokens. Model may have failed to load or DMR routing is broken." | |
| echo " Debug:" | |
| echo " docker model status" | |
| echo " docker model ls | grep qwen" | |
| elif [ "$PROBE_TPS" -lt 15 ]; then | |
| echo " ❗ Probe got $PROBE_TOKENS tokens at $PROBE_TPS tok/s — that's CPU speed." | |
| echo " The inference probe SUCCEEDED but GPU acceleration isn't engaged." | |
| echo " This is the Docker Desktop 'Enable GPU-backed inference' toggle (Settings → AI)." | |
| echo " Chat will work but will be SLOW (5-10× slower than expected) until you flip it." | |
| elif [ "$PROBE_TPS" -lt 80 ]; then | |
| echo " ✅ Probe: $PROBE_TOKENS tokens at $PROBE_TPS tok/s (Metal GPU, Mac-tier speed)" | |
| else | |
| echo " ✅ Probe: $PROBE_TOKENS tokens at $PROBE_TPS tok/s (CUDA GPU, Nvidia-tier speed)" | |
| elif ! command -v python3 >/dev/null 2>&1; then | |
| echo " ⚠️ Probe response received, but python3 is not installed so the result could not be evaluated." | |
| echo " Install python3 or inspect the raw response manually if you need to debug inference performance." | |
| else | |
| # printf '%s' — DO NOT use echo. The JSON response contains literal | |
| # backslash-n sequences inside the model's <think>\n... content, and | |
| # bash's echo will interpret them as real newlines, breaking json.load. | |
| # Treat parse/field failures as a probe failure so setup continues under | |
| # set -e, but leave a clear remediation message for the user. | |
| if PROBE_METRICS=$(printf '%s' "$PROBE_RESPONSE" | python3 -c " | |
| import sys, json | |
| d = json.load(sys.stdin) | |
| tps = d['timings']['predicted_per_second'] | |
| tokens = d['usage']['completion_tokens'] | |
| print(f'{tokens} {tps:.0f}') | |
| "); then | |
| PROBE_TOKENS=${PROBE_METRICS%% *} | |
| PROBE_TPS=${PROBE_METRICS#* } | |
| if ! [[ "$PROBE_TOKENS" =~ ^[0-9]+$ && "$PROBE_TPS" =~ ^[0-9]+$ ]]; then | |
| echo " ⚠️ Probe returned metrics in an unexpected format and could not be evaluated." | |
| echo " Raw parsed values: tokens='$PROBE_TOKENS' tok/s='$PROBE_TPS'" | |
| echo " Try this manually to inspect the full response:" | |
| echo " curl -v http://localhost:12434/engines/v1/chat/completions ..." | |
| elif [ "$PROBE_TOKENS" -eq 0 ]; then | |
| echo " ⚠️ Probe returned zero tokens. Model may have failed to load or DMR routing is broken." | |
| echo " Debug:" | |
| echo " docker model status" | |
| echo " docker model ls | grep qwen" | |
| elif [ "$PROBE_TPS" -lt 15 ]; then | |
| echo " ❗ Probe got $PROBE_TOKENS tokens at $PROBE_TPS tok/s — that's CPU speed." | |
| echo " The inference probe SUCCEEDED but GPU acceleration isn't engaged." | |
| echo " This is the Docker Desktop 'Enable GPU-backed inference' toggle (Settings → AI)." | |
| echo " Chat will work but will be SLOW (5-10× slower than expected) until you flip it." | |
| elif [ "$PROBE_TPS" -lt 80 ]; then | |
| echo " ✅ Probe: $PROBE_TOKENS tokens at $PROBE_TPS tok/s (Metal GPU, Mac-tier speed)" | |
| else | |
| echo " ✅ Probe: $PROBE_TOKENS tokens at $PROBE_TPS tok/s (CUDA GPU, Nvidia-tier speed)" | |
| fi | |
| else | |
| echo " ⚠️ Probe response could not be parsed or did not contain the expected metrics." | |
| echo " Continuing setup. To debug, inspect the response manually:" | |
| echo " curl -v http://localhost:12434/engines/v1/chat/completions ..." |
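The printf-over-echo rule in the comment above can be shown with a tiny standalone sketch (the sample JSON and variable values are made up; only the parsing idiom mirrors the script):

```shell
# Hypothetical probe response; real ones carry the model's <think>
# content with literal backslash-n sequences that some echo
# implementations would expand into real newlines, breaking json.load.
RESPONSE='{"usage":{"completion_tokens":42},"timings":{"predicted_per_second":57.3}}'

# printf '%s' passes the bytes through untouched.
METRICS=$(printf '%s' "$RESPONSE" | python3 -c '
import sys, json
d = json.load(sys.stdin)
tokens = d["usage"]["completion_tokens"]
tps = d["timings"]["predicted_per_second"]
print(f"{tokens} {tps:.0f}")
')
PROBE_TOKENS=${METRICS%% *}   # word before the first space
PROBE_TPS=${METRICS#* }       # word after the first space
echo "tokens=$PROBE_TOKENS tps=$PROBE_TPS"
```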
| const { seedDatabase } = await import('../../server/seed-in-process'); | ||
| const MAX_ATTEMPTS = 30; | ||
| const BACKOFF_MS = 1000; | ||
| let lastError: unknown = null; | ||
| for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) { | ||
| try { |
The async auto-seed task imports ../../server/seed-in-process outside of any try/catch. If that dynamic import fails (missing file, build artifact mismatch, runtime module error), this becomes an unhandled rejection and can crash the process or bypass the intended retry/error reporting. Wrap the import in the same retry/lastError flow (or a top-level try/catch inside the IIFE) so import failures surface via the final console.error instead of escaping.
| const { seedDatabase } = await import('../../server/seed-in-process'); | |
| const MAX_ATTEMPTS = 30; | |
| const BACKOFF_MS = 1000; | |
| let lastError: unknown = null; | |
| for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) { | |
| try { | |
| const MAX_ATTEMPTS = 30; | |
| const BACKOFF_MS = 1000; | |
| let lastError: unknown = null; | |
| for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) { | |
| try { | |
| const { seedDatabase } = await import('../../server/seed-in-process'); |
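A minimal sketch of the suggested shape, with a `loadSeeder` callback standing in for the real `import('../../server/seed-in-process')` so an import failure is captured by the same `lastError` flow instead of escaping as an unhandled rejection (names and counts here are illustrative):

```typescript
// The dynamic import lives INSIDE the retry loop's try/catch, so a
// missing file or module error is retried and, if it keeps failing,
// reported via the final console.error instead of crashing the process.
async function seedWithRetry(
  loadSeeder: () => Promise<{ seedDatabase: () => Promise<void> }>,
  maxAttempts = 3,
  backoffMs = 1,
): Promise<boolean> {
  let lastError: unknown = null;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const { seedDatabase } = await loadSeeder(); // import failure lands here
      await seedDatabase();
      return true;
    } catch (err) {
      lastError = err;
      await new Promise((r) => setTimeout(r, backoffMs));
    }
  }
  console.error('seed failed after retries:', lastError);
  return false;
}

// Usage: an importer that fails twice, then succeeds on the third try.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw new Error('module not ready');
  return { seedDatabase: async () => {} };
};
const ok = await seedWithRetry(flaky);
console.log('seeded:', ok, 'after', calls, 'attempts');
```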
| results.forEach(result => { | ||
| const comment = result.description ? ` // ${result.description}\n` : ''; | ||
| fields.push(`${comment} ${result.name}?: ${result.type};`); | ||
| const optional = result.required === false ? '?' : ''; | ||
| fields.push(`${comment} ${result.name}${optional}: ${result.type};`); | ||
| }); |
buildResultFactoryDataType()/buildResultFactoryDefaults() now read result.required, but ResultSpec imported from ./CommandNaming does not define a required property (see src/generator/CommandNaming.ts:28-32). This should currently be a TypeScript compile error. Either extend CommandNaming.ResultSpec to include required?: boolean (or re-export/use the shared specs/CommandSpec type) so the generator compiles.
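A sketch of what the review asks for, assuming a local `ResultSpec` (the real one lives in CommandNaming.ts): the `required` flag is optional, defaults to true, and only an explicit `required: false` emits `?`:

```typescript
// Illustrative generator-side contract; field names mirror the snippet
// above, the interface shape is an assumption.
interface ResultSpec {
  name: string;
  type: string;
  description?: string;
  /** Defaults to true; only `required: false` makes the field optional. */
  required?: boolean;
}

function emitField(result: ResultSpec): string {
  const comment = result.description ? `  // ${result.description}\n` : '';
  const optional = result.required === false ? '?' : '';
  return `${comment}  ${result.name}${optional}: ${result.type};`;
}

console.log(emitField({ name: 'count', type: 'number' }));
console.log(emitField({ name: 'detail', type: 'string', required: false }));
```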
| @@ -156,10 +155,16 @@ class IPCConnection { | |||
| } | |||
| this.pendingRequests.clear(); | |||
| this.pendingTimings.clear(); | |||
| // Auto-reconnect with exponential backoff if we were previously connected | |||
| if (wasPreviouslyConnected) { | |||
| this.scheduleReconnect(); | |||
| } | |||
| // Always schedule reconnect — even on FIRST-connect failures. | |||
| // The previous `if (wasPreviouslyConnected)` guard meant a boot-time | |||
| // race (Rust core not ready yet when TS data daemon starts) would | |||
| // cause connect() to reject ONCE and never retry — leaving the pool | |||
| // permanently disconnected unless the caller knew to retry. The | |||
| // scheduleReconnect() loop has its own maxAttempts cap (currently | |||
| // 20 × exponential backoff, max 30s between tries) so this can't | |||
| // spin forever; after the cap it logs loud and gives up. From | |||
| // memento's PR891-followup gap #2. | |||
| this.scheduleReconnect(); | |||
| }); | |||
IPCConnection.disconnect() sets _connected = false before the socket emits close. With the new unconditional scheduleReconnect() in the close handler, an intentional disconnect will now trigger the reconnect loop and keep timers alive (and potentially keep the process from exiting / cause background reconnect spam). Add an explicit "shouldReconnect" flag (default true) that disconnect() sets false, and gate scheduleReconnect() on it, while still retrying first-connect failures.
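The reviewer's `shouldReconnect` suggestion, reduced to a minimal sketch (class and method names are illustrative stand-ins, not the real IPCConnection):

```typescript
// A shouldReconnect flag (default true) that intentional disconnect()
// clears BEFORE the socket's close event fires: first-connect failures
// still retry, but a deliberate shutdown never spawns reconnect timers.
class ReconnectingConnection {
  private shouldReconnect = true;
  reconnectsScheduled = 0;

  private scheduleReconnect(): void {
    this.reconnectsScheduled++; // real code: exponential backoff timer
  }

  // Called from the socket's close handler, for any close reason.
  onClose(): void {
    if (this.shouldReconnect) this.scheduleReconnect();
  }

  // Intentional teardown: suppress the reconnect loop first.
  disconnect(): void {
    this.shouldReconnect = false;
    this.onClose(); // simulate the socket emitting close
  }
}

const conn = new ReconnectingConnection();
conn.onClose();    // boot-time failure → one reconnect scheduled
conn.disconnect(); // intentional → no additional reconnect
console.log('reconnects scheduled:', conn.reconnectsScheduled);
```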
| sys.exit(0) # jtag printed no json — caller will surface via SEND_RESULT | ||
| d = json.loads(raw[idx:]) # raise if malformed: traceback → stderr → user sees it | ||
| print(d.get('shortId', d.get('messageId', ''))) | ||
| ")" |
With set -euo pipefail, a JSON parse error in this python3 -c (e.g., jtag prints non-JSON after the payload, or the sliced string still isn’t valid JSON) will terminate the script immediately, contradicting the comment that "MSG_ID stays empty" and bypassing the intended exit code handling/transcript output. If you want parse failures to be reported-but-nonfatal, wrap the python call with || true and handle empty MSG_ID; if you want them to be infra errors, catch and exit 2 explicitly.
| ")" | |
| " || true)" |
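The reported-but-nonfatal option looks roughly like this under `set -euo pipefail` (deliberately feeding the parser garbage; variable names are illustrative):

```shell
set -euo pipefail
# `|| true` makes the command substitution's subshell exit 0 even when
# the JSON parse fails, so set -e does not kill the script and MSG_ID
# genuinely stays empty for the caller to report.
MSG_ID="$(printf '%s' 'not json at all' | python3 -c '
import sys, json
d = json.loads(sys.stdin.read())
print(d.get("shortId", d.get("messageId", "")))
' 2>/dev/null || true)"
if [ -z "$MSG_ID" ]; then
  echo "jtag output unparseable; MSG_ID empty (nonfatal)"
fi
```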
| cat > "$OUTPUT" <<EOF | ||
| { | ||
| "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", | ||
| "verdict": "$VERDICT", | ||
| "environment": { | ||
| "host_os": "$HOST_OS", | ||
| "host_arch": "$HOST_ARCH", | ||
| "git_sha": "$GIT_SHA", | ||
| "git_branch": "$GIT_BRANCH", | ||
| "dmr_backend": "$DMR_BACKEND", | ||
| "gpu_tier": "$GPU_TIER" | ||
| }, | ||
| "room": "$ROOM", | ||
| "timeout_seconds": $TIMEOUT_SEC, | ||
| "results": $RESULTS | ||
| } | ||
| EOF | ||
The transcript JSON is assembled via a heredoc with unescaped variables like $ROOM, $DMR_BACKEND, and $GPU_TIER. Because --room= is user-provided and docker model status output can include characters that need escaping, this can easily produce invalid JSON. Consider constructing the transcript via python3 -c 'json.dump(...)' / jq -n so all fields are properly JSON-encoded.
| cat > "$OUTPUT" <<EOF | |
| { | |
| "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)", | |
| "verdict": "$VERDICT", | |
| "environment": { | |
| "host_os": "$HOST_OS", | |
| "host_arch": "$HOST_ARCH", | |
| "git_sha": "$GIT_SHA", | |
| "git_branch": "$GIT_BRANCH", | |
| "dmr_backend": "$DMR_BACKEND", | |
| "gpu_tier": "$GPU_TIER" | |
| }, | |
| "room": "$ROOM", | |
| "timeout_seconds": $TIMEOUT_SEC, | |
| "results": $RESULTS | |
| } | |
| EOF | |
| TRANSCRIPT_TIMESTAMP="$(date -u +%Y-%m-%dT%H:%M:%SZ)" | |
| TRANSCRIPT_TIMESTAMP="$TRANSCRIPT_TIMESTAMP" \ | |
| TRANSCRIPT_VERDICT="$VERDICT" \ | |
| TRANSCRIPT_HOST_OS="$HOST_OS" \ | |
| TRANSCRIPT_HOST_ARCH="$HOST_ARCH" \ | |
| TRANSCRIPT_GIT_SHA="$GIT_SHA" \ | |
| TRANSCRIPT_GIT_BRANCH="$GIT_BRANCH" \ | |
| TRANSCRIPT_DMR_BACKEND="$DMR_BACKEND" \ | |
| TRANSCRIPT_GPU_TIER="$GPU_TIER" \ | |
| TRANSCRIPT_ROOM="$ROOM" \ | |
| TRANSCRIPT_TIMEOUT_SECONDS="$TIMEOUT_SEC" \ | |
| TRANSCRIPT_RESULTS="$RESULTS" \ | |
| python3 - <<'PY' > "$OUTPUT" | |
| import json | |
| import os | |
| import sys | |
| transcript = { | |
| "timestamp": os.environ["TRANSCRIPT_TIMESTAMP"], | |
| "verdict": os.environ["TRANSCRIPT_VERDICT"], | |
| "environment": { | |
| "host_os": os.environ["TRANSCRIPT_HOST_OS"], | |
| "host_arch": os.environ["TRANSCRIPT_HOST_ARCH"], | |
| "git_sha": os.environ["TRANSCRIPT_GIT_SHA"], | |
| "git_branch": os.environ["TRANSCRIPT_GIT_BRANCH"], | |
| "dmr_backend": os.environ["TRANSCRIPT_DMR_BACKEND"], | |
| "gpu_tier": os.environ["TRANSCRIPT_GPU_TIER"], | |
| }, | |
| "room": os.environ["TRANSCRIPT_ROOM"], | |
| "timeout_seconds": int(os.environ["TRANSCRIPT_TIMEOUT_SECONDS"]), | |
| "results": json.loads(os.environ["TRANSCRIPT_RESULTS"]), | |
| } | |
| json.dump(transcript, sys.stdout, indent=2) | |
| sys.stdout.write("\n") | |
| PY |
| // cause connect() to reject ONCE and never retry — leaving the pool | ||
| // permanently disconnected unless the caller knew to retry. The | ||
| // scheduleReconnect() loop has its own maxAttempts cap (currently | ||
| // 20 × exponential backoff, max 30s between tries) so this can't |
The comment claims scheduleReconnect() has a maxAttempts cap of "20 × exponential backoff", but the implementation currently gives up after 10 attempts (if (this.reconnectAttempts < 10) ... else ...). Update the comment to match the actual cap (or adjust the cap if 20 is intended).
| // 20 × exponential backoff, max 30s between tries) so this can't | |
| // 10 × exponential backoff, max 30s between tries) so this can't |
… log once (Copilot review)

Three real bugs Copilot flagged on PR891 in this file, all fixed:

1. **sysctlbyname return code unchecked** (concurrency.rs:45 in the review). Previously the call result was discarded; if it failed, `size` stayed at 0 and we reported "0 GB RAM," forcing capacity = 1 silently. Now: check rc and size; on failure, log a warn and fall back to the conservative 8 GB default. Per "errors save time" rule.

2. **Wrong fallback on Windows** (concurrency.rs:55 in the review). The non-macOS branch read /proc/meminfo unconditionally — but Windows has no /proc, so the .unwrap_or(8) silently fired and reported 8 GB on every Windows host. Now: separate cfg branches for linux + windows + other. Windows has a clear "not implemented yet, using 8 GB default" warning pointing at GlobalMemoryStatusEx as the right next step. Linux keeps /proc/meminfo with a warn on read failure (was previously also silent).

3. **Docstring lied about logging** (concurrency.rs:86 in the review). Said "Logged once on first call" — implementation logged on every call. Hot path (adapter init, scheduler sizing) → log spam. Now: AtomicUsize cache, first caller computes + logs, subsequent callers read the cache silently. Race-tolerant (pure computation, both racing threads get the same answer).

Branch: test/install-e2e-mac. Closes 3 of Copilot's 12 PR891 line comments. Cargo check green; warnings are pre-existing dead-code on unrelated files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… review)

Two more PR891 Copilot line comments addressed.

A) compute_router.rs:55 — `m.saturating_mul(k).saturating_mul(n)` for matmul + recurrence_step FLOPs. Prior `m * k * n` could wrap on large shapes (>2^64 elements is plausible in unusual configs: 192k*192k*192k = 7.1e15 fits, but 4M*4M*4M overflows). On overflow the wrapped value is small → matmul gets routed to CPU → silent wrong answer at the perf level (works, but slow). Saturating clamps at usize::MAX which falls cleanly into "above CPU ceiling, send to GPU" — the safe direction for an overflow case.

B) ai_provider.rs:157 — comment claimed DMR registers at "priority -1 (above Candle's 0)." Stale: my a284951 commit moved Candle to 8/9 (kill INFERENCE_MODE promotion). DMR is at priority 0 since PR891. Updated the comment to match the code + reference the kill commit so future readers see the history.

Branch: test/install-e2e-mac. Closes 2 more of Copilot's 12 PR891 line comments. Remaining: CodebaseIndexer.ts:333 promise cache race + the trivial test path doc fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
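The overflow behavior described in (A) is easy to check in isolation (the function name here is illustrative; the real code lives in compute_router.rs):

```rust
// Saturating FLOPs estimate: wrapping `m * k * n` would make a huge
// matmul look tiny and mis-route it to CPU; saturating_mul clamps at
// usize::MAX, which lands on the "send to GPU" side of any CPU ceiling.
fn matmul_flops(m: usize, k: usize, n: usize) -> usize {
    m.saturating_mul(k).saturating_mul(n)
}

fn main() {
    // Normal shape: exact product, same as plain multiplication.
    assert_eq!(matmul_flops(192, 192, 192), 192 * 192 * 192);
    // Shape that would wrap with `*`: clamps to usize::MAX instead.
    assert_eq!(matmul_flops(usize::MAX, 2, 3), usize::MAX);
    println!("saturating FLOPs ok");
}
```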
…n (Copilot review)
Last 2 of Copilot's 12 PR891 line comments closed.
CodebaseIndexer.ts:333 — `loadQueryCache()` memoizes the in-flight
fetch in `this.queryCacheLoad` so concurrent callers share one ORM
roundtrip. The OLD shape cleared `queryCacheLoad = null` inside the
IIFE body — but only on the success path. If the IIFE threw before
reaching that line (unexpected ORM error, IPC dropping mid-fetch),
the rejected Promise stayed cached. Every subsequent loadQueryCache()
call returned the same rejection forever — indexer permanently broken
with no retry path.
Fix: hoist the IIFE into a local Promise, attach `.finally(() => {
this.queryCacheLoad = null })`. The clear runs whether the underlying
load resolved or rejected. Concurrent callers that already grabbed
the in-flight Promise still see the same outcome (success or rejection)
— but the NEXT invocation gets a clean slate and can retry.
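The `.finally()` shape described above, as a self-contained sketch (`CacheLoader` is an illustrative stand-in for CodebaseIndexer; only the memoize-and-clear pattern is the point):

```typescript
// In-flight promise shared by concurrent callers, but cleared on BOTH
// resolution and rejection via .finally(), so a failed load never
// poisons the cache: the next call gets a clean slate and can retry.
class CacheLoader {
  private inflight: Promise<string> | null = null;
  loads = 0;

  load(fetch: () => Promise<string>): Promise<string> {
    if (!this.inflight) {
      this.loads++;
      this.inflight = fetch().finally(() => {
        this.inflight = null; // clear on success AND failure
      });
    }
    return this.inflight;
  }
}

const loader = new CacheLoader();
try {
  await loader.load(async () => { throw new Error('ORM dropped'); });
} catch { /* first load fails */ }
const value = await loader.load(async () => 'cached-rows'); // retry succeeds
console.log('recovered:', value, 'loads:', loader.loads);
```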
InferenceCapacityIntegration.test.ts:13 — header comment said
`commands/Inference Capacity/test/...` but the actual path is
`src/commands/inference/capacity/test/...`. Trivial doc fix; tester
who copy-pastes the command no longer hits "no such file."
Branch: test/install-e2e-mac. ALL 12 of Copilot's PR891 line comments
now addressed (5 debug-noise removals, 3 concurrency.rs real fixes,
1 compute_router overflow, 1 stale priority comment, 1 cache-poison
race, 1 trivial doc).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LiveKit-server's --dev mode bakes in well-known credentials
(API_KEY=devkey, API_SECRET=secret). Fine for Carl's local-only
install where the LiveKit container binds to localhost. NOT fine for
any Tailscale-grid-exposed deployment — anyone on the user's tailnet
who knows the dev keys could join voice/video sessions with full
participant rights.
Memento's PR914 voice/livekit migration calls
`getSecret('LIVEKIT_API_KEY')` with a fallback to the dev default.
This commit makes sure config.env actually HAS those keys after
install — generated per-install via openssl rand. Per-instance unique,
zero user friction.
Behavior:
- First install: 32-char API_KEY + 64-char API_SECRET written to
config.env with a short prose explainer of why
- Re-install: skipped (idempotent — preserves any user-customized values)
- Missing openssl: loud warn telling user how to generate manually,
install continues (won't block local-only Carl)
Carl's local install is unchanged in behavior — getSecret() returns
the generated keys instead of falling to devkey, but the LiveKit-server
container honors whatever LIVEKIT_KEYS env it's given. Grid-exposed
installs immediately get unique credentials.
Pairs with memento's PR914 review note #1. Branch: test/install-e2e-mac.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cmd_doctor's config-keys check used:

    count=$(grep -c "=" file 2>/dev/null || echo 0)

`grep -c` prints the count then exits 1 when there are 0 matches. The `|| echo 0` then ran and appended an extra "0" to the variable. Result: "Config: 0\n0 keys" on any empty config.env (split across two lines in the doctor output, breaking the visual scan).

Fix: capture grep's count, ignore its exit code with `|| true`, default-fill if empty. Same idiom that's safe for all the "non-zero-exit-doesn't-mean-error" grep usages.

Spotted while smoke-testing my own stale-image-detection addition earlier in this branch (commit 3fb9b66). Pre-existing bug, not mine.

Branch: test/install-e2e-mac.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
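The two idioms side by side, on an empty stand-in for config.env (only the grep behavior is the point):

```shell
# grep -c prints "0" AND exits 1 on zero matches, so `|| echo 0`
# appends a second "0" on its own line; `|| true` just swallows the
# non-zero exit and keeps grep's own count.
CONFIG=$(mktemp)            # empty config.env stand-in
broken=$(grep -c "=" "$CONFIG" 2>/dev/null || echo 0)
fixed=$(grep -c "=" "$CONFIG" 2>/dev/null || true)
fixed=${fixed:-0}           # default-fill only if truly empty
echo "broken='$broken'"
echo "fixed='$fixed'"
rm -f "$CONFIG"
```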
joelteply
left a comment
Memento review (cross-review with anvil) — Reviewed full diff, 19 commits.
Critical runtime fixes — all LGTM:
- IPC reconnect (ORMRustClient + AIProviderRustClient): boot-race fix, 20-attempt cap
- `saturating_mul` in compute_router: prevents usize overflow mis-routing
- `concurrency.rs`: sysctlbyname rc check + Linux/Windows split + AtomicUsize cache
- Seed retry: 30×1s replaces one-shot setTimeout(3000)
- CodebaseIndexer: `.finally()` prevents cached rejection poisoning
Generator type safety — the systemic fix:
- `ResultSpec.required` defaults to true. This is the generator-level version of the VoiceStartTypes hand-fix.
No issues found. Ready to merge from my perspective.
TokenBuilder imports ResultSpec from CommandNaming, not from shared/specs/CommandSpec where I added required? in 6f7a... Two duplicate interfaces existed; the field has to live on the interface TokenBuilder actually consumes or the build fails. Caught by memento on cross-test of test/install-e2e-mac.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-read

PR cross-review via diff-read missed both a compile error (caught by memento) and a stale spec on his branch (caught here). Same lesson: read-diff ≠ runtime works.

Each PR ships its own verify-pr-<N>.sh that exercises the changed flows in-system and writes a proof JSON (env metadata + per-check pass/fail). PR913 proof covers 16 checks:

- compile gate (tsc)
- install.sh §4b LiveKit key-gen (sandbox replay: lengths, idempotency, no insecure defaults)
- per-OS RAM detection (no silent 8GB fallback)
- generator required-by-default (both ResultSpec interfaces, jsdoc, TokenBuilder gating)
- IPC reconnect race fix (no wasPreviouslyConnected guard in either client)
- SystemOrchestrator seed retry loop (not setTimeout race)
- CodebaseIndexer cache rejection cleanup
- doctor stale-image label + config-keys display
- compute_router saturating_mul (overflow safety)
- setup.sh inference probe error visibility
- jtag ping (skipped if system not running)

Run: bash scripts/verify-pr-913.sh
Output: /tmp/verify-pr-913.json + stdout

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`timeout` is GNU coreutils, not in macOS base. Mac users had to brew install coreutils to get jtag-ping check to actually run. Detect either binary, run without if neither. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Runtime PROOF —
| # | Check | Result |
|---|---|---|
| 1 | tsc | ✅ Zero errors |
| 2 | livekit-keygen | ✅ 32-char key + 64-char secret |
| 3 | livekit-keygen-idempotent | ✅ Re-run no-ops |
| 4 | livekit-no-defaults | ✅ No insecure dev defaults |
| 5 | concurrency-per-os | ✅ macOS rc-check + linux + windows + fallback |
| 6 | naming-resultspec-required | ✅ required? on CommandNaming.ResultSpec |
| 7 | commandspec-resultspec-required | ✅ required? + jsdoc on canonical CommandSpec |
| 8 | tokenbuilder-required-gating | ✅ ?: only when required: false |
| 9 | seed-retry | ✅ 30-attempt backoff loop |
| 10 | ipc-reconnect-guard-removed | ✅ if(wasPreviouslyConnected) removed in both clients |
| 11 | indexer-cache-finally | ✅ .finally clears rejected promise |
| 12 | doctor-stale-image | ✅ revision label check |
| 13 | doctor-config-keys | ✅ 0\\n0 keys bug fixed |
| 14 | compute-router-saturating | ✅ 4 saturating_mul occurrences |
| 15 | setup-probe-errors | ✅ Python errors visible |
| 16 | jtag-ping | ✅ System responding |
16 passed, 0 failed, 0 skipped
JSON proof artifact (excerpt)
{
"pr": 913,
"branch": "test/install-e2e-mac",
"sha": "c3ec85313",
"machine": "Mac-1950.lan",
"os": "Darwin 25.4.0",
"arch": "arm64",
"passed": 16,
"failed": 0,
"skipped": 0
}

Pattern matches @memento's verify-pr-914.sh so cross-PR cross-test is mechanical going forward.
Post-PR891 follow-up — install-reliability lane (m5-test/anvil side). Memento is working on voice/livekit + response quality on a separate branch. Both lanes meet at main.
What's in this PR
13 commits, three groups:
Group 1: Continuum CLI + Skills (4 commits)
The dev-side bridge — Claude Code users get continuum operations as
/commands. Carl never needs Claude Code; skills are purely additive for the dev audience.install.shonly drops them into~/.claude/skills/if Claude Code is detected (silent no-op otherwise).bda74c054—continuum updateCLI splits Carl path (docker compose pull~30s) from Dev path (--devflag → rebuild from source)1b02fc834—/continuum:updateskill + install.sh hookb11874fa9—/continuum:status,/continuum:doctor,/continuum:chatskillse80485dcf— README + SETUP.md surface skills as a featureGroup 2: Install reliability — surfaces failures BEFORE the user hits them (4 commits)
The previous setup.sh claimed success when "all containers healthy" — but a working DMR + missing GPU toggle + unverified-routing combination still shipped a broken first-chat experience that users couldn't self-diagnose.
- `b832cdd89` — setup.sh detects `latest-cpu` backend on a GPU box + auto-enables host-side TCP via `docker desktop enable model-runner --tcp` + post-pull catalog verification (model actually appeared in DMR)
- `ec2ffacd0` — setup.sh end-to-end inference probe: sends a real chat completion to DMR, classifies tok/s as CPU-tier vs Mac Metal vs Nvidia CUDA, surfaces specific remediation prose for each broken state
- `207c728b8` — `scripts/verify-personas.sh` merge-gate acceptance test: sends @persona probes, polls chat/export, writes JSON transcript with environment metadata. Verified PASS on M5 Metal (Helper + Teacher both reply coherently). Plus `scripts/lib/repo-root.sh` shared helper.
What's NOT in this PR
Copilot PR891 review — all 12 line comments addressed
Joel flagged that Copilot's PR891 review hadn't been triaged. All 12 line comments addressed across 4 commits:
| File:line | Issue | Commit | Note |
|---|---|---|---|
| PersonaUser.ts:847 | [SUB-DEBUG] | 941adf9b3 | |
| PersonaUser.ts:1302 | [MSG-DEBUG] | 941adf9b3 | |
| PersonaAutonomousLoop.ts:163 | [LOOP-DEBUG] | 941adf9b3 | |
| ORM.ts:157 | [EVENT] write-path log | 941adf9b3 | |
| AIProviderDaemon.ts:597 | [ADAPTER-DEBUG] | 941adf9b3 | |
| concurrency.rs:45 | sysctlbyname unchecked rc | 2372043ce | |
| concurrency.rs:55 | /proc/meminfo on Windows | 2372043ce | |
| concurrency.rs:86 | "logged once" docstring lied | 2372043ce | |
| compute_router.rs:55 | m*k*n usize overflow | c945ada6a | |
| ai_provider.rs:231 | priority "-1" comment vs code | c945ada6a0 | post-a28495135 |
| CodebaseIndexer.ts:333 | rejected promise stays cached | 50d810570 | `.finally` clears `queryCacheLoad` on rejection too |
| InferenceCapacityIntegration.test.ts:13 | wrong test path | 50d810570 | |

5 debug-noise removals + 5 real correctness bugs + 1 stale comment + 1 trivial doc. None broke anything related to install-reliability — all latent / debug-leftover / stylistic.
Acceptance evidence
`scripts/verify-personas.sh` ran on M5 Metal at HEAD:
Same script runs on memento's BigMama / FlashGordon to produce the Windows CUDA + M1 Pro Metal transcripts when those install paths are validated.
🤖 Coordinated with memento via airc mesh — same multi-agent pattern as PR891.