
voice: retire legacy WS transport, unify on LiveKit WebRTC#914

Open
joelteply wants to merge 22 commits into main from fix/voice-livekit-migration

Conversation

@joelteply
Contributor

Summary

  • VoiceChatWidget browser migration: 427→178 lines. Raw WebSocket + AudioWorklet replaced with AudioStreamClient (LiveKit WebRTC). VOICE_WS_PORT/port-3001 eliminated from browser. Dead AudioWorklet processor files removed (348 lines).
  • Type safety: Required result fields enforced in factory params. Dead wsUrl field removed. Anvil fixed the generator to default required: true (b96a652) — 452 generated files will tighten on re-gen.
  • LiveKit stays always-on: Reverted a mistaken profile-gate attempt. LiveKit is THE efficient UDP/WebRTC transport backbone — 14 personas + 4 LLMs + TTS/STT + Bevy avatars worked simultaneously on M1 because of it. Texture-ID + mouth-animation params over the wire, NOT rasterized video.

Remaining work on this PR

The old port-3001 WebSocket voice server still runs in parallel with LiveKit, doing the same work. This PR retires it.

What the exploration found

Current transport architecture (already good):

  • Core↔Bridge IPC: binary frames over Unix socket. Audio = raw i16 PCM, video = raw RGBA. Zero base64.
  • Bridge↔LiveKit: WebRTC/UDP. Opus codec for audio, hardware-accelerated video.
  • Tile resolution: browser-driven CSS px → 6 tiers (Tiny 160×120 → FullHD 1920×1080), dynamic fps.
  • Grid: TCP for reliable commands, UDP (port 7118) for fire-and-forget events.
  • Avatar pipeline: texture-ID approach — Bevy renders locally from params, LiveKit carries voice + metadata. NOT rasterized streams.
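
The browser-driven tier selection above can be sketched as a pure lookup. Only the Tiny (160×120) and FullHD (1920×1080) endpoints are named in this PR, so the intermediate tiers below are placeholders:

```typescript
// Hypothetical tier table — only Tiny and FullHD come from the PR text;
// the four intermediate tiers are illustrative stand-ins.
interface Tier { name: string; width: number; height: number; }

const TIERS: Tier[] = [
  { name: 'Tiny',   width: 160,  height: 120 },
  { name: 'Small',  width: 320,  height: 240 },
  { name: 'VGA',    width: 640,  height: 480 },
  { name: 'SD',     width: 960,  height: 540 },
  { name: 'HD',     width: 1280, height: 720 },
  { name: 'FullHD', width: 1920, height: 1080 },
];

// Pick the smallest tier that covers the tile's CSS pixel size,
// falling back to FullHD for anything larger.
function pickTier(cssWidth: number, cssHeight: number): Tier {
  return TIERS.find(t => t.width >= cssWidth && t.height >= cssHeight)
      ?? TIERS[TIERS.length - 1];
}
```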

The duplication (old path duplicating LiveKit):

| Function | Old (VoiceWebSocketHandler.ts, port 3001) | New (LiveKit via bridge) |
|---|---|---|
| Audio capture | Manual binary WS frames, 500ms buffering | LiveKit SDK + getUserMedia, Opus, WebRTC |
| STT trigger | 500ms accumulation threshold | LiveKit VAD + Rust STT listener |
| Transcription routing | getRustVoiceOrchestrator().onUtterance() | Same call, via CollaborationLiveTranscriptionServerCommand |
| TTS synthesis | VoiceSynthesize.execute() + manual 20ms chunking to WS | voiceSpeakInCall() IPC → Rust TTS → LiveKit publish |
| Audio playback | Manual binary WS frames to browser | LiveKit remote track → HTMLAudioElement |
| Mic levels | voice:audio:level event | room.localParticipant.audioLevel at 30fps |
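
As an illustration of the mic-levels row: instead of consuming a voice:audio:level WS event, the LiveKit path can poll the local participant's audioLevel at ~30fps. The audioLevel field is the one named in the table; the poller itself is an assumed sketch, not the widget's actual code:

```typescript
// Minimal mic-level poller sketch. `participant` stands in for
// room.localParticipant; only `audioLevel` is assumed on it.
type LevelSource = { audioLevel: number };

function startMicLevelPolling(
  participant: LevelSource,
  onMicLevel: (level: number) => void,
  fps = 30,
): () => void {
  const timer = setInterval(() => onMicLevel(participant.audioLevel), 1000 / fps);
  return () => clearInterval(timer); // caller stops polling on disconnect
}
```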

Step-by-step plan

Step 1: Remove startVoiceServer() from boot

  • JTAGSystemServer.ts:223-230 — delete the startVoiceServer() call
  • Port 3001 stops binding. Zero impact on LiveKit path.

Step 2: Remove VoiceWebSocketHandler.ts

  • 590 lines of raw WebSocket audio handling that LiveKit replaces
  • system/voice/server/VoiceWebSocketHandler.ts → delete
  • system/voice/server/index.ts → remove re-exports

Step 3: Verify orchestration still works through LiveKit path

  • VoiceOrchestrator.ts — KEEP. Used by LiveKit path
  • AIAudioBridge.ts — KEEP. TTS→LiveKit publish
  • AudioNativeBridge.ts — KEEP. Voice-native AI models
  • VoiceSessionManager.ts — KEEP. Used by VoiceStartServerCommand

Step 4: Update tests

  • voice-websocket-transcription-handler.test.ts — delete (tests deleted handler)
  • Verify remaining voice tests use LiveKit path

Step 5: Verify Docker LiveKit reliability

  • docker compose up boots livekit + livekit-bridge + continuum-core
  • Verify on Mac + BigMama
  • Verify multi-persona live call

Architecture (non-negotiable)

  • No rasterization — Bevy renders GPU-accelerated, texture-ID approach. LiveKit carries metadata, not pixels.
  • Pointers not copies — binary IPC for audio/video. Base64 banned for real-time data.
  • UDP fire-and-forget — WebRTC handles this natively.
  • Dynamic resolution — 6 tiers, VGA dropback under pressure.
  • LiveKit always-on — the transport backbone, not a feature flag.

Test plan

  • npx tsc --noEmit — zero errors
  • docker compose config --quiet — valid
  • Step 1-2: Remove old WS server, verify boot without port 3001
  • Step 3: Verify persona response (orchestration intact)
  • Step 4: Run/update voice tests
  • Step 5: Docker LiveKit on Mac + BigMama

🤖 Generated with Claude Code

VoiceStartServerCommand now returns LiveKit URL + JWT token instead of
spinning up a legacy WebSocket server on port 3001. Same LiveKit token
generation pattern as collaboration/live/join.
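
For reference, a LiveKit token is a standard HS256 JWT whose payload carries a video grant. The real command uses livekit-server-sdk's AccessToken; this sketch builds an equivalent payload by hand with Node's crypto, purely to show the shape of what voice/start now returns (the helper is illustrative, not the PR's code):

```typescript
import { createHmac } from 'node:crypto';

const b64url = (buf: Buffer) => buf.toString('base64url');

// Illustrative LiveKit-style token: iss = API key, sub = participant
// identity, `video` grant names the room the token may join.
function makeLiveKitStyleToken(
  apiKey: string, apiSecret: string, identity: string, room: string,
): string {
  const header = { alg: 'HS256', typ: 'JWT' };
  const payload = {
    iss: apiKey,
    sub: identity,
    video: { roomJoin: true, room },
    exp: Math.floor(Date.now() / 1000) + 3600, // 1h expiry
  };
  const body =
    b64url(Buffer.from(JSON.stringify(header))) + '.' +
    b64url(Buffer.from(JSON.stringify(payload)));
  const sig = b64url(createHmac('sha256', apiSecret).update(body).digest());
  return `${body}.${sig}`;
}
```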

Port 3001 is no longer needed — Docker compose never exposed it, so it
was already dead in containerized deployments.

VoiceStartResult type adds livekitUrl + livekitToken fields (wsUrl kept
for backwards compat, set to same as livekitUrl).

VoiceChatWidget browser-side migration (raw WS → AudioStreamClient
LiveKit transport) is the next step — this commit unblocks it by
providing the correct server-side response shape.

Existing TTS→STT roundtrip tests (livekit-audio-roundtrip.test.ts,
sensory_pipeline_test.rs with gunfire/noise injection) validate the
audio pipeline independently. Once the widget is wired to LiveKit,
those tests cover the full voice path.
- VoiceChatWidget: replace raw WebSocket + AudioWorklet with
  AudioStreamClient (LiveKit WebRTC). 427→178 lines. VOICE_WS_PORT/3001
  eliminated from browser side.

- VoiceStartTypes: required result fields (handle, livekitUrl,
  livekitToken, roomId) now required in factory params — no more
  optional + empty-string defaults hiding compile-time errors.
  Remove dead wsUrl field (legacy port-3001, zero consumers).

- docker-compose: livekit + livekit-bridge moved to profiles: [live].
  Text chat works without WebRTC infrastructure. Carl saves ~300MB RAM
  and doesn't wait for LiveKit to boot.

- continuum-core depends_on: removed hard dep on livekit-bridge
  (profile-gated services can't be in depends_on). Core discovers
  bridge via socket at startup when live profile is active.
LiveKit provides the UDP/WebRTC transport that made 14 personas + 4 LLMs
+ TTS/STT + Bevy avatars work simultaneously on M1. Profiling it out
would degrade the product to text-only. Same principle as Docker Model
Runner — efficient transport is a core requirement.

Restores: livekit + livekit-bridge as always-on services,
continuum-core depends_on livekit-bridge health.
VoiceChatWidget now uses AudioStreamClient (LiveKit WebRTC) for all
audio capture and playback. These worklet processors were only loaded
by the old raw-WebSocket code path that was replaced in the previous
commit. No remaining references in the codebase.
Copilot AI review requested due to automatic review settings April 17, 2026 19:31
Remove VoiceWebSocketHandler.ts (586 lines) — its entire functionality
is handled by the LiveKit WebRTC transport:
- Audio capture/playback → LiveKit SDK + AudioStreamClient
- STT triggering → LiveKit VAD + Rust STT listener
- Transcription routing → CollaborationLiveTranscriptionServerCommand
- TTS synthesis → AIAudioBridge → voiceSpeakInCall IPC → LiveKit publish

Keep: VoiceOrchestrator (persona routing), AIAudioBridge (TTS→LiveKit),
AudioNativeBridge (voice-native AI models), VoiceSessionManager.

Port 3001 no longer binds on server startup.
Tests updated to reference LiveKit path instead of deleted handler.

Copilot AI left a comment


Pull request overview

This PR migrates the voice chat browser path off the legacy port-3001 raw WebSocket + AudioWorklet pipeline and standardizes on LiveKit/WebRTC (via AudioStreamClient), with voice/start now returning LiveKit connection details.

Changes:

  • Remove the legacy browser AudioWorklet capture/playback processors.
  • Rewrite VoiceChatWidget to join LiveKit via AudioStreamClient and drive UI state/events from LiveKit mic levels, transcription, and active-speaker signals.
  • Update voice/start types + server implementation to return livekitUrl + livekitToken (and remove the legacy wsUrl field); update docs/compose commentary to reflect LiveKit always-on.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/widgets/voice-chat/voice-playback-processor.js | Deletes legacy playback AudioWorklet processor. |
| src/widgets/voice-chat/voice-capture-processor.js | Deletes legacy capture AudioWorklet processor. |
| src/widgets/voice-chat/VoiceChatWidget.ts | Switches widget transport to LiveKit AudioStreamClient and updates state/event wiring. |
| src/commands/voice/start/shared/VoiceStartTypes.ts | Replaces wsUrl with required livekitUrl/livekitToken fields in result typing/factory. |
| src/commands/voice/start/server/VoiceStartServerCommand.ts | Generates LiveKit JWT and returns LiveKit URL instead of starting/using the port-3001 voice WS server. |
| docs/planning/ALPHA-GAP-ANALYSIS.md | Documents the LiveKit migration and remaining cleanup. |
| docker-compose.yml | Updates comments to emphasize LiveKit/livekit-bridge always-on assumptions. |


Comment on lines +65 to +66
// LiveKit URL for browser connection
const livekitUrl = getSecret('LIVEKIT_URL') || 'ws://localhost:7880';

Copilot AI Apr 17, 2026


livekitUrl is sourced from getSecret('LIVEKIT_URL'), but in docker-compose the node-server default LIVEKIT_URL points at the Docker-internal hostname (ws://livekit:7880). Returning that to the browser will fail because the browser can’t resolve livekit. Align this with LiveJoinServerCommand by returning a browser-reachable URL (e.g., fall back to @shared/AudioConstants.LIVEKIT_URL / getWebSocketUrl(LIVEKIT_TLS_PORT) or introduce/use a dedicated LIVEKIT_BROWSER_URL secret/env).

Comment on lines +91 to +93
const apiKey = getSecret('LIVEKIT_API_KEY') || LIVEKIT_API_KEY;
const apiSecret = getSecret('LIVEKIT_API_SECRET') || LIVEKIT_API_SECRET;
const token = new AccessToken(apiKey, apiSecret, {

Copilot AI Apr 17, 2026


getSecret calls here omit the requestedBy argument, which means SecretManager audit logs will record these reads as coming from unknown. Pass a stable identifier (e.g., 'VoiceStartServerCommand') like LiveJoinServerCommand does so secret access is traceable in logs.

Comment on lines +79 to +82
this.handle = result.handle;
this.localUserId = result.roomId; // sessionId used as identity in JWT

// Create AudioStreamClient wired to our state

Copilot AI Apr 17, 2026


localUserId is being set to result.roomId, but LiveKit speaker identities are participant.identity from the JWT (here generated from params.sessionId). This will cause isSpeaking/isAISpeaking to be computed incorrectly (and can mark AI speaking when only the local user is speaking). Prefer returning the LiveKit identity in VoiceStartResult (e.g., identity/userId) or deriving it after connect (e.g., from the connected room/local participant) and using that for active-speaker comparisons.

Comment on lines 68 to +87
async start(): Promise<void> {
try {
// Resume audio context if suspended (browser autoplay policy)
if (this.audioContext?.state === 'suspended') {
await this.audioContext.resume();
}
// Get LiveKit credentials from voice/start command
const result: VoiceStartResult = await VoiceStart.execute({
room: this.roomId || 'general',
});

// Initialize audio if needed
if (!this.audioContext) {
await this.initAudio();
if (!result.success) {
throw new Error(result.error?.message || 'Failed to start voice session');
}

// Start voice session via command to get handle
if (!this.handle) {
const result = await VoiceStart.execute({
room: this.roomId || 'general',
});

if (!result.success) {
throw new Error(result.error?.message || 'Failed to start voice session');
}
this.handle = result.handle;
this.localUserId = result.roomId; // sessionId used as identity in JWT

// Create AudioStreamClient wired to our state
this.audioClient = new AudioStreamClient({
onConnectionChange: (connected) => {
this.updateState({ isConnected: connected, error: connected ? null : 'Disconnected' });
},
onMicLevel: (level) => {

Copilot AI Apr 17, 2026


start() unconditionally creates a new server session and a new AudioStreamClient even if the widget is already listening/connected. Calling start() twice (or toggle() rapidly) can leak LiveKit connections and orphan server-side handles. Consider guarding with if (this.voiceState.isListening) return; or calling await this.stop() before starting a new session, and ensure any existing audioClient has left the room before being overwritten.

Copilot uses AI. Check for mistakes.
Comment on lines +99 to +116
onActiveSpeakersChanged: (speakerIds: string[]) => {
this.activeSpeakers = new Set(speakerIds);
const isSpeaking = this.activeSpeakers.has(this.localUserId);
const isAISpeaking = speakerIds.some(id => id !== this.localUserId);

this.updateState({ isSpeaking, isAISpeaking });

if (isSpeaking) {
Events.emit('voice:speaking:start', { roomId: this.roomId });
} else {
Events.emit('voice:speaking:end', { roomId: this.roomId });
}
if (isAISpeaking) {
Events.emit('voice:ai:speaking:start', { roomId: this.roomId });
} else {
Events.emit('voice:ai:speaking:end', { roomId: this.roomId });
}
},

Copilot AI Apr 17, 2026


onActiveSpeakersChanged emits voice:speaking:start/end and voice:ai:speaking:start/end on every active-speaker list change, even when isSpeaking/isAISpeaking didn't transition. This can produce duplicate start/end events (e.g., when another participant starts speaking while the user is already speaking). Track the previous speaking states and only emit the corresponding events when the boolean value changes.
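
A minimal sketch of the edge-triggered fix (emit stands in for the widget's Events.emit; event payloads omitted):

```typescript
// Track previous speaking booleans and emit start/end only on a flip,
// so active-speaker list churn can't produce duplicate events.
function makeSpeakingTracker(
  localUserId: string,
  emit: (event: string) => void,
) {
  let wasSpeaking = false;
  let wasAISpeaking = false;

  return (speakerIds: string[]) => {
    const isSpeaking = speakerIds.includes(localUserId);
    const isAISpeaking = speakerIds.some(id => id !== localUserId);

    if (isSpeaking !== wasSpeaking) {
      emit(isSpeaking ? 'voice:speaking:start' : 'voice:speaking:end');
      wasSpeaking = isSpeaking;
    }
    if (isAISpeaking !== wasAISpeaking) {
      emit(isAISpeaking ? 'voice:ai:speaking:start' : 'voice:ai:speaking:end');
      wasAISpeaking = isAISpeaking;
    }
  };
}
```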

Contributor Author

@joelteply joelteply left a comment


Reviewed for partnership / second pair of eyes.

LGTM from install-reliability lens. Key strengths:

  • 1441 deletions vs 209 additions = real cleanup, not just shuffle
  • docker-compose.yml prose embeds the architectural reasoning (texture-IDs over UDP, "do not profile-gate this") right next to the config — future readers won't repeat the profile-decouple mistake
  • VoiceStartServerCommand.ts cleanly migrated to LiveKit JWT, getSecret() with sensible dev defaults at line 95-96
  • ALPHA-GAP-ANALYSIS.md updated with the right architectural call (LiveKit always-on, same as DMR)
  • References the generator type-safety fix from this PR (b96a652) so result fields are properly required

Two follow-up observations (not blocking — file as separate issues if useful):

  1. LIVEKIT_API_KEY/SECRET defaults — dev keys (devkey / secret) match livekit-server --dev's defaults so local installs work zero-config. For production deployments where Carl's instance is exposed (e.g., Tailscale grid hosting), those keys should be auto-generated at install time and written to ~/.continuum/config.env so each user's grid has unique credentials. setup.sh could do this — openssl rand -hex 32 for both. Could fold into my install-reliability work or your call.

  2. VoiceWebSocketHandler.ts still on boot — explicitly punted to a follow-up PR (per gap analysis prose). Fine as long as we track it. The cleanup now means LiveKit is the only Voice client path, but the server-side WS handler still listens on 3001 unnecessarily. Worth a follow-up issue / commit to remove.

Otherwise — ship it. The architecture story is intact.

🤖 anvil reviewed via gh pr review while working on PR #913 (install-reliability lane). Both lanes converge on main when ready.

Contributor Author

@joelteply joelteply left a comment


Type-safety audit (per Joel's request to check for the same pattern your VoiceStartTypes fix was about):

Found one real violation in src/widgets/voice-chat/VoiceChatWidget.ts:

// line 26
public roomId: string = '';                                    // required type, empty sentinel default

// line 47 — optional in constructor
constructor(options?: { roomId?: string; onStateChange?: ... })

// line 48-49 — conditional assignment
if (options?.roomId) {
  this.roomId = options.roomId;
}

// line 72 — silent fallback when empty
room: this.roomId || 'general',

Three layers all hiding the same gap: roomId is declared string (required) but allowed to land at '', optional in the constructor, then falls back to 'general' at use-time. Same pattern your VoiceStartTypes hand-fix was correcting + my generator change formalized — the type promises required, the runtime accepts missing, the || covers the gap.

Honest shape depends on intent:

(a) roomId is genuinely required for this widget to function → public roomId: string (no default), constructor(options: { roomId: string; ... }) (required in options), kill the || 'general' (if no roomId, throw or refuse to start).

(b) 'general' is the documented default when none is specified → public roomId = 'general' (no empty sentinel), constructor(options?: { roomId?: string; ... }) is fine (default kicks in via class init), kill the || 'general' at line 72 (already initialized).

Either is honest. Current shape has the worst of both — looks required, behaves optional, has a magic-string fallback nobody can audit by reading the type alone.

Other || short-circuits in this PR (params.sessionId || 'anonymous', getSecret('LIVEKIT_API_KEY') || LIVEKIT_API_KEY) look fine — those genuinely have defensible defaults at the right layer.

Catch this before merge if straightforward; otherwise track for a follow-up commit on this branch.
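
Option (b) can be sketched in a few lines (class name illustrative):

```typescript
// 'general' is the documented default, assigned once at construction —
// no empty-string sentinel and no runtime || fallback at the use site.
class VoiceChatWidgetShape {
  public roomId: string;

  constructor(options?: { roomId?: string }) {
    this.roomId = options?.roomId ?? 'general'; // default lives in one place
  }
}
```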

VoiceChatWidget: roomId defaults to 'general' at class init, not empty
string with runtime || fallback. Eliminates three-layer indirection
(empty sentinel → optional constructor → runtime check). Addressed
anvil's PR review.

voiceSynthesize: default 120s timeout (up from 60s) to accommodate
ONNX→Metal JIT cold start on M1. Subsequent calls are <2s.
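
A generic sketch of that timeout behavior — the 120_000 ms default mirrors the commit; the wrapper itself is assumed, not the actual voiceSynthesize code:

```typescript
// Race the synthesis call against a timer; clear the timer on settle so
// a fast call (the <2s warm case) doesn't leave a pending timeout.
function withTimeout<T>(work: Promise<T>, ms = 120_000): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`voiceSynthesize timed out after ${ms}ms`)), ms);
    work.then(
      v => { clearTimeout(timer); resolve(v); },
      e => { clearTimeout(timer); reject(e); },
    );
  });
}
```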
@joelteply
Contributor Author

Verification Proof — M1 Pro (MacBookPro-1959)

Branch: fix/voice-livekit-migration
SHA: cc0bb3f21
Date: 2026-04-17T20:04:13Z
Machine: MacBookPro-1959.lan (Darwin 25.0.0, arm64)
| Check | Result | Detail |
|---|---|---|
| TypeScript compilation | PASS | Zero errors |
| Port 3001 not bound | PASS | Old voice WS server removed |
| VoiceWebSocketHandler.ts deleted | PASS | File removed |
| voice-start.json spec | PASS | Has livekitUrl + livekitToken |
| VoiceStartTypes type safety | PASS | Required fields enforced in factory |
| docker-compose.yml valid | PASS | Validates cleanly |
| LiveKit not profile-gated | PASS | Always-on in compose |
| jtag ping | SKIP | System not booted (ORT Metal deadlock #915 blocks TTS warmup on M1) |
| AudioWorklet processors deleted | PASS | Dead files removed |

8/9 pass. 1 skip (jtag ping blocked by #915 — ORT Metal EP deadlock on M1, not a regression from this PR).

Script: scripts/verify-pr-914.sh — run it to reproduce.

joelteply and others added 5 commits April 17, 2026 15:13
Two bugs causing zero GPU usage on local personas:

1. CandleAdapter::initialize() eagerly loaded 2.5GB GGUF via embedded
   llama.cpp on every startup — even though Candle is training-only.
   This wasted RAM, caused Metal assertion crashes on M1 exit, and
   the adapter was making resource decisions it has no authority to make.
   Fix: initialize() just logs ready, no model load. Lazy-load on
   explicit training request only.

2. AIProviderDaemon.selectAdapter() hard-coded 'local' → 'candle'
   aliasing ("Candle is the ONLY local inference path"). Wrong since
   DMR pivot. Fix: 'local' now routes through Rust IPC adapter which
   has DMR registered at priority 0 (GPU). Candle only as last resort.
Was pointing at .continuum/jtag/data/database.sqlite which doesn't
exist on any install — reseed silently failed because
data:reseed → data:clear → data:backup hit `cp: source not found`,
&&-chain halted, data-clear.ts never ran.

Switch to sqlite3 .backup (WAL-safe — works with running system,
correctly captures uncommitted writes from main.db-wal).

Backups now live in ~/.continuum/backups/ (consistent with the
~/.continuum/* convention everything else uses).

Live-tested on M5: 496MB main.db backed up cleanly while the
system was running.

Memento bug list #8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ContentTypeRegistry threw 'Unknown content type: metrics' when
clicked because no metrics.json recipe existed. Mirror of
diagnostics.json but pointed at metrics-detail-widget (the
detail timeseries view) instead of diagnostics-widget.

Memento bug list #2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…boot

CodebaseIndexer ran 64-batches back-to-back with NO yield between
batches. Each batch ~1.5s + ~80MB RSS growth. With 5000+ chunks in
src/, that's 78+ batches × 1.5s = 2+ minutes of total event-loop
saturation immediately after every boot. Local personas couldn't
respond, voice couldn't connect, anything that needed the bus was
blocked until indexing finished.

Two changes:
- Batch size 64→16 (smaller per-batch RSS hit, ~4× more chances
  for other IO to interleave between IPC roundtrips)
- 50ms pause between batches via setTimeout (yields the event loop
  so chat/voice/personas can process while indexing runs)

The throughput cost is small (16 vs 64 chunks per IPC) and the
inter-batch pause is invisible at human timescales. The chat-arrival
latency win is huge — system is responsive within seconds of boot
instead of minutes.

The deeper fix is querying GpuPressureWatcher / ResourcePressureWatcher
before each batch and backing off when pressure is high — same
principle Joel called out for InferenceCoordinator slot capacity.
That's a follow-up; this is the floor.
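
The batching change can be sketched as (processBatch stands in for the real IPC call):

```typescript
// Split work into 16-chunk batches (down from 64).
function chunkBatches<T>(items: T[], size = 16): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches with a 50ms setTimeout pause between them, yielding
// the event loop so chat/voice IO can interleave with indexing.
async function indexAll<T>(
  items: T[],
  processBatch: (batch: T[]) => Promise<void>,
  pauseMs = 50,
): Promise<void> {
  for (const batch of chunkBatches(items)) {
    await processBatch(batch);
    await new Promise<void>(r => setTimeout(r, pauseMs));
  }
}
```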

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joelteply
Contributor Author

M5 Verification — Candle eager-load fix WORKS but two follow-up bugs surface

What's confirmed working ✅

  • Test 1 (direct): ./jtag ai/generate --provider=local returned provider: "docker-model-runner" — confirmed. Routing correctly lands on DMR. Persona seed config (provider: "local") → registry select() → DMR adapter, no more silent Candle bypass.
  • Test 2 (chat): Helper AI, Teacher AI, Local Assistant, CodeReview AI all responded to chat msgs after the embedding-storm settled. Routing+chat pipeline functional end-to-end.
  • DMR backend: docker model status --json confirms llama.cpp latest-metal + vllm-metal both Running. Metal IS the backend.
  • Embedding storm fix (commit c1c6d62d4, also pushed to this branch): batch 64→16 + 50ms inter-batch yield. CodebaseIndexer no longer monopolizes event loop for 2+min after boot.
  • data:backup fix (commit 8164d6ca7): pointed at real ~/.continuum/database/main.db path with sqlite3 .backup (WAL-safe). Live-tested 496MB backup on M5.
  • metrics.json recipe (commit 8901f2618): fixes the Metrics tab crash (Unknown content type 'metrics' — missing recipe, #916).

What surfaces as the real-tier bottleneck ⚠️

1. llama.cpp Metal DeltaNet kernels are unoptimised for Qwen3.5.
DMR logs on M5 show prompt eval at 379 tok/s (GPU territory), but predicted (output) tok/s for Qwen3.5-4b-code-forged drops to ~4 tok/s on a 296-token completion. Qwen3.5 uses Gated DeltaNet (recurrence, not pure transformer). ssm_conv / ssm_scan / gated_delta_net Metal shaders in ggml/src/ggml-metal/ have a documented ~14× regression. This is an upstream llama.cpp gap, not a routing bug. Options: (a) patch our vendor copy of llama.cpp's Metal shaders, (b) install MLX backend (docker model status shows mlx: Not Installed), (c) default first-chat experience to Qwen2.5 (pure transformer, ~33 tok/s on M5) until DeltaNet kernels land.

2. Personas hit a 14,443-token prompt window per chat call (per docker model logs n_tokens = 14443). RAG isn't budgeting context. Even on optimised kernels this would hurt latency — current behaviour starts users with multi-second prompt eval before any output token streams. Plus it caused the visible echo-chamber: 14 chat replies from 4 personas all converging on identical "Webview authentication" hallucination because they're all loading the same bloated context. Separate fix: enforce RAG budget caps in PersonaResponseGenerator chain.

Recommendation

Merge this PR with the routing + Candle-eager-load fix as the user-facing first-chat unblock. File the Metal DeltaNet shader work and RAG budget bug as follow-up issues — both are real but neither belongs in a voice-LiveKit-migration PR.

joelteply and others added 5 commits April 17, 2026 16:20
Docker Model Runner defaults to model's max context (262k for Qwen3.5).
With concurrent persona slots, KV cache balloons to 20GB+ on a 32GB
machine, causing swap thrash and making the system unusable.

4096 context is sufficient for chat (RAG budget capped at 2-4k tokens).
Drops llama-server from 20.87GB to ~1-2GB. Applied after model pull
in install.sh so Carl and Dev both get it.

Also: RAG context budget needs separate fix (currently sends 14k tokens
to model, which is the actual prompt bloat — anvil working on that).
ChatRAGBuilder computed totalBudget = floor(contextWindow * 0.75).
For Qwen3.5-4b which advertises a 262144-token window, that's 196608
tokens — a budget no chat turn would ever sensibly fill.

Two costs from leaving it that wide:
  1. RAG composition still ran with that budget, producing prompts
     ~14k tokens that were 10× what a chat turn needs.
  2. llama-server allocated full 262k KV cache PER PERSONA SLOT.
     Activity Monitor on M5 (Joel): com.docker.llama-server 20.87 GB
     resident, total 44 GB across 4 personas vs 32 GB physical = swap.

CHAT_INPUT_BUDGET_CEILING = 8192. Sized for chat: ~2k system prompt +
~3k recent history + ~3k RAG context. Specialized recipes (research,
codereview) that legitimately need more can opt up via their own
RAGBuilder subclass.

This fix touches the RAG budget number only. The KV cache slot size
inside DMR's llama-server is set per-model at pull time and is a
separate (and harder) lever — capping the input prompt is what we
control from this layer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…c 8192

Joel's correction: no ceilings. The budget should be derived from
the model's own characteristics, not a hardcoded escape hatch.

Previous commit set CHAT_INPUT_BUDGET_CEILING = 8192 as a workaround
for the 196k → 14k → OOM chain. That's the same anti-pattern as
hardcoded provider routing — a magic number in a builder instead of
the authority deciding.

The right authority already exists: getLatencyAwareTokenLimit(model)
returns the input ceiling that fits a chat-acceptable response time
given the model's measured TPS. It's already used on line 616 for
the message fetch limit. Apply it here for the total budget too.

Slow local model → latency-aware budget (Qwen3.5-4b on M5: ~24 TPS
× 30s target = ~720 tokens — appropriately tight for the model).
Fast cloud model → full 75% of context window.
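
The derivation above can be sketched as a pure function. The 30s latency target and the 75%-of-window ratio come from the commit text; the 200-TPS fast/slow threshold is an illustrative assumption:

```typescript
// Budget derived from the model's own characteristics: slow local models
// get TPS × target-seconds; fast cloud models get 75% of the window.
function latencyAwareTokenBudget(
  contextWindow: number,
  measuredTps: number,
  targetSeconds = 30,
  fastTpsThreshold = 200, // assumption: cloud-class throughput
): number {
  const windowBudget = Math.floor(contextWindow * 0.75);
  if (measuredTps >= fastTpsThreshold) return windowBudget;
  return Math.min(windowBudget, Math.floor(measuredTps * targetSeconds));
}
```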

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three composable wins designed together:
1. Stable-first RAG ordering → llama-server prefix KV cache reuse
   → ~70× prompt-eval speedup (14k tokens reprocessed → ~200)
2. Multimodal content parts → delete STT/TTS sandwich for Qwen3.5
   → 1 model invocation per voice turn instead of 3
3. Voice LoRA per persona → identity, not signal — the "Maya replied"
   experience that differentiates from Claude Code / OpenClaw / Aider

Acceptance: 6-persona LiveKit room on M5, voice turn round-trip <3s,
total resident memory <8 GB, audio output recognizably persona-specific.

Companion to issue #917 (ModelMetadata refactor) — Phases 4-5 below
depend on capability-declaration flowing through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
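
The stable-first ordering in win 1 can be sketched as follows (section names are illustrative):

```typescript
// Put parts that don't change between turns first so llama-server's
// prefix KV cache is reused; only the volatile tail is re-evaluated.
interface PromptParts {
  systemPrompt: string;   // stable across turns
  stableRag: string;      // stable retrieved context
  recentHistory: string;  // changes every turn
  userQuery: string;      // changes every turn
}

function assembleStableFirst(p: PromptParts): string {
  return [p.systemPrompt, p.stableRag, p.recentHistory, p.userQuery]
    .join('\n\n');
}
```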
Joel's clarification: STT/TTS doesn't disappear. It becomes the
universal substrate that gives ANY model class — niche 1B
specialists, older Llama 3.1 text-only, cloud providers without
audio — the same first-class persona experience. Local
multimodal-native is the fast path; the bridge is what lets us
mix model classes freely so users never know which class is
actually serving their teammate.

Updated decision matrix to cover all four model classes (local
multimodal, cloud multimodal, cloud text-only, local text-only)
and how voice identity stays a first-class property regardless.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>