feat: voice pipeline + bus-mediated chat + bargein provider registry #4
Merged
chazmaniandinkle merged 12 commits into cogos-dev:main on Apr 20, 2026
Migrated from cog-workspace/apps/tts-mcp, consolidating development into the canonical mod3 repo.

Voice Pipeline (5 phases):
- Three-tier adaptive STT (Whisper Base 31ms + Large 470ms)
- Speculative generation (agent thinks while human speaks)
- Opacity-as-state rendering (transparent → solidifying → solid)
- Barge-in context stitching (state snapshot on interrupt)
- Self-barge draft revision (agent revises its own queued output)

New files:
- draft_queue.py: Thread-safe DraftQueue for speculative generation
- mcp_shim.py: Lightweight MCP-to-HTTP proxy (no model loading)

Modified:
- agent_loop.py: Context stitching, speculative inference, self-barge
- channels.py: Three-tier STT scheduler
- modules/voice.py: decode_streaming(), Whisper Base loader, TTS validation
- dashboard/index.html: Opacity CSS, solidification, partials, queue preview
- dashboard/playback.js: Progress tracking for word-level solidification
- server.py: Session-aware queue foundations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…matting)

- Remove unused imports: typing.Any, VoiceEncoder, WebSocketDisconnect, struct, mimetypes
- Add missing asyncio import for to_thread() in speculative TTS
- Prefix unused variables with _ (full, check_messages, loop)
- Auto-fixed by ruff --fix + manual corrections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ype error)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires Mod³'s dashboard chat to route user messages through the cogos kernel's running metabolic-cycle agent instead of the local MLX Gemma provider. When MOD3_USE_COGOS_AGENT=1, user turns flow as bus events (bus_dashboard_chat → kernel inlet → harness observation → respond tool → bus_dashboard_response) and render back in the dashboard as response_text frames. Voice and text now share a single conversation through the same metabolic cycle.

Also lands bidirectional barge-in context stitching on the WebSocket path: BargeinContext schema + agent_loop injection into next-turn system prompt. Fixes the gap where dashboard interruptions halted TTS but didn't surface structured context to the agent (previously only the MCP/SuperWhisper file-signal path injected it).

6 new bargein tests; 2 new bus-bridge tests; 5 new cogos-agent bridge tests. 47 pytest collect total.

Dashboard: live Cycle Trace drawer consuming bus_cycle_trace via SSE subscriber. Bottom-drawer UI, 100-entry rolling window, collapsible with localStorage. ort.min.js + WASM for VAD runtime.

Whisper default pinned to whisper-base-mlx to reduce concurrent MLX Metal pressure (Gemma + Kokoro + Whisper segfault). Large-v3-turbo restoration is a separate MLX-stability fix; voice-input path still crashes on mic due to underlying MLX concurrency issue (known, tracked separately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st provider

Refactor barge-in detection into a pluggable in-process primitive. Sources emit BargeinEvents through a callback; a registry routes them into the same consumer helper the legacy /tmp/mod3-barge-in.json file watcher uses. The SuperWhisper SQLite+filesystem detection logic (previously drifting in the cog workspace at .cog/bin/bargein-producer.py) is absorbed as the first provider, opt-in via MOD3_BARGEIN_PROVIDERS=superwhisper.

- bargein/providers/base.py: BargeinProvider + BargeinEvent (thread-based)
- bargein/providers/superwhisper.py: in-process SW watcher emitting via callback
- bargein/__init__.py: BargeinRegistry + handle_bargein_start() shared helper
- schemas/bargein.py: add "superwhisper" to BargeinSource literal
- server.py: factor _bargein_watcher onto handle_bargein_start; start registry from env after legacy watcher (empty default preserves current behavior)
- integrations/bargein-producer.py: reconcile drift from workspace (SQLite DB as structural ground truth, generous 150s stale timeout) + deprecation note
- mcp_shim.py: unify signal path to /tmp/mod3-barge-in.json (was orphan ~/.mod3_bargein_signal.json that nobody wrote to)
- tests: 14 new tests covering lifecycle, dispatch, subscribers, env loader
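For reviewers who want a mental model of the provider/registry split described above, here is a minimal sketch. It is not the code in bargein/providers/base.py or bargein/__init__.py: the BargeinEvent fields, the emit() helper, register(), start_all()/stop_all(), and parse_provider_names() are all assumptions made for illustration, as is the comma-separated format for the MOD3_BARGEIN_PROVIDERS value.

```python
import threading
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class BargeinEvent:
    """Hypothetical event shape; the real fields may differ."""
    event_type: str   # e.g. "user_speaking_start" or "user_speaking_end"
    source: str       # e.g. "superwhisper"
    timestamp: float = field(default_factory=time.time)


EmitFn = Callable[[BargeinEvent], None]


class BargeinProvider:
    """Thread-based source that reports events through an emit callback."""

    name = "base"

    def __init__(self) -> None:
        self._emit: Optional[EmitFn] = None
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self, emit: EmitFn) -> None:
        self._emit = emit
        self._stop.clear()
        self._thread = threading.Thread(target=self.run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=2.0)

    def emit(self, event_type: str) -> None:
        if self._emit is not None:
            self._emit(BargeinEvent(event_type, self.name))

    def run(self) -> None:
        # Subclasses poll their source here and call self.emit(...).
        raise NotImplementedError


class BargeinRegistry:
    """Owns providers and fans their events out to subscribers."""

    def __init__(self) -> None:
        self._providers: Dict[str, BargeinProvider] = {}
        self._subscribers: List[EmitFn] = []
        self._lock = threading.Lock()

    def register(self, provider: BargeinProvider) -> None:
        self._providers[provider.name] = provider

    def subscribe(self, fn: EmitFn) -> None:
        with self._lock:
            self._subscribers.append(fn)

    def dispatch(self, event: BargeinEvent) -> None:
        # Copy under the lock, call outside it, so a slow subscriber
        # cannot block subscribe() from another thread.
        with self._lock:
            subscribers = list(self._subscribers)
        for fn in subscribers:
            fn(event)

    def start_all(self) -> None:
        for provider in self._providers.values():
            provider.start(self.dispatch)

    def stop_all(self) -> None:
        for provider in self._providers.values():
            provider.stop()


def parse_provider_names(value: Optional[str]) -> List[str]:
    """Split an env-style value; an empty value means no providers start."""
    return [name.strip() for name in (value or "").split(",") if name.strip()]
```

In this shape a new source only has to subclass BargeinProvider and implement run(); the empty-value case returning an empty list mirrors the "empty default preserves current behavior" note.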
feat(bargein): first-class provider registry with SuperWhisper as first provider
Four fixes covering the blocker and three concerns from the gpt-5.4 review.
Fix 1 (BLOCKER) — await_voice_input() with new provider model
bargein/__init__.py:
* BargeinRegistry.wait_for_event(event_type, source=None, timeout=None) —
synchronous wait primitive that subscribes-and-waits, returning the
matching BargeinEvent or None on timeout. Auto-unsubscribes on return.
* BargeinRegistry.unsubscribe() — public counterpart to subscribe().
* make_file_mirror_subscriber(path) — factory for a subscriber that
mirrors registry events into the legacy /tmp/mod3-barge-in.json signal
file so out-of-process pollers (mcp_shim.py) keep receiving events
from in-process providers.
server.py:
* await_voice_input() now waits on BOTH the registry's user_speaking_end
events AND the legacy file (whichever fires first wins). With
MOD3_BARGEIN_PROVIDERS=superwhisper set, the registry path delivers;
without it, the file path keeps working.
* Registry installs the file-mirror subscriber at startup so file
pollers in other processes still see in-process events.
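The subscribe-and-wait behaviour of wait_for_event can be sketched roughly as follows. This is an illustration under assumptions, not the code in bargein/__init__.py: the two-field BargeinEvent and the internal _subscribers list are hypothetical, and only the signature (event_type, source=None, timeout=None) and the "return the event or None, then unsubscribe" contract come from the description above.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class BargeinEvent:
    """Hypothetical two-field event, enough to show the filtering."""
    event_type: str
    source: str


class BargeinRegistry:
    def __init__(self) -> None:
        self._subscribers: List[Callable[[BargeinEvent], None]] = []
        self._lock = threading.Lock()

    def subscribe(self, fn: Callable[[BargeinEvent], None]) -> None:
        with self._lock:
            self._subscribers.append(fn)

    def unsubscribe(self, fn: Callable[[BargeinEvent], None]) -> None:
        with self._lock:
            try:
                self._subscribers.remove(fn)
            except ValueError:
                pass  # already removed; unsubscribe is idempotent here

    def dispatch(self, event: BargeinEvent) -> None:
        with self._lock:
            subscribers = list(self._subscribers)
        for fn in subscribers:
            fn(event)

    def wait_for_event(
        self,
        event_type: str,
        source: Optional[str] = None,
        timeout: Optional[float] = None,
    ) -> Optional[BargeinEvent]:
        matched: List[BargeinEvent] = []
        fired = threading.Event()

        def _on_event(event: BargeinEvent) -> None:
            if event.event_type != event_type:
                return
            if source is not None and event.source != source:
                return
            if not fired.is_set():
                matched.append(event)
                fired.set()

        self.subscribe(_on_event)
        try:
            fired.wait(timeout)  # blocks the caller, not the dispatcher
        finally:
            self.unsubscribe(_on_event)  # auto-unsubscribe on every exit path
        return matched[0] if matched else None
```

Putting the unsubscribe in a finally block is what keeps a timed-out waiter from leaking a subscriber, which is the cleanup case the tests list.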
Fix 2 (CONCERN) — owner-aware speaking lock
server.py:
* Lock payload gains acquired_at (ISO timestamp) alongside pid + job_id.
* _acquire_speaking_lock returns bool. Acquires when file is missing,
holder pid is dead, or (pid, job_id) matches (idempotent re-acquire).
Returns False when a different LIVE process owns the speaker.
* _release_speaking_lock(job_id=None) only removes the file when
(pid, job_id) match. Two overlapping mod3 processes can no longer
delete each other's locks.
* _i_own_speaking_lock(job_id) is the new "stop-on-mismatch" predicate
used by the speech generation loop in place of "stop-on-disappear".
* _force_clear_speaking_lock() preserves the cross-process kill path
used by the bargein watcher (the only place that legitimately removes
another process's lock).
* _is_any_process_speaking() uses pid-alive (os.kill(pid, 0)) instead
of a 60s timestamp window to detect stale locks.
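A rough sketch of the owner-aware lock rules, for anyone reviewing Fix 2 without the diff open. The function names, the explicit path argument, and the decision to refuse a lock held by the same pid under a different job_id are assumptions; the payload keys (pid, job_id, acquired_at) and the os.kill(pid, 0) liveness probe come from the description above. The probe is POSIX-only, and the read-then-write in acquire is not atomic in this sketch.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without signalling."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def read_lock(path: Path) -> Optional[dict]:
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


def i_own_speaking_lock(path: Path, job_id: str) -> bool:
    holder = read_lock(path)
    return (
        holder is not None
        and holder.get("pid") == os.getpid()
        and holder.get("job_id") == job_id
    )


def acquire_speaking_lock(path: Path, job_id: str) -> bool:
    holder = read_lock(path)
    if holder is not None and not i_own_speaking_lock(path, job_id):
        pid = holder.get("pid")
        if isinstance(pid, int) and pid_alive(pid):
            return False  # a live holder owns the speaker
    payload = {
        "pid": os.getpid(),
        "job_id": job_id,
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }
    path.write_text(json.dumps(payload))
    return True


def release_speaking_lock(path: Path, job_id: str) -> bool:
    """Remove the file only when (pid, job_id) match the caller."""
    if not i_own_speaking_lock(path, job_id):
        return False
    path.unlink(missing_ok=True)
    return True
```

A speech loop would then poll i_own_speaking_lock(path, job_id) and stop when it returns False, which covers both "file removed" and "file now belongs to someone else".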
Fix 3 (CONCERN) — KernelBusSubscriber honors COGOS_ENDPOINT
bus_bridge.py:
* default_stream_url() resolves COGOS_ENDPOINT at call time and appends
/v1/events/stream. KernelBusSubscriber(url=None) now defaults to it
so sends and receives target the same kernel host.
* KERNEL_BUS_STREAM_URL kept as a back-compat module attribute.
bus_bridge_runner.py / cogos_agent_bridge.py:
* start_bridge() and start_response_bridge() take url=None and resolve
from COGOS_ENDPOINT, so the lifespan wiring tracks env at startup.
* cogos_agent_bridge.post_user_message() resolves the bus-send URL at
call time too — no more import-time freeze.
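The import-time versus call-time distinction in Fix 3 can be shown in a few lines. This is a sketch only: the fallback DEFAULT_COGOS_ENDPOINT value is an assumption (the real default is not stated here), and the KernelBusSubscriber below carries nothing but the url handling.

```python
import os
from typing import Optional

# Assumed fallback for illustration; the real default may differ.
DEFAULT_COGOS_ENDPOINT = "http://localhost:6931"
STREAM_PATH = "/v1/events/stream"


def default_stream_url() -> str:
    """Read COGOS_ENDPOINT on every call, so env changes are picked up."""
    base = os.environ.get("COGOS_ENDPOINT") or DEFAULT_COGOS_ENDPOINT
    return base.rstrip("/") + STREAM_PATH


class KernelBusSubscriber:
    def __init__(self, url: Optional[str] = None) -> None:
        # An explicit url wins; otherwise resolve from the environment now.
        self.url = url if url is not None else default_stream_url()
```

Had the URL been computed once as a module-level constant, a COGOS_ENDPOINT set after import would be ignored, which is the freeze the fix removes.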
Fix 4 (CONCERN) — session-scoped browser routing for kernel replies
cogos_agent_bridge.py:
* _extract_session_id(payload) digs the optional session_id out of
the kernel reply payload (top-level OR JSON-wrapped under content).
* run_response_bridge() now passes session_id through to
BrowserChannel.broadcast_response_text(), which already supports
mod3:<channel_id> routing. Replies WITHOUT a session_id (older kernel,
non-session-scoped events) fall back to broadcast — preserves
backward compatibility per the user's gating note.
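To make the Fix 4 routing rules concrete, here is a small sketch of session_id extraction and target selection. The names extract_session_id and route_targets, and modelling connected channels as a list of ids, are assumptions for illustration; treating an empty channel id after the prefix as malformed is also a guess. The rules themselves (top-level or content-wrapped id, mod3:<channel_id> prefix, broadcast fallback, no delivery to a ghost session) follow the description and the test list in this commit.

```python
import json
from typing import Any, List, Mapping, Optional

SESSION_PREFIX = "mod3:"


def extract_session_id(payload: Mapping[str, Any]) -> Optional[str]:
    """Look for session_id at the top level, then inside JSON-wrapped content."""
    sid = payload.get("session_id")
    if isinstance(sid, str) and sid:
        return sid
    content = payload.get("content")
    if isinstance(content, str):
        try:
            content = json.loads(content)
        except json.JSONDecodeError:
            return None  # plain-text content carries no session_id
    if isinstance(content, Mapping):
        inner = content.get("session_id")
        if isinstance(inner, str) and inner:
            return inner
    return None


def route_targets(session_id: Optional[str], connected: List[str]) -> List[str]:
    """Pick which connected channel ids should receive a reply."""
    if session_id is None or not session_id.startswith(SESSION_PREFIX):
        return list(connected)  # no id or malformed prefix: broadcast
    channel_id = session_id[len(SESSION_PREFIX):]
    if not channel_id:
        return list(connected)  # assumed: empty id counts as malformed
    # A well-formed id that matches nothing (ghost session) delivers to no one.
    return [c for c in connected if c == channel_id]
```

For example, route_targets("mod3:abc", ["abc", "def"]) selects only "abc", while a reply from an older kernel with no session_id still reaches both channels.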
Tests
* tests/test_bargein_provider_registry.py — wait_for_event happy path,
event-type filter, source filter, timeout, subscriber cleanup,
file-mirror subscriber writes the legacy signal payload.
* tests/test_speaking_lock.py (new) — full owner-awareness coverage:
acquire idempotence, blocked-by-live-other-process, dead-pid reclaim,
release pid+job_id gating, cross-process false-interrupt regression,
stale-lock auto-removal.
* tests/test_bus_bridge.py (new) — default_stream_url + subscriber
construction honors COGOS_ENDPOINT (and explicit url= still wins).
* tests/test_cogos_agent_bridge.py — session_id extraction (top-level,
content-wrapped, missing), run_response_bridge forwards session_id
when present and falls back to broadcast when absent, post_user_message
resolves COGOS_ENDPOINT at call time.
* tests/test_browser_channel_routing.py (new) — broadcast routing:
no session_id → fan out, mod3:<channel_id> → only the matching
channel, malformed prefix → broadcast fallback, ghost session → no
delivery.
95 passed (53 in the touched/new files).
@codex Addressing all four findings; see chazmaniandinkle#2 (sub-PR into this branch). Fix commit: 614e60ab.
Tests: 95 passed (53 in touched/new files).
Once chazmaniandinkle#2 merges, this PR will update in place.
Nit 1: Add integration tests for await_voice_input() regression.
Three tests cover the registry-dispatch path (the original regression),
the legacy file-write path (backward-compat), and the timeout path
(negative control). Previously only wait_for_event was unit-tested in
isolation — the actual await_voice_input function had no direct coverage.
Nit 2: Unify await_voice_input's dual wait paths.
Extended _bargein_watcher to bridge file user_speaking_end events into
the registry as synthetic BargeinEvents. await_voice_input now makes a
single wait_for_event("user_speaking_end", timeout=...) call instead of
hand-rolling its own subscribe-poll-loop. Feedback between the file
mirror and watcher is broken via the via=bargein_registry marker.
The registry is now constructed before the watcher thread starts so the
bridge can reference it safely on the first iteration.
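The loop-breaking marker from Nit 2 might look something like the sketch below. Only the via=bargein_registry marker and the user_speaking_end event name come from the text above; the "event" and "source" payload keys, both function names, and the write-then-rename step are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any, Mapping

VIA_MARKER = "bargein_registry"


def mirror_event_to_file(path: Path, event_type: str, source: str) -> None:
    """File-mirror side: tag every mirrored payload with the via marker."""
    payload = {"event": event_type, "source": source, "via": VIA_MARKER}
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(payload))
    tmp.replace(path)  # rename so pollers never read a half-written file


def should_bridge_to_registry(payload: Mapping[str, Any]) -> bool:
    """Watcher side: bridge only events that did not originate in the registry."""
    if payload.get("event") != "user_speaking_end":
        return False
    return payload.get("via") != VIA_MARKER
```

Without the marker check, an event mirrored to the file would be read back by the watcher, re-dispatched into the registry, and mirrored again.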
Tests: 98 passing (was 95; +3 new).
fix: address Codex review cogos-dev#4 (1 blocker, 3 concerns)
Summary
Brings the voice pipeline migration to upstream — bidirectional voice (capture → STT → agent_loop → TTS → playback), MCP shim, dashboard chat, kernel-mediated chat substrate, and a pluggable barge-in provider registry.
What's in the stack (oldest → newest)
- 576ae94
- df8b379
- da06193
- f7f0df9: player.stop() → player.flush() for barge-in interrupt (pyright)
- 503057e
- 3b9eab0
- d362c10

Highlights
- Bidirectional voice: full duplex audio pipeline. capture.py does mic input + WebRTC echo cancellation; agent_loop.py runs Anthropic-provider tool-use turns; adaptive_player.py does TTS playback with barge-in interrupt support.
- Bus-mediated chat: dashboard chat goes through cogos kernel buses instead of an in-process loop, so external observers (other agents, tracers) see the same conversation events.
- Bargein provider registry: was a hardcoded SuperWhisper file watcher; now a pluggable BargeinProvider interface. SuperWhisper is the first provider, opt-in via MOD3_BARGEIN_PROVIDERS=superwhisper. Future providers (silero mic-VAD, push-to-talk) drop in without touching server.py. Default behavior unchanged (empty env = no auto-start).
- Signal path unification: mcp_shim.py was reading from an orphan path (~/.mod3_bargein_signal.json) that nothing wrote to. Now matches server.py's /tmp/mod3-barge-in.json.

Test plan
- pytest: 61 passed (was 47, +14 new bargein registry tests)
- /dashboard with kernel running on :6931
- bus_dashboard_chat when chatting

Compatibility
- MOD3_BARGEIN_PROVIDERS= (empty): no providers auto-start, zero behavior change
- /tmp/mod3-barge-in.json file watcher untouched, standalone producer still works
- (/tmp/mod3-speaking.json) handling unchanged