
feat: voice pipeline + bus-mediated chat + bargein provider registry #4

Merged
chazmaniandinkle merged 12 commits into cogos-dev:main from chazmaniandinkle:feat/voice-pipeline-migration
Apr 20, 2026

@chazmaniandinkle

Summary

Brings the voice pipeline migration to upstream — bidirectional voice (capture → STT → agent_loop → TTS → playback), MCP shim, dashboard chat, kernel-mediated chat substrate, and a pluggable barge-in provider registry.

What's in the stack (oldest → newest)

Commit Theme
576ae94 feat: bidirectional voice pipeline + MCP shim + dashboard enhancements
df8b379 fix: 9 ruff lint errors
da06193 style: ruff formatting
f7f0df9 fix: player.stop() → player.flush() for barge-in interrupt (pyright)
503057e feat: bus-mediated dashboard chat — cogos kernel as inference backend
3b9eab0 feat(bargein): first-class provider registry + SuperWhisper as first provider
d362c10 Merge PR #1 (bargein providers, fork-internal)

Highlights

Bidirectional voice — full duplex audio pipeline. capture.py does mic input + WebRTC echo cancellation; agent_loop.py runs Anthropic-provider tool-use turns; adaptive_player.py does TTS playback with barge-in interrupt support.

Bus-mediated chat — dashboard chat goes through cogos kernel buses instead of in-process loop, so external observers (other agents, tracers) see the same conversation events.

Bargein provider registry — was a hardcoded SuperWhisper file watcher; now a pluggable BargeinProvider interface. SuperWhisper is the first provider, opt-in via MOD3_BARGEIN_PROVIDERS=superwhisper. Future providers (silero mic-VAD, push-to-talk) drop in without touching server.py. Default behavior unchanged (empty env = no auto-start).

Signal path unification: mcp_shim.py was reading from an orphan path (~/.mod3_bargein_signal.json) that nothing wrote to. It now matches server.py's /tmp/mod3-barge-in.json.

Test plan

  • pytest — 61 passed (was 47, +14 new bargein registry tests)
  • Manual: bargein registry round-trip via SuperWhisper recording
  • Reviewer: try the dashboard chat at /dashboard with kernel running on :6931
  • Reviewer: confirm bus events appear on bus_dashboard_chat when chatting

Compatibility

  • Default MOD3_BARGEIN_PROVIDERS= (empty) — no providers auto-start, zero behavior change
  • Legacy /tmp/mod3-barge-in.json file watcher untouched, standalone producer still works
  • Cross-process speaking lock (/tmp/mod3-speaking.json) handling unchanged

chazmaniandinkle and others added 7 commits April 15, 2026 15:29
Migrated from cog-workspace/apps/tts-mcp — consolidating development
into the canonical mod3 repo.

Voice Pipeline (5 phases):
- Three-tier adaptive STT (Whisper Base 31ms + Large 470ms)
- Speculative generation (agent thinks while human speaks)
- Opacity-as-state rendering (transparent → solidifying → solid)
- Barge-in context stitching (state snapshot on interrupt)
- Self-barge draft revision (agent revises its own queued output)

New files:
- draft_queue.py: Thread-safe DraftQueue for speculative generation
- mcp_shim.py: Lightweight MCP-to-HTTP proxy (no model loading)

Modified:
- agent_loop.py: Context stitching, speculative inference, self-barge
- channels.py: Three-tier STT scheduler
- modules/voice.py: decode_streaming(), Whisper Base loader, TTS validation
- dashboard/index.html: Opacity CSS, solidification, partials, queue preview
- dashboard/playback.js: Progress tracking for word-level solidification
- server.py: Session-aware queue foundations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…matting)

- Remove unused imports: typing.Any, VoiceEncoder, WebSocketDisconnect, struct, mimetypes
- Add missing asyncio import for to_thread() in speculative TTS
- Prefix unused variables with _ (full, check_messages, loop)
- Auto-fixed by ruff --fix + manual corrections

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ype error)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires Mod³'s dashboard chat to route user messages through the cogos kernel's
running metabolic-cycle agent instead of the local MLX Gemma provider. When
MOD3_USE_COGOS_AGENT=1, user turns flow as bus events (bus_dashboard_chat → kernel
inlet → harness observation → respond tool → bus_dashboard_response) and render
back in the dashboard as response_text frames. Voice and text now share a single
conversation through the same metabolic cycle.

Also lands bidirectional barge-in context stitching on the WebSocket path:
BargeinContext schema + agent_loop injection into next-turn system prompt. Fixes
the gap where dashboard interruptions halted TTS but didn't surface structured
context to the agent (previously only the MCP/SuperWhisper file-signal path
injected it). 6 new bargein tests; 2 new bus-bridge tests; 5 new cogos-agent
bridge tests. 47 pytest collect total.

Dashboard: live Cycle Trace drawer consuming bus_cycle_trace via SSE subscriber.
Bottom-drawer UI, 100-entry rolling window, collapsible with localStorage.
ort.min.js + WASM for VAD runtime.

Whisper default pinned to whisper-base-mlx to reduce concurrent MLX Metal
pressure (Gemma + Kokoro + Whisper segfault). Large-v3-turbo restoration is a
separate MLX-stability fix; voice-input path still crashes on mic due to
underlying MLX concurrency issue (known, tracked separately).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…st provider

Refactor barge-in detection into a pluggable in-process primitive. Sources
emit BargeinEvents through a callback; a registry routes them into the same
consumer helper the legacy /tmp/mod3-barge-in.json file watcher uses. The
SuperWhisper SQLite+filesystem detection logic (previously drifting in the
cog workspace at .cog/bin/bargein-producer.py) is absorbed as the first
provider, opt-in via MOD3_BARGEIN_PROVIDERS=superwhisper.

- bargein/providers/base.py: BargeinProvider + BargeinEvent (thread-based)
- bargein/providers/superwhisper.py: in-process SW watcher emitting via callback
- bargein/__init__.py: BargeinRegistry + handle_bargein_start() shared helper
- schemas/bargein.py: add "superwhisper" to BargeinSource literal
- server.py: factor _bargein_watcher onto handle_bargein_start; start registry
  from env after legacy watcher (empty default preserves current behavior)
- integrations/bargein-producer.py: reconcile drift from workspace (SQLite DB
  as structural ground truth, generous 150s stale timeout) + deprecation note
- mcp_shim.py: unify signal path to /tmp/mod3-barge-in.json (was orphan
  ~/.mod3_bargein_signal.json that nobody wrote to)
- tests: 14 new tests covering lifecycle, dispatch, subscribers, env loader
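
The provider/registry split described above can be pictured with a minimal sketch. It is not the code in bargein/__init__.py: the field names on BargeinEvent, the start_from_env() method, and the factories mapping are assumptions made for illustration. Only the class names, the callback-based emit, and the MOD3_BARGEIN_PROVIDERS variable come from the commit message.

```python
import os
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BargeinEvent:
    event_type: str  # e.g. "user_speaking_end" (field names are assumed)
    source: str      # e.g. "superwhisper"


Callback = Callable[[BargeinEvent], None]


class BargeinProvider:
    """Hypothetical base class: a source that emits events through a callback."""

    name = "base"

    def start(self, emit: Callback) -> None:
        raise NotImplementedError

    def stop(self) -> None:
        raise NotImplementedError


class BargeinRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: List[Callback] = []
        self._providers: List[BargeinProvider] = []

    def subscribe(self, callback: Callback) -> None:
        with self._lock:
            self._subscribers.append(callback)

    def dispatch(self, event: BargeinEvent) -> None:
        # Copy under the lock so subscribers can be called without holding it.
        with self._lock:
            subscribers = list(self._subscribers)
        for callback in subscribers:
            callback(event)

    def start_from_env(
        self,
        factories: Dict[str, Callable[[], BargeinProvider]],
        env_var: str = "MOD3_BARGEIN_PROVIDERS",
    ) -> List[str]:
        """Start every provider named in the comma-separated env var.

        An empty or unset variable starts nothing. Unknown names are
        skipped here; the real loader may treat them differently.
        """
        raw = os.environ.get(env_var, "")
        for name in (part.strip() for part in raw.split(",")):
            factory = factories.get(name)
            if factory is None:
                continue
            provider = factory()
            provider.start(self.dispatch)
            self._providers.append(provider)
        return [provider.name for provider in self._providers]
```

With this shape, adding a provider means registering one more factory; the consumer side only ever sees events arriving through dispatch().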
feat(bargein): first-class provider registry with SuperWhisper as first provider
Four fixes covering the blocker and three concerns from the gpt-5.4 review.

Fix 1 (BLOCKER) — await_voice_input() with new provider model
  bargein/__init__.py:
    * BargeinRegistry.wait_for_event(event_type, source=None, timeout=None) —
      synchronous wait primitive that subscribes-and-waits, returning the
      matching BargeinEvent or None on timeout. Auto-unsubscribes on return.
    * BargeinRegistry.unsubscribe() — public counterpart to subscribe().
    * make_file_mirror_subscriber(path) — factory for a subscriber that
      mirrors registry events into the legacy /tmp/mod3-barge-in.json signal
      file so out-of-process pollers (mcp_shim.py) keep receiving events
      from in-process providers.
  server.py:
    * await_voice_input() now waits on BOTH the registry's user_speaking_end
      events AND the legacy file (whichever fires first wins). With
      MOD3_BARGEIN_PROVIDERS=superwhisper set, the registry path delivers;
      without it, the file path keeps working.
    * Registry installs the file-mirror subscriber at startup so file
      pollers in other processes still see in-process events.
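
A subscribe-and-wait primitive of the kind Fix 1 describes could look roughly like the sketch below. The signature wait_for_event(event_type, source=None, timeout=None) and the auto-unsubscribe behavior follow the description above; the internals (a threading.Event plus a temporary subscriber) and the BargeinEvent field names are assumptions, not the actual implementation.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class BargeinEvent:
    event_type: str  # field names are assumed for this sketch
    source: str


class BargeinRegistry:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subscribers: List[Callable[[BargeinEvent], None]] = []

    def subscribe(self, callback: Callable[[BargeinEvent], None]) -> None:
        with self._lock:
            self._subscribers.append(callback)

    def unsubscribe(self, callback: Callable[[BargeinEvent], None]) -> None:
        with self._lock:
            if callback in self._subscribers:
                self._subscribers.remove(callback)

    def dispatch(self, event: BargeinEvent) -> None:
        with self._lock:
            subscribers = list(self._subscribers)
        for callback in subscribers:
            callback(event)

    def wait_for_event(
        self,
        event_type: str,
        source: Optional[str] = None,
        timeout: Optional[float] = None,
    ) -> Optional[BargeinEvent]:
        """Block until a matching event is dispatched, or return None on timeout."""
        matched = threading.Event()
        captured: List[BargeinEvent] = []

        def on_event(event: BargeinEvent) -> None:
            if event.event_type != event_type:
                return
            if source is not None and event.source != source:
                return
            if not matched.is_set():
                captured.append(event)
                matched.set()

        self.subscribe(on_event)
        try:
            matched.wait(timeout)
        finally:
            # Always clean up, whether we matched or timed out.
            self.unsubscribe(on_event)
        return captured[0] if captured else None
```

The try/finally is what makes the wait safe to call repeatedly: a timed-out caller leaves no stale subscriber behind.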

Fix 2 (CONCERN) — owner-aware speaking lock
  server.py:
    * Lock payload gains acquired_at (ISO timestamp) alongside pid + job_id.
    * _acquire_speaking_lock returns bool. Acquires when file is missing,
      holder pid is dead, or (pid, job_id) matches (idempotent re-acquire).
      Returns False when a different LIVE process owns the speaker.
    * _release_speaking_lock(job_id=None) only removes the file when
      (pid, job_id) match. Two overlapping mod3 processes can no longer
      delete each other's locks.
    * _i_own_speaking_lock(job_id) is the new "stop-on-mismatch" predicate
      used by the speech generation loop in place of "stop-on-disappear".
    * _force_clear_speaking_lock() preserves the cross-process kill path
      used by the bargein watcher (the only place that legitimately removes
      another process's lock).
    * _is_any_process_speaking() uses pid-alive (os.kill(pid, 0)) instead
      of a 60s timestamp window to detect stale locks.
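
The ownership rules in Fix 2 can be sketched as follows. This is an illustration under assumptions, not the server.py code: the function names acquire/release/pid_alive are hypothetical, the lock path is passed in rather than fixed, and the read-then-write sequence is not atomic. The treatment of a lock held by the same pid under a different job_id (refused here) is also an assumption. The os.kill(pid, 0) probe is POSIX-only.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks existence without delivering anything."""
    if pid <= 0:
        return False  # never probe 0 or negative pids (they address process groups)
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def read_lock(path: Path) -> Optional[dict]:
    try:
        data = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    return data if isinstance(data, dict) else None


def acquire(path: Path, job_id: str) -> bool:
    """Take the lock unless a different live owner holds it."""
    holder = read_lock(path)
    me = os.getpid()
    if holder is not None:
        same_owner = holder.get("pid") == me and holder.get("job_id") == job_id
        holder_pid = holder.get("pid")
        if not same_owner and isinstance(holder_pid, int) and pid_alive(holder_pid):
            return False
    payload = {
        "pid": me,
        "job_id": job_id,
        "acquired_at": datetime.now(timezone.utc).isoformat(),
    }
    path.write_text(json.dumps(payload))
    return True


def release(path: Path, job_id: str) -> bool:
    """Remove the lock only when (pid, job_id) match the current holder."""
    holder = read_lock(path)
    if holder is None:
        return False
    if holder.get("pid") != os.getpid() or holder.get("job_id") != job_id:
        return False
    path.unlink(missing_ok=True)
    return True
```

Compared with a timestamp window, the pid probe reclaims a crashed holder's lock immediately and never expires a lock whose owner is still running.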

Fix 3 (CONCERN) — KernelBusSubscriber honors COGOS_ENDPOINT
  bus_bridge.py:
    * default_stream_url() resolves COGOS_ENDPOINT at call time and appends
      /v1/events/stream. KernelBusSubscriber(url=None) now defaults to it
      so sends and receives target the same kernel host.
    * KERNEL_BUS_STREAM_URL kept as a back-compat module attribute.
  bus_bridge_runner.py / cogos_agent_bridge.py:
    * start_bridge() and start_response_bridge() take url=None and resolve
      from COGOS_ENDPOINT, so the lifespan wiring tracks env at startup.
    * cogos_agent_bridge.post_user_message() resolves the bus-send URL at
      call time too — no more import-time freeze.
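
The difference between import-time and call-time resolution in Fix 3 is easy to show in a few lines. The sketch below mirrors the names default_stream_url() and KernelBusSubscriber(url=None) from the description; the fallback endpoint http://localhost:6931 is an assumption (the port is borrowed from the test plan), and the subscriber class is reduced to its URL handling.

```python
import os
from typing import Optional

# Assumed fallback for illustration; the real default may differ.
FALLBACK_ENDPOINT = "http://localhost:6931"
STREAM_PATH = "/v1/events/stream"


def default_stream_url() -> str:
    """Read COGOS_ENDPOINT on every call, so later env changes are honored."""
    base = os.environ.get("COGOS_ENDPOINT", FALLBACK_ENDPOINT).rstrip("/")
    return base + STREAM_PATH


class KernelBusSubscriber:
    def __init__(self, url: Optional[str] = None) -> None:
        # An explicit url always wins; otherwise resolve from the environment now.
        self.url = url if url is not None else default_stream_url()
```

A module-level constant computed from os.environ would instead be frozen at import, which is the failure mode the fix removes.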

Fix 4 (CONCERN) — session-scoped browser routing for kernel replies
  cogos_agent_bridge.py:
    * _extract_session_id(payload) digs the optional session_id out of
      the kernel reply payload (top-level OR JSON-wrapped under content).
    * run_response_bridge() now passes session_id through to
      BrowserChannel.broadcast_response_text(), which already supports
      mod3:<channel_id> routing. Replies WITHOUT a session_id (older kernel,
      non-session-scoped events) fall back to broadcast — preserves
      backward compatibility per the user's gating note.
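
The two lookup locations named in Fix 4 (top-level, or JSON-wrapped under content) suggest an extraction helper along these lines. It is a sketch under assumptions: the function name is de-underscored to mark it as illustrative, and the exact payload shape beyond the session_id and content keys is not specified in the source.

```python
import json
from typing import Any, Optional


def extract_session_id(payload: Any) -> Optional[str]:
    """Return session_id from the top level or from JSON under "content".

    Returns None when the field is absent, so the caller can fall back
    to broadcasting the reply.
    """
    if not isinstance(payload, dict):
        return None

    session_id = payload.get("session_id")
    if isinstance(session_id, str) and session_id:
        return session_id

    content = payload.get("content")
    if isinstance(content, str):
        # The content may be a JSON document serialized as a string.
        try:
            content = json.loads(content)
        except json.JSONDecodeError:
            return None
    if isinstance(content, dict):
        session_id = content.get("session_id")
        if isinstance(session_id, str) and session_id:
            return session_id
    return None
```

Returning None rather than raising keeps the broadcast fallback a plain conditional in the caller.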

Tests
  * tests/test_bargein_provider_registry.py — wait_for_event happy path,
    event-type filter, source filter, timeout, subscriber cleanup,
    file-mirror subscriber writes the legacy signal payload.
  * tests/test_speaking_lock.py (new) — full owner-awareness coverage:
    acquire idempotence, blocked-by-live-other-process, dead-pid reclaim,
    release pid+job_id gating, cross-process false-interrupt regression,
    stale-lock auto-removal.
  * tests/test_bus_bridge.py (new) — default_stream_url + subscriber
    construction honors COGOS_ENDPOINT (and explicit url= still wins).
  * tests/test_cogos_agent_bridge.py — session_id extraction (top-level,
    content-wrapped, missing), run_response_bridge forwards session_id
    when present and falls back to broadcast when absent, post_user_message
    resolves COGOS_ENDPOINT at call time.
  * tests/test_browser_channel_routing.py (new) — broadcast routing:
    no session_id → fan out, mod3:<channel_id> → only the matching
    channel, malformed prefix → broadcast fallback, ghost session → no
    delivery.

95 passed (53 in the touched/new files).
@chazmaniandinkle

@codex Addressing all four findings — see chazmaniandinkle#2 (sub-PR into this branch). Fix commit: 614e60ab.

| # | Severity | Status | Approach |
|---|----------|--------|----------|
| 1 | Blocker | ✅ Fixed | New BargeinRegistry.wait_for_event() primitive + file-mirror subscriber. await_voice_input() waits on the registry AND the legacy file (whichever fires first wins). |
| 2 | Concern | ✅ Fixed | Lock is now (pid, job_id)-scoped. Acquire blocked by live-other-process; release no-ops on mismatch; PID-alive check replaces the 60s timestamp window. _force_clear_speaking_lock() retains the cross-process kill path used by the watcher. |
| 3 | Concern | ✅ Fixed | KernelBusSubscriber resolves COGOS_ENDPOINT at construction (via new default_stream_url()); startup wiring + post_user_message follow suit. |
| 4 | Concern | ✅ Fixed | _extract_session_id() pulls the optional field; run_response_bridge() passes it to broadcast_response_text (already session-aware). Falls back to broadcast when missing, so it works on either side of the kernel-side change landing. |

Tests: 95 passed (53 in touched/new files). New: test_speaking_lock.py, test_bus_bridge.py, test_browser_channel_routing.py. Extended: test_bargein_provider_registry.py, test_cogos_agent_bridge.py. Includes a regression test for the original two-process false-interrupt bug.

Once chazmaniandinkle#2 merges, this PR will update in place.

Nit 1: Add integration tests for await_voice_input() regression.
Three tests cover the registry-dispatch path (the original regression),
the legacy file-write path (backward-compat), and the timeout path
(negative control). Previously only wait_for_event was unit-tested in
isolation — the actual await_voice_input function had no direct coverage.

Nit 2: Unify await_voice_input's dual wait paths.
Extended _bargein_watcher to bridge file user_speaking_end events into
the registry as synthetic BargeinEvents. await_voice_input now makes a
single wait_for_event("user_speaking_end", timeout=...) call instead of
hand-rolling its own subscribe-poll-loop. Feedback between the file
mirror and watcher is broken via the via=bargein_registry marker.

The registry is now constructed before the watcher thread starts so the
bridge can reference it safely on the first iteration.
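
The loop-breaking marker from Nit 2 can be illustrated with a small sketch. Only the via=bargein_registry marker comes from the description above; the helper names and the other payload keys (event, source) are assumptions. The idea is that the file mirror stamps what it writes, and the watcher refuses to bridge stamped signals back into the registry.

```python
import json
from pathlib import Path
from typing import Optional

REGISTRY_MARKER = "bargein_registry"


def write_mirror(path: Path, event_type: str, source: str) -> None:
    """File-mirror side: stamp the payload so the watcher can recognize it."""
    payload = {"event": event_type, "source": source, "via": REGISTRY_MARKER}
    path.write_text(json.dumps(payload))


def read_bridgeable(path: Path) -> Optional[dict]:
    """Watcher side: return a signal only if it did not originate in the registry."""
    try:
        signal = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None
    if not isinstance(signal, dict):
        return None
    if signal.get("via") == REGISTRY_MARKER:
        return None  # mirrored from the registry; bridging it back would loop
    return signal
```

Signals written by a standalone producer carry no marker, so they still reach the registry as synthetic events.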

Tests: 98 passing (was 95; +3 new).
fix: address Codex review cogos-dev#4 (1 blocker, 3 concerns)
@chazmaniandinkle chazmaniandinkle merged commit 4e6fb58 into cogos-dev:main Apr 20, 2026
6 of 7 checks passed