Dashboard: participant panel + /ws/audio per-session playback (Wave 4)#6

Merged
chazmaniandinkle merged 7 commits into cogos-dev:main from chazmaniandinkle:feat/dashboard-wave4
Apr 24, 2026

Summary

Makes the mod3 dashboard the primary audio interface. Two stacked commits.

Depends on: cogos-dev/mod3#5 (ADR-082 Phase 1). Diff will be large until #5 merges; please review the top two commits (69dd70d, a5321ee) individually.

Commits

69dd70d — feat(dashboard): participant panel + auto-register on page load

  • Fetches /v1/sessions every 3-5s; renders participant badges with session_id, voice, last_active
  • Auto-registers on load via the kernel's /v1/channel-sessions/register (kernel-owned authority) and falls back to mod3's direct /v1/sessions/register when the kernel is unreachable or the request is blocked by CORS (the expected path today, since the kernel has no CORS support yet)
  • Stores session_id in sessionStorage so a page refresh reuses the same identity; sends a best-effort deregister on beforeunload
  • Self-row highlighted as "you" in green
  • Console logs and window.__mod3Session for observability; emits mod3-session-registered CustomEvent

a5321ee — feat(channels): /ws/audio/{session_id} WebSocket for per-session playback

  • New audio_subscribers.py module: thread-safe subscriber registry + emit_wav(session_id, payload, header) fanout helper
  • New GET /v1/sessions/{id}/subscribers endpoint returning {subscribed: bool, count: N} — the kernel's pre-afplay check uses this
  • POST /v1/synthesize emits WAV over WebSocket to subscribers of the caller's session_id and adds X-Mod3-WS-Subscribers response header; callers (kernel, MCP shim) honor it to avoid double-play
  • Wire contract: a JSON audio_header frame followed by a single binary WAV frame per job. No chunking for Kokoro-length outputs.
  • MCP shim gets a pre-play subscriber check; skips local sd.play when a WS subscriber exists
  • Browser JS opens ws://host:7860/ws/audio/<sid>, decodes WAV via AudioContext.decodeAudioData, plays on first user gesture (autoplay policy)
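
The wire contract is easiest to see from the receiving side. The sketch below is a hypothetical Python consumer (the dashboard's real consumer is JavaScript) that pairs each JSON audio_header text frame with the single binary frame that follows it. The `type` field name and the RIFF/WAVE sanity check are assumptions for illustration, not confirmed details of mod3's frames.

```python
import json
from typing import Iterable, Iterator, Union

Frame = Union[str, bytes]  # text frames arrive as str, binary frames as bytes


def pair_audio_frames(frames: Iterable[Frame]) -> Iterator[tuple[dict, bytes]]:
    """Yield (header, wav_bytes): one JSON header, then exactly one binary WAV frame."""
    pending_header = None
    for frame in frames:
        if isinstance(frame, str):
            msg = json.loads(frame)
            # Assumed field name: only audio_header frames announce a WAV payload.
            if msg.get("type") == "audio_header":
                pending_header = msg
        elif pending_header is not None:
            # A WAV file starts with b"RIFF" and carries b"WAVE" at byte offset 8.
            if frame[:4] == b"RIFF" and frame[8:12] == b"WAVE":
                yield pending_header, frame
            pending_header = None  # one binary frame per job, no chunking
        # A binary frame with no preceding header is ignored.
```

Because there is no chunking, the consumer never has to reassemble partial payloads; a header simply waits for the next binary frame.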

Architectural effect

  • Dashboard is the primary audio sink for its own session when open
  • Server-side afplay is now a fallback, not the default
  • Per-session routing: mod3_speak(session_id=...) lands on the right browser
  • Forward-compatible with Discord/REPL: same subscriber pattern can add a Discord voice subscriber at /v1/sessions/{id}/subscribers/discord later

Known caveats (noted in commit bodies)

  • Kernel CORS missing — browser hits mod3-direct register path today (still works; flagged for Wave 5)
  • AudioContext needs first user gesture to resume (one-shot click/keydown listeners installed)
  • _session_has_ws_subscriber uses a 1.5s timeout, so if mod3's HTTP server wedges, each speak call waits up to 1.5s before falling back

Test plan

  • 41 mod3 tests green (new: TestAudioSubscriberRegistry 6 cases, TestSubscribersEndpoint 2 cases, TestSynthesizeEmitsOverWS 2 cases)
  • ruff clean
  • Manual smoke test (documented in commit bodies): open dashboard, verify participant badge, trigger speak with session_id, confirm audio plays in browser not speakers; close tab, confirm afplay fallback

Introduces SessionRegistry + GlobalSerializer + live output-device
resolution so multiple concurrent agents/users can share one Mod3
instance without colliding on voice, queue, or speaker.
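
One way to picture "without colliding on queue": a single global playback slot drained fairly across per-session queues, as in the round-robin policy the commit lists. The sketch below is a hypothetical illustration of that policy only; `RoundRobinSerializer` and its methods are not the names or code in session_registry.py.

```python
from collections import deque
from typing import Optional


class RoundRobinSerializer:
    """Hypothetical sketch: one global playback slot shared fairly across sessions."""

    def __init__(self) -> None:
        self.queues: dict[str, deque] = {}
        self.order: deque = deque()  # session_ids in round-robin order

    def enqueue(self, session_id: str, job: str) -> None:
        if session_id not in self.queues:
            self.queues[session_id] = deque()
            self.order.append(session_id)
        self.queues[session_id].append(job)

    def next_job(self) -> Optional[tuple[str, str]]:
        """Pop one job from the next session that has work, then rotate."""
        for _ in range(len(self.order)):
            session_id = self.order[0]
            self.order.rotate(-1)  # move this session to the back of the line
            queue = self.queues[session_id]
            if queue:
                return session_id, queue.popleft()
        return None  # every queue is empty
```

A chatty session with many queued jobs cannot starve a quieter one: each call to `next_job` advances to the next session before returning.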

- session_registry.py: SessionChannel, voice-pool greedy allocation,
  per-session queues, round-robin/priority/fifo-global policies, live
  device re-query per playback (ADR-082 2026-04-22 amendment - no
  caching, macOS CoreAudio default tracked live).
- http_api.py: POST /v1/sessions/register, POST /v1/sessions/{id}/deregister,
  GET /v1/sessions, GET /v1/sessions/{id}. Synthesize honors the
  session's assigned voice when unspecified.
- server.py + mcp_shim.py: mirrored MCP tools (register_session,
  deregister_session, list_sessions) so stdio MCP callers get the
  same surface.
- Backward-compat: legacy callers without a session_id route to an
  implicit "default" session.
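
The voice-pool allocation and the implicit "default" session can be sketched as follows. This is a hypothetical model, not the registry's code: `VoicePool`, the voice names, and raising on an exhausted pool are all assumptions (the commit does not say what happens when the pool runs out).

```python
from typing import Optional


class VoicePool:
    """Hypothetical sketch of greedy voice allocation; names are illustrative."""

    def __init__(self, voices: list[str]) -> None:
        self.free = list(voices)  # free voices, in preference order
        self.assigned: dict[str, str] = {}

    def register(self, session_id: str, preferred: Optional[str] = None) -> str:
        if session_id in self.assigned:
            return self.assigned[session_id]  # re-registering keeps the same voice
        if preferred in self.free:
            voice = preferred
        elif self.free:
            voice = self.free[0]  # greedy: the first free voice wins
        else:
            raise RuntimeError("voice pool exhausted")  # assumed behavior
        self.free.remove(voice)
        self.assigned[session_id] = voice
        return voice

    def deregister(self, session_id: str) -> None:
        voice = self.assigned.pop(session_id, None)
        if voice is not None:
            self.free.append(voice)  # the voice returns to the pool

    def voice_for(self, session_id: Optional[str]) -> str:
        # Legacy callers without a session_id route to an implicit "default" session.
        return self.register(session_id or "default")
```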

Out of scope (later ADR phases): input routing, barge-in state
machine, native input provider.

Regression: with MOD3_USE_COGOS_AGENT=1, agent_loop's success path
returns before reaching the local-inference path's send_response_complete
call, so the dashboard's isResponding spinner never clears.

- channels.py: BrowserChannel.broadcast_response_complete(metrics,
  session_id) - thread-safe companion to broadcast_response_text,
  routes to the same channel that received the text frames.
- cogos_agent_bridge.py: on agent_response receipt, emit the
  complete frame after the text frame.
- demo/e2e_dashboard_harness.py + tests: updated to assert the
  completion frame fires on both code paths.
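
The ordering the fix guarantees can be modeled with a stand-in channel. `FakeBrowserChannel`, `on_agent_response`, and the frame `type` strings are hypothetical; only the `broadcast_response_complete(metrics, session_id)` argument order comes from the commit.

```python
import threading
from typing import Any


class FakeBrowserChannel:
    """Hypothetical stand-in for BrowserChannel that records outgoing frames."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # broadcasts may arrive from non-loop threads
        self.frames: list[tuple[str, dict[str, Any]]] = []

    def broadcast_response_text(self, text: str, session_id: str) -> None:
        with self._lock:
            self.frames.append((session_id, {"type": "response_text", "text": text}))

    def broadcast_response_complete(self, metrics: dict, session_id: str) -> None:
        with self._lock:
            self.frames.append((session_id, {"type": "response_complete", "metrics": metrics}))


def on_agent_response(channel: FakeBrowserChannel, text: str, session_id: str, metrics: dict) -> None:
    # The agent path now emits the completion frame itself, right after the
    # text frame and to the same session, instead of relying on the
    # local-inference path that it returns before reaching.
    channel.broadcast_response_text(text, session_id)
    channel.broadcast_response_complete(metrics, session_id)
```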

Wave 4.1 + 4.2 of the mod3-kernel integration. The dashboard now owns its
own bus identity instead of being an anonymous WebSocket client.

On page load:
  1. Reuse a session_id from sessionStorage (refreshes stay on the same
     identity).
  2. Otherwise POST to the kernel's /v1/channel-sessions/register, since
     ADR-082 Wave 3.5 makes session-id minting kernel-owned. On a CORS error
     or a down kernel, the JS falls back to mod3's /v1/sessions/register
     directly so the dashboard keeps working in a mod3-only deployment.
  3. Poll GET /v1/sessions every 4s, render the live roster.
  4. On beforeunload, send a best-effort deregister via navigator.sendBeacon
     so the voice returns to the pool without waiting for a sweep.
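
The registration order on page load (stored id, then kernel, then mod3 direct) reduces to a small decision function. This Python model is a sketch, not the dashboard's JavaScript: the storage key `"mod3-session-id"`, the injected `register_via_*` callables, and treating every failure as an `OSError` are assumptions.

```python
from typing import Callable, MutableMapping

STORAGE_KEY = "mod3-session-id"  # hypothetical sessionStorage key

Register = Callable[[], str]  # returns a newly minted session_id or raises


def resolve_session_id(
    storage: MutableMapping[str, str],
    register_via_kernel: Register,
    register_via_mod3: Register,
) -> tuple[str, str]:
    """Return (session_id, source), following the page-load order described above."""
    # Step 1: a refresh reuses the identity already stored for this tab.
    if STORAGE_KEY in storage:
        return storage[STORAGE_KEY], "storage"
    # Step 2: kernel-owned minting first; a CORS block or a down kernel is
    # modeled here as an OSError and triggers the mod3-direct fallback.
    try:
        session_id, source = register_via_kernel(), "kernel"
    except OSError:
        session_id, source = register_via_mod3(), "mod3"
    storage[STORAGE_KEY] = session_id
    return session_id, source
```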

The participant panel is a collapsible drawer keyed off a header pill
(count + plural). Rows show participant_id, assigned_voice, session_id
prefix, age, and participant_type badge. The "self" row is pulled to the
top and highlighted with a green left-border + "you" pill.
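
The roster ordering and the header pill's pluralization are small enough to state exactly. In this hypothetical sketch the row shape (a `session_id` key) and the pill wording are assumptions; the dashboard's actual rendering code is JavaScript.

```python
from typing import Any


def order_roster(rows: list[dict[str, Any]], self_session_id: str) -> list[dict[str, Any]]:
    """Pull the caller's own row to the top; other rows keep their original order."""
    # False sorts before True, and sorted() is stable.
    return sorted(rows, key=lambda row: row["session_id"] != self_session_id)


def pill_label(count: int) -> str:
    """Header pill text: the count plus a correctly pluralized noun."""
    return f"{count} participant" + ("" if count == 1 else "s")
```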

window.__mod3Session is exposed for Wave 4.3 — the audio WebSocket
subscription will key off its session_id, and a "mod3-session-registered"
CustomEvent fires when registration completes so late-loaded scripts can
subscribe without polling.

Branching: stacked on feat/session-registry-adr-082-phase1 because the
/v1/sessions endpoints this UI depends on only exist on that branch
(Phase 1 of the session registry).
…back

Wave 4.3 mod3 side — route synthesized audio to the dashboard via a
per-session WebSocket instead of (or in addition to) the server's
sounddevice / afplay fallback.

New module: audio_subscribers.py
  AudioSubscriberRegistry holds session_id → [subscriber] with
  register/unregister/has_subscribers/count/emit_wav. emit_wav pushes
  a JSON header frame + binary WAV frame through each subscriber's
  WebSocket via run_coroutine_threadsafe on the socket's event loop,
  matching the BrowserChannel.broadcast_trace_event pattern.
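
The cross-thread fanout can be sketched as follows, assuming synthesis runs on a worker thread while each WebSocket lives on an asyncio event loop. `AudioSubscriberRegistrySketch`, `Subscriber`, and the `type` key in the header frame are hypothetical names; only the register/unregister/count/emit_wav shape and the use of `asyncio.run_coroutine_threadsafe` come from the commit.

```python
import asyncio
import json
import threading
from typing import Any, Awaitable, Callable

SendText = Callable[[str], Awaitable[None]]
SendBytes = Callable[[bytes], Awaitable[None]]


class Subscriber:
    def __init__(self, loop: asyncio.AbstractEventLoop, send_text: SendText, send_bytes: SendBytes) -> None:
        self.loop = loop  # the event loop that owns this WebSocket
        self.send_text = send_text
        self.send_bytes = send_bytes


class AudioSubscriberRegistrySketch:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._subs: dict[str, list[Subscriber]] = {}

    def register(self, session_id: str, sub: Subscriber) -> None:
        with self._lock:
            self._subs.setdefault(session_id, []).append(sub)

    def unregister(self, session_id: str, sub: Subscriber) -> None:
        with self._lock:
            bucket = self._subs.get(session_id, [])
            if sub in bucket:
                bucket.remove(sub)
            if not bucket:
                self._subs.pop(session_id, None)  # prune empty buckets

    def count(self, session_id: str) -> int:
        with self._lock:
            return len(self._subs.get(session_id, []))

    def emit_wav(self, session_id: str, payload: bytes, header: dict[str, Any]) -> int:
        """Schedule header + WAV on each subscriber's loop; return how many were scheduled."""
        with self._lock:
            targets = list(self._subs.get(session_id, []))  # snapshot under the lock

        async def send(sub: Subscriber) -> None:
            await sub.send_text(json.dumps({"type": "audio_header", **header}))
            await sub.send_bytes(payload)

        sent = 0
        for sub in targets:
            coro = send(sub)
            try:
                # Called from a worker thread: run the send on the socket's own loop.
                asyncio.run_coroutine_threadsafe(coro, sub.loop)
                sent += 1
            except RuntimeError:
                coro.close()  # that loop is already closed: drop this frame
        return sent
```

The send itself is not awaited by the caller, which keeps emit best-effort and non-blocking for the HTTP thread.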

New endpoints (http_api.py):
  WS  /ws/audio/{session_id}              — accept + register + hold open
  GET /v1/sessions/{session_id}/subscribers — returns
      {"session_id": ..., "subscribed": bool, "count": N}. Unknown
      session_ids intentionally return subscribed=false instead of 404
      so the kernel's pre-afplay check stays a single predicate.

/v1/synthesize now also emits the generated WAV over the WebSocket
when the request names a session and at least one subscriber is
attached. Emit is best-effort (disconnect mid-send just drops the
frame) and non-blocking on the HTTP path. A new X-Mod3-WS-Subscribers
response header reports how many subscribers received the blob;
callers use this to skip their local playback.
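
The emit condition and the caller's use of X-Mod3-WS-Subscribers fit in two predicates. This is a hedged sketch: the function names are hypothetical, and treating a missing or malformed header as zero is an assumption. Real HTTP header lookups should go through a case-insensitive mapping (as most HTTP client libraries provide); a plain dict is used here only for the test.

```python
from typing import Mapping, Optional

WS_HEADER = "X-Mod3-WS-Subscribers"


def should_emit_ws(session_id: Optional[str], subscriber_count: int) -> bool:
    """Server side: emit only when the request names a session that has subscribers."""
    return bool(session_id) and subscriber_count > 0


def should_play_locally(response_headers: Mapping[str, str]) -> bool:
    """Caller side: skip local playback when at least one subscriber received the WAV."""
    try:
        delivered = int(response_headers.get(WS_HEADER, "0"))
    except ValueError:
        delivered = 0  # malformed header: fall back to local playback
    return delivered == 0
```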

mcp_shim._play_wav_bytes gains a pre-check (_session_has_ws_subscriber)
that GETs /v1/sessions/{id}/subscribers with a 1.5s timeout. When
subscribed=true we skip sounddevice entirely and record
status=routed_ws in the job ledger. Keeps the legacy path unchanged
when no session is attached or the HTTP check fails.
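
The pre-check's fail-open behavior can be sketched with the standard library alone. This is not the shim's code: the function name mirrors `_session_has_ws_subscriber` only loosely, and the injectable `opener` parameter exists purely so the sketch can be tested without a running mod3. Only the endpoint path, the `subscribed` field, and the 1.5s timeout come from the commit.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional


def session_has_ws_subscriber(
    base_url: str,
    session_id: Optional[str],
    timeout: float = 1.5,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Return True only when mod3 positively reports a WebSocket subscriber."""
    if not session_id:
        return False  # no session attached: keep the legacy playback path
    sid = urllib.parse.quote(session_id, safe="")
    url = f"{base_url}/v1/sessions/{sid}/subscribers"
    try:
        with opener(url, timeout=timeout) as resp:
            body = json.load(resp)
    except (OSError, ValueError):
        # Timeout, refused connection, HTTP error, or bad JSON: play locally.
        return False
    return isinstance(body, dict) and body.get("subscribed") is True
```

Usage might look like `session_has_ws_subscriber("http://127.0.0.1:7860", sid)`. Any failure resolves to False, so the worst case is the 1.5s delay noted in the caveats, never a silent drop.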

Dashboard wiring (dashboard/index.html):
  A new IIFE opens ws://host:7860/ws/audio/<session_id> after the
  session-registered event fires, listens for audio_header + binary
  frames, and plays the WAV through AudioContext.decodeAudioData.
  Reconnect on close with exponential backoff up to 30s. The self-row
  audio-dot indicator flips green while the WS is up. AudioContext is
  resumed on first user gesture to satisfy the autoplay policy.
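
The reconnect schedule is plain capped exponential backoff. In this sketch the 1s base delay and resetting after a successful connect are assumptions; only the 30s cap comes from the commit.

```python
class ReconnectBackoff:
    """Capped exponential backoff: base, 2*base, 4*base, ... up to cap."""

    def __init__(self, base: float = 1.0, cap: float = 30.0) -> None:
        self.base = base
        self.cap = cap
        self.attempt = 0

    def next_delay(self) -> float:
        # Clamp the exponent so a long outage cannot overflow the float math.
        delay = min(self.cap, self.base * 2 ** min(self.attempt, 32))
        self.attempt += 1
        return delay

    def reset(self) -> None:
        self.attempt = 0  # call after a successful (re)connect
```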

Tests (tests/test_audio_subscribers.py):
  - AudioSubscriberRegistry unit tests: register/unregister, multiple
    subscribers per session, empty-bucket pruning, emit_wav delivers
    header+bytes, no-subscriber returns zero, default registry is a
    shared singleton.
  - HTTP tests via FastAPI TestClient: /subscribers returns false for
    unknown sessions, /ws/audio upgrade flips subscribed=true and
    disconnect flips it back to false.
  - Integration test (guarded by SKIP_TTS_TESTS env var because it
    loads Kokoro): /v1/synthesize with a session + subscriber pushes a
    RIFF/WAVE binary frame through the WebSocket and reports
    X-Mod3-WS-Subscribers: 1.

All 32 existing session-registry tests + 9 new tests pass (1 skipped
for Kokoro cold-start). Ruff clean.

Branch: feat/dashboard-wave4, stacked on
feat/session-registry-adr-082-phase1 (Phase 1 /v1/sessions surface).

The playWavBlob ArrayBuffer-extraction ternary had one branch that used
Uint8Array.slice(), which returns another Uint8Array, not an ArrayBuffer.
decodeAudioData then throws "parameter 1 is not of type 'ArrayBuffer'".

The fix: use buffer.buffer.slice(0) in the "view covers whole buffer"
branch so both branches emit an ArrayBuffer copy.
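
Python has no ArrayBuffer, so the following is only a model of the branch logic: `backing` stands in for the view's underlying ArrayBuffer, and `decode_audio_data` is a hypothetical stand-in that, like the browser API, rejects anything that is not a raw buffer. The point is that both branches must return the same type, a standalone copy.

```python
def extract_array_buffer(backing: bytearray, byte_offset: int, byte_length: int) -> bytes:
    """Model of the fixed ternary: both branches return an independent bytes copy."""
    covers_whole = byte_offset == 0 and byte_length == len(backing)
    if covers_whole:
        # JS fix: buffer.buffer.slice(0) copies the whole backing buffer.
        return bytes(backing)
    # Partial view: copy only the window the view covers.
    return memoryview(backing)[byte_offset:byte_offset + byte_length].tobytes()


def decode_audio_data(data: object) -> int:
    """Stand-in for decodeAudioData: rejects anything that is not raw bytes."""
    if not isinstance(data, bytes):
        raise TypeError("parameter 1 is not of type 'ArrayBuffer'")
    return len(data)
```

Passing a view (here a memoryview, in the browser a Uint8Array) instead of the copied buffer reproduces the original TypeError.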

Found via live smoke test: dashboard connected cleanly, WebSocket
received WAV frames, but every playback failed silently in the console
while the kernel and mod3 both reported routed_ws success. The server
side was correct; this was purely a browser-side type error.
…evice

The output-device <select> at the top of the dashboard was only calling
_playback.setOutputDevice() — the legacy chat-path router. The Wave 4
per-session WebSocket path owns a separate AudioContext in the
setupAudioSubscription IIFE and was ignoring the selection, so every
session-routed playback landed on the system default even when the user
had picked a different device in the UI.

Fix:
- setupAudioSubscription exposes window.__mod3AudioSink(deviceId).
  It remembers the selection in a closure and calls AudioContext.setSinkId
  once the context exists (Chrome 110+; older browsers silently skip).
- ensureAudioCtx now applies the pending sink or the current dropdown
  value at creation time, so the first playback after a fresh page load
  already honors the selection.
- The <select> change handler calls both _playback.setOutputDevice and
  window.__mod3AudioSink, keeping the legacy path and the WS path in sync.
- window.__mod3AudioCtx is set for diagnostic access (evaluate_script,
  debugger probes).
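
The remember-then-apply pattern behind window.__mod3AudioSink and ensureAudioCtx can be modeled without a browser. `SinkBinder` is a hypothetical Python stand-in: assigning `ctx_sink` represents calling `AudioContext.setSinkId`, and the device ids in the test are made up.

```python
from typing import Callable, Optional


class SinkBinder:
    """Hypothetical model of __mod3AudioSink + ensureAudioCtx: remember, then apply."""

    def __init__(self, read_dropdown: Callable[[], str]) -> None:
        self.read_dropdown = read_dropdown  # current <select> value
        self.pending: Optional[str] = None  # selection made before a context exists
        self.ctx_sink: Optional[str] = None  # sink applied to the live context
        self.has_ctx = False

    def set_sink(self, device_id: str) -> None:
        self.pending = device_id
        if self.has_ctx:
            self.ctx_sink = device_id  # JS: audioCtx.setSinkId(deviceId)

    def ensure_ctx(self) -> None:
        if not self.has_ctx:
            self.has_ctx = True
            # At creation, prefer the remembered selection, else the dropdown value.
            self.ctx_sink = self.pending or self.read_dropdown()
```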

Found by Chaz during the Wave 4 smoke test: he set the dashboard output
to MacBook Pro Speakers, but mod3_speak audio still played through the
system-default Dell USB Audio. BrowserOS console confirmed the context
had no sinkId set and the UI handler only logged.

Two linked fixes on top of the setSinkId wiring:

1. Timing race — populateOutputDevices is async; if the first mod3_speak
   arrives before enumeration completes, ensureAudioCtx reads a half-built
   dropdown and applies "default" even when the user had previously picked
   a different device. populateOutputDevices now pushes the resolved sink
   into window.__mod3AudioSink at the end of its own selection logic, so
   the context always gets bound to the final selection regardless of
   which race happens first.

2. Persistence — the selected device now round-trips through localStorage
   under key "mod3-output-device". populateOutputDevices consults the
   saved value before falling back to the "Default -" heuristic, so a
   reload keeps the user's previous choice instead of silently reverting.
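
The resolution order populateOutputDevices now follows can be sketched as a pure function. This is a hedged Python model: the device dict shape and the exact "Default -" label match are assumptions (Chromium labels its system-default output "Default - <name>"); only the localStorage key and the saved-value-first order come from the commit.

```python
STORAGE_KEY = "mod3-output-device"


def resolve_output_device(devices: list[dict[str, str]], storage: dict[str, str]) -> str:
    """Pick the deviceId to bind: saved choice, then the "Default -" label, then "default"."""
    ids = {d["deviceId"] for d in devices}
    saved = storage.get(STORAGE_KEY)
    if saved in ids:
        return saved  # the user's previous choice survives a reload
    for d in devices:
        # Assumed heuristic: match the entry labeled "Default - <name>".
        if d.get("label", "").startswith("Default -"):
            return d["deviceId"]
    return "default"


def remember_choice(storage: dict[str, str], device_id: str) -> None:
    storage[STORAGE_KEY] = device_id  # <select> change handler persists the choice
```

Pushing the result into the sink only after enumeration has finished is what removes the race: whichever side runs first, the context ends up bound to this final value.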

Found by Chaz during the Wave 4 smoke test: his MacBook Pro selection
survived reload (browser form-state restoration), but the sink stayed
on "default" because the race fired first. Both paths now converge on
the right sink.

chazmaniandinkle added a commit that referenced this pull request Apr 24, 2026
…examples-pr6

docs(mod3): scrub participant_id examples (PR #6 ancillary)
@chazmaniandinkle chazmaniandinkle merged commit 08d679f into cogos-dev:main Apr 24, 2026
7 checks passed