feat(audio-lab): live ASR backend + transcript UI#50

Merged
wavekat-eason merged 9 commits into main from feat/asr on May 15, 2026

@wavekat-eason
Contributor

Summary

Implements the ASR plan end-to-end. Supersedes the original stacked PRs (#46, #47, #48 — all closed).

Backend

  • New tools/audio-lab/backend/src/asr.rs module: AsrConfig, AsrServerEvent, run_asr_pipeline. Each config gets a dedicated OS worker thread that owns a SherpaOnnxAsr; a tokio task forwards the audio broadcast into that thread, and the thread uses blocking_send to push transcript events back onto a tokio mpsc channel.
  • New WS messages: ListAsrBackends / SetAsrConfigs (client) and AsrBackends / Asr (server). Asr carries a kind discriminator: ready / speech_started / speech_ended / partial / final / warning, with optional ts_ms / end_ms / text / confidence / message fields populated per kind.
  • Wired into both StartRecording (live mic) and LoadFile (WAV upload) paths.
  • Adds wavekat-asr = "0.0.4" with the sherpa-onnx feature.
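For reviewers reading the WS surface above, here is a rough TypeScript sketch of how a client might model and validate the Asr server message. The kind values and the optional ts_ms / end_ms / text / confidence / message fields are taken from this description; the `type` tag, the `config_id` field, and both helper functions are assumptions made for illustration and may not match websocket.ts.

```typescript
// Sketch only. Kind values and optional field names come from the PR text;
// the "type" tag and "config_id" are hypothetical, not the confirmed wire format.
type AsrEventKind =
  | "ready"
  | "speech_started"
  | "speech_ended"
  | "partial"
  | "final"
  | "warning";

interface AsrMessage {
  type: "asr"; // assumed tag name
  config_id?: string; // hypothetical: which AsrConfig produced the event
  kind: AsrEventKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

const ASR_KINDS: readonly AsrEventKind[] = [
  "ready",
  "speech_started",
  "speech_ended",
  "partial",
  "final",
  "warning",
];

// Runtime guard for a parsed JSON value: checks the tag and the discriminator.
function isAsrMessage(value: unknown): value is AsrMessage {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "asr" &&
    typeof v.kind === "string" &&
    ASR_KINDS.indexOf(v.kind as AsrEventKind) !== -1
  );
}

// One-line summary per kind. Which optional fields accompany which kind is
// not spelled out in the PR, so every field is treated as possibly missing.
function describeAsrEvent(msg: AsrMessage): string {
  const at = msg.ts_ms === undefined ? "" : ` @ ${msg.ts_ms} ms`;
  switch (msg.kind) {
    case "ready":
      return "ready";
    case "speech_started":
      return `speech started${at}`;
    case "speech_ended":
      return `speech ended${at}`;
    case "partial":
      return `partial: ${msg.text ?? ""}`;
    case "final":
      return `final: ${msg.text ?? ""}`;
    case "warning":
      return `warning: ${msg.message ?? ""}`;
  }
}
```

Switching on `kind` lets the compiler flag any variant a handler forgets, which is the main reason to keep it a closed string union rather than a plain string.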

Frontend

  • New AsrConfigPanel mirrors TurnConfigPanel (backend + preset dropdowns, editable label, add / clone / remove).
  • New AsrTranscript card per active ASR config: committed finals with [mm:ss.s–mm:ss.s] prefix, dimmed trailing partial that gets overwritten until the final lands, footer with last confidence / count / avg segment duration. Shows loading model… until the backend's ready event arrives.
  • websocket.ts types + log-panel batching of asr.partial messages so the log doesn't drown in partials. Finals and warnings still log inline.
  • App.tsx: asrConfigs persisted to localStorage (lab-asr-configs), pushed to backend on change + before every start / load_file, transcripts reset on each new session.
  • 2-column layout: all config panels (VAD / Turn / Pipeline / ASR) moved into a left aside (w-80 on lg+); waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a flex-1 main column. Matches the layout sketch in docs/05-plan-asr.md. Single-column on narrower screens.
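The AsrTranscript behaviour described above (finals committed, a trailing partial overwritten until the final lands, footer stats) can be sketched as a small pure reducer. This is a hypothetical illustration, not the component's code: the state shape, the function names, and the choice to truncate timestamps to tenths of a second rather than round them are all assumptions.

```typescript
// Hypothetical state for one ASR config's transcript card.
interface TranscriptSegment {
  startMs: number;
  endMs: number;
  text: string;
  confidence?: number;
}

interface TranscriptState {
  ready: boolean; // false => the card shows "loading model…"
  finals: TranscriptSegment[];
  partial: string; // dimmed trailing text, replaced on every partial
}

type TranscriptEvent =
  | { kind: "ready" }
  | { kind: "partial"; text: string }
  | { kind: "final"; ts_ms: number; end_ms: number; text: string; confidence?: number };

const emptyTranscript: TranscriptState = { ready: false, finals: [], partial: "" };

function applyTranscriptEvent(state: TranscriptState, ev: TranscriptEvent): TranscriptState {
  switch (ev.kind) {
    case "ready":
      return { ...state, ready: true };
    case "partial":
      // Each partial overwrites the previous one; nothing is committed yet.
      return { ...state, partial: ev.text };
    case "final":
      // The final commits a segment and clears the trailing partial.
      return {
        ...state,
        partial: "",
        finals: [
          ...state.finals,
          { startMs: ev.ts_ms, endMs: ev.end_ms, text: ev.text, confidence: ev.confidence },
        ],
      };
  }
}

// Formats milliseconds as mm:ss.s (truncating to tenths is an assumption).
function formatTimestamp(ms: number): string {
  const tenths = Math.floor(ms / 100);
  const minutes = Math.floor(tenths / 600);
  const seconds = Math.floor((tenths % 600) / 10);
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `${pad(minutes)}:${pad(seconds)}.${tenths % 10}`;
}

function renderFinal(seg: TranscriptSegment): string {
  return `[${formatTimestamp(seg.startMs)}–${formatTimestamp(seg.endMs)}] ${seg.text}`;
}

// Footer values: last confidence, count of finals, average segment duration.
function footerStats(state: TranscriptState) {
  const count = state.finals.length;
  const totalMs = state.finals.reduce((sum, s) => sum + (s.endMs - s.startMs), 0);
  const last = count > 0 ? state.finals[count - 1] : undefined;
  return {
    count,
    avgDurationMs: count === 0 ? 0 : totalMs / count,
    lastConfidence: last === undefined ? undefined : last.confidence,
  };
}
```

Resetting transcripts on a new session then amounts to replacing the state with `emptyTranscript`.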

Docs

  • tools/audio-lab/README.md: new "ASR" subsection with the sherpa-onnx preset table and a NOTE about the first-run ~75 MB HF model download. "Live transcripts" added to What It Does.
  • Top-level README.md: ASR mentioned in the audio-lab one-liner + tool-layout blurb.

Out of scope (follow-up)

  • Loom / screenshot in the README video table — needs a recording session.
  • Transcript ticks on VadTimeline / PipelineTimeline at each final.
  • Two-channel ASR (Channel::Remote).
  • WER / latency benchmarking — wait for a second ASR backend.
  • Audio-lab release tag — release-please will cut it automatically on merge.

Test plan

  • cargo check --workspace (backend)
  • cargo clippy --workspace -- -D warnings (backend, last run when M1 landed)
  • cargo test --workspace (5 pre-existing tests still pass)
  • npm run lint (no new warnings beyond pre-existing 7 in FrequencySpectrum / Waveform)
  • npm run build (clean)
  • Manual smoke test: make dev, add an ASR config (sherpa-onnx · bilingual), record / load a WAV, confirm partials roll in and finals commit; toggle preset between bilingual / en / zh and verify model reload.

🤖 Generated with Claude Code

wavekat-eason and others added 9 commits May 15, 2026 11:37
Adds a sherpa-onnx ASR backend that fans out alongside the existing VAD
and turn-detection pipelines. Each AsrConfig runs in its own worker
thread (sherpa-onnx is sync + holds model state); a tokio task bridges
the audio broadcast in, and a blocking_send loop bridges transcript
events back to the websocket.

WS surface: ListAsrBackends / SetAsrConfigs client messages,
AsrBackends + Asr server messages. Asr events carry a `kind` field
(ready, speech_started, speech_ended, partial, final, warning) with
optional ts_ms/end_ms/text/confidence/message.

M1 scope: backend only — no frontend yet. cargo check + clippy clean,
existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Frontend half of the ASR integration:
- New AsrConfigPanel mirrors TurnConfigPanel — backend + preset + label.
- New AsrTranscript card renders finals (with [mm:ss.s–mm:ss.s] prefix)
  plus a dimmed trailing partial that overwrites until the final lands.
  Footer shows last confidence, count of finals, average segment
  duration. "loading model…" until the backend's `ready` event arrives.
  Copy-all button concatenates final text to the clipboard.
- App.tsx wires list_asr_backends on connect, persists asr configs to
  localStorage, pushes set_asr_configs on change + before start /
  load_file, resets transcripts on new session.
- websocket.ts: new AsrConfig / AsrEventKind types, asr_backends + asr
  server messages, list_asr_backends + set_asr_configs client messages.
  Log panel batches `partial` events (matching how `vad` is batched)
  and inlines finals / warnings.
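
One plausible shape for the partial batching mentioned in the last item, shown as a sketch: consecutive partial events fold into a single counted log row, while any other kind gets its own row. The LogEntry shape, the row labels, and the function name are hypothetical, and the log panel's actual batching (and how `vad` is batched) may work differently.

```typescript
// Hypothetical log row: a label plus how many events were folded into it.
interface LogEntry {
  label: string;
  count: number;
}

// Returns a new log array. Consecutive "partial" events collapse into one
// row whose count is incremented; finals, warnings and other kinds are
// appended as separate rows.
function appendAsrLog(log: LogEntry[], kind: string, text: string): LogEntry[] {
  const last = log.length > 0 ? log[log.length - 1] : undefined;
  if (kind === "partial") {
    if (last !== undefined && last.label === "asr.partial") {
      return [...log.slice(0, -1), { label: last.label, count: last.count + 1 }];
    }
    return [...log, { label: "asr.partial", count: 1 }];
  }
  return [...log, { label: `asr.${kind}: ${text}`, count: 1 }];
}
```

With this shape a burst of partials between two finals costs one row instead of one row per event.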

cargo isn't touched — backend already merged on feat/asr-backend.
npm run lint clean (no new warnings); npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- New "ASR" subsection under Supported Backends with the sherpa-onnx
  preset table and a NOTE about the first-run model download (~75 MB
  to $HF_HOME).
- Mention "live transcripts" in the audio-lab What It Does list.
- Top-level README: include ASR in the audio-lab description + tool
  layout blurb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the post-controls body in a flex container so all config panels
(VAD, Turn, Pipeline, ASR) stack in a left aside (w-80 on lg+) and the
waveform / spectrum / timelines / ASR transcript / preprocessed
sections fill a flex-1 main column. Matches the layout sketch in
docs/05-plan-asr.md. Collapses to a single column on screens narrower
than lg.

npm run lint and npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar config panels used a 3-column grid that clipped titles and
wrapped labels in the narrow aside. Stack cards vertically, let the
title input fill remaining width, and widen the sidebar to 384px.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar sections (VAD, Turn, Pipeline, ASR) now default collapsed,
and individual cards inside each section can be toggled too. Widen
SelectContent dropdowns so long option labels aren't clipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each ASR config card shows whether its preset is cached in HF_HOME
and offers a Preload button that downloads + loads the model so the
first record/upload doesn't stall on a silent multi-hundred-MB fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar config sections and right-side result panels now have
independent open/close state, so collapsing a config card no longer
hides its data. Adds a missing toggle for the ASR transcripts panel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wavekat-eason wavekat-eason merged commit e2cb784 into main May 15, 2026
9 checks passed
@wavekat-eason wavekat-eason deleted the feat/asr branch May 15, 2026 00:20