feat(audio-lab): live ASR backend + transcript UI by wavekat-eason · Pull Request #50 · wavekat/wavekat-lab

wavekat-eason · 2026-05-14T23:40:38Z

Summary

Implements the ASR plan end-to-end. Supersedes the original stacked PRs (#46, #47, #48 — all closed).

Backend

New tools/audio-lab/backend/src/asr.rs module: AsrConfig, AsrServerEvent, run_asr_pipeline. Each config gets a dedicated OS worker thread that owns a SherpaOnnxAsr; a tokio task bridges the audio broadcast in, and blocking_send bridges transcript events back onto a tokio mpsc.
New WS messages: ListAsrBackends / SetAsrConfigs (client) and AsrBackends / Asr (server). Asr carries a kind discriminator: ready / speech_started / speech_ended / partial / final / warning, with optional ts_ms / end_ms / text / confidence / message fields populated per kind.
Wired into both StartRecording (live mic) and LoadFile (WAV upload) paths.
Adds wavekat-asr = "0.0.4" with the sherpa-onnx feature.
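
For reviewers reading the WS surface above, the Asr server message can be modelled as one flat event type keyed on kind. This is a minimal sketch, not the code in websocket.ts: the kind values and field names come from this description, while the `parseAsrEvent` and `describeAsrEvent` helpers, and any pairing of optional fields with a given kind, are assumptions for illustration.

```typescript
// Kind values as listed in the PR description.
type AsrEventKind =
  | "ready"
  | "speech_started"
  | "speech_ended"
  | "partial"
  | "final"
  | "warning";

// Flat shape with optional fields; which fields are set per kind is an assumption.
interface AsrEvent {
  kind: AsrEventKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

const ASR_KINDS: readonly AsrEventKind[] = [
  "ready",
  "speech_started",
  "speech_ended",
  "partial",
  "final",
  "warning",
];

// Hypothetical helper: validate an untyped JSON payload before it reaches the UI.
function parseAsrEvent(raw: unknown): AsrEvent | null {
  if (typeof raw !== "object" || raw === null) return null;
  const obj = raw as Record<string, unknown>;
  const kind = ASR_KINDS.find((k) => k === obj.kind);
  if (kind === undefined) return null;

  const event: AsrEvent = { kind };
  // Copy only fields that have the expected primitive type.
  if (typeof obj.ts_ms === "number") event.ts_ms = obj.ts_ms;
  if (typeof obj.end_ms === "number") event.end_ms = obj.end_ms;
  if (typeof obj.text === "string") event.text = obj.text;
  if (typeof obj.confidence === "number") event.confidence = obj.confidence;
  if (typeof obj.message === "string") event.message = obj.message;
  return event;
}

// Hypothetical helper: one log-friendly line per event kind.
function describeAsrEvent(e: AsrEvent): string {
  switch (e.kind) {
    case "ready":
      return "model ready";
    case "speech_started":
      return `speech started at ${e.ts_ms ?? "?"} ms`;
    case "speech_ended":
      return `speech ended at ${e.ts_ms ?? "?"} ms`;
    case "partial":
      return `partial: ${e.text ?? ""}`;
    case "final":
      return `final: ${e.text ?? ""}`;
    case "warning":
      return `warning: ${e.message ?? ""}`;
  }
}
```

Keeping the fields optional on a single interface mirrors the "optional fields populated per kind" wording; a stricter union with one interface per kind would also work once the per-kind contract is pinned down.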

Frontend

New AsrConfigPanel mirrors TurnConfigPanel (backend + preset dropdowns, editable label, add / clone / remove).
New AsrTranscript card per active ASR config: committed finals with [mm:ss.s–mm:ss.s] prefix, dimmed trailing partial that gets overwritten until the final lands, footer with last confidence / count / avg segment duration. Shows loading model… until the backend's ready event arrives.
websocket.ts types + log-panel batching of asr.partial messages so the log doesn't drown in partials. Finals and warnings still log inline.
App.tsx: asrConfigs persisted to localStorage (lab-asr-configs), pushed to backend on change + before every start / load_file, transcripts reset on each new session.
2-column layout: all config panels (VAD / Turn / Pipeline / ASR) moved into a left aside (w-80 on lg+); waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a flex-1 main column. Matches the layout sketch in docs/05-plan-asr.md. Single-column on narrower screens.
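
The AsrTranscript behaviour described above (committed finals, a trailing partial that is overwritten, a footer average) can be sketched as a small reducer. This is illustrative only and assumes a shape for the component state: `TranscriptState`, `applyAsrEvent`, `formatTimestamp`, `renderLines`, and `averageDurationMs` are hypothetical names, and the assumption that `ts_ms` / `end_ms` mark a final's start and end is not confirmed by this PR.

```typescript
// Minimal event shape; field names follow the PR description.
interface AsrEvent {
  kind: "ready" | "speech_started" | "speech_ended" | "partial" | "final" | "warning";
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
}

interface FinalSegment {
  startMs: number;
  endMs: number;
  text: string;
  confidence: number | null;
}

interface TranscriptState {
  ready: boolean;
  finals: FinalSegment[];
  partial: string | null;
}

const emptyTranscript: TranscriptState = { ready: false, finals: [], partial: null };

// Format milliseconds as mm:ss.s, rounding to tenths first so 59.96 s becomes 01:00.0.
function formatTimestamp(ms: number): string {
  const totalTenths = Math.round(ms / 100);
  const minutes = Math.floor(totalTenths / 600);
  const rem = totalTenths % 600;
  const seconds = Math.floor(rem / 10);
  const tenth = rem % 10;
  const pad2 = (n: number): string => (n < 10 ? "0" + n : String(n));
  return `${pad2(minutes)}:${pad2(seconds)}.${tenth}`;
}

// Pure state transition: partials overwrite each other, a final commits and clears the partial.
function applyAsrEvent(state: TranscriptState, e: AsrEvent): TranscriptState {
  switch (e.kind) {
    case "ready":
      return { ...state, ready: true };
    case "partial":
      return { ...state, partial: e.text ?? "" };
    case "final": {
      const startMs = e.ts_ms ?? 0;
      const segment: FinalSegment = {
        startMs,
        endMs: e.end_ms ?? startMs,
        text: e.text ?? "",
        confidence: e.confidence ?? null,
      };
      return { ...state, partial: null, finals: [...state.finals, segment] };
    }
    default:
      return state;
  }
}

// Lines as the card might show them: placeholder until ready, then finals plus the live partial.
function renderLines(state: TranscriptState): string[] {
  if (!state.ready) return ["loading model…"];
  const lines = state.finals.map(
    (f) => `[${formatTimestamp(f.startMs)}–${formatTimestamp(f.endMs)}] ${f.text}`,
  );
  if (state.partial !== null) lines.push(state.partial);
  return lines;
}

// Footer statistic: average duration of committed segments, or null when there are none.
function averageDurationMs(state: TranscriptState): number | null {
  if (state.finals.length === 0) return null;
  const total = state.finals.reduce((sum, f) => sum + (f.endMs - f.startMs), 0);
  return total / state.finals.length;
}
```

Resetting transcripts on a new session then amounts to replacing the state with `emptyTranscript`.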

Docs

tools/audio-lab/README.md: new "ASR" subsection with the sherpa-onnx preset table and a NOTE about the first-run ~75 MB HF model download. "Live transcripts" added to What It Does.
Top-level README.md: ASR mentioned in the audio-lab one-liner + tool-layout blurb.

Out of scope (follow-up)

Loom / screenshot in the README video table — needs a recording session.
Transcript ticks on VadTimeline / PipelineTimeline at each final.
Two-channel ASR (Channel::Remote).
WER / latency benchmarking — wait for a second ASR backend.
Audio-lab release tag — release-please will cut it automatically on merge.

Test plan

cargo check --workspace (backend)
cargo clippy --workspace -- -D warnings (backend; clean when M1 landed)
cargo test --workspace (5 pre-existing tests still pass)
npm run lint (no new warnings beyond pre-existing 7 in FrequencySpectrum / Waveform)
npm run build (clean)
Manual smoke test: make dev, add an ASR config (sherpa-onnx · bilingual), record / load a WAV, confirm partials roll in and finals commit; toggle preset between bilingual / en / zh and verify model reload.

🤖 Generated with Claude Code

Adds a sherpa-onnx ASR backend that fans out alongside the existing VAD and turn-detection pipelines. Each AsrConfig runs in its own worker thread (sherpa-onnx is sync + holds model state); a tokio task bridges the audio broadcast in, and a blocking_send loop bridges transcript events back to the websocket.

WS surface: ListAsrBackends / SetAsrConfigs client messages, AsrBackends + Asr server messages. Asr events carry a `kind` field (ready, speech_started, speech_ended, partial, final, warning) with optional ts_ms/end_ms/text/confidence/message.

M1 scope: backend only, no frontend yet. cargo check + clippy clean, existing tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Frontend half of the ASR integration:

- New AsrConfigPanel mirrors TurnConfigPanel: backend + preset + label.
- New AsrTranscript card renders finals (with [mm:ss.s–mm:ss.s] prefix) plus a dimmed trailing partial that overwrites until the final lands. Footer shows last confidence, count of finals, average segment duration. "loading model…" until the backend's `ready` event arrives. Copy-all button concatenates final text to the clipboard.
- App.tsx wires list_asr_backends on connect, persists asr configs to localStorage, pushes set_asr_configs on change + before start / load_file, resets transcripts on new session.
- websocket.ts: new AsrConfig / AsrEventKind types, asr_backends + asr server messages, list_asr_backends + set_asr_configs client messages. Log panel batches `partial` events (matching how `vad` is batched) and inlines finals / warnings.

cargo isn't touched; backend already merged on feat/asr-backend. npm run lint clean (no new warnings); npm run build clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
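
The log-panel batching mentioned in this commit could look roughly like the sketch below. It only illustrates the idea (collapse runs of partials into one summary line, log finals and warnings inline); the `AsrLogBatcher` class, its line format, and the flush-on-next-event policy are assumptions, not the implementation in this PR.

```typescript
interface LogEvent {
  kind: "partial" | "final" | "warning";
  text: string;
}

// Hypothetical batcher: counts consecutive partials and emits one summary line for the run.
class AsrLogBatcher {
  private pendingPartials = 0;
  private lastPartial = "";
  readonly lines: string[] = [];

  push(e: LogEvent): void {
    if (e.kind === "partial") {
      // Partials are held back; only the count and the latest text are kept.
      this.pendingPartials += 1;
      this.lastPartial = e.text;
      return;
    }
    // Finals and warnings log inline, after any pending partial summary.
    this.flush();
    this.lines.push(`asr.${e.kind}: ${e.text}`);
  }

  // Emit the pending partial summary, if any. A timer could also call this.
  flush(): void {
    if (this.pendingPartials === 0) return;
    this.lines.push(`asr.partial x${this.pendingPartials} (last: "${this.lastPartial}")`);
    this.pendingPartials = 0;
    this.lastPartial = "";
  }
}
```

With this policy, three partials followed by a final produce two log lines instead of four.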

- New "ASR" subsection under Supported Backends with the sherpa-onnx preset table and a NOTE about the first-run model download (~75 MB to $HF_HOME).
- Mention "live transcripts" in the audio-lab What It Does list.
- Top-level README: include ASR in the audio-lab description + tool layout blurb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wraps the post-controls body in a flex container so all config panels (VAD, Turn, Pipeline, ASR) stack in a left aside (w-80 on lg+) and the waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a flex-1 main column. Matches the layout sketch in docs/05-plan-asr.md. Collapses to a single column on screens narrower than lg. npm run lint and npm run build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sidebar config panels used a 3-column grid that clipped titles and wrapped labels in the narrow aside. Stack cards vertically, let the title input fill remaining width, and widen the sidebar to 384px. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sidebar sections (VAD, Turn, Pipeline, ASR) now default collapsed, and individual cards inside each section can be toggled too. Widen SelectContent dropdowns so long option labels aren't clipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each ASR config card shows whether its preset is cached in HF_HOME and offers a Preload button that downloads + loads the model so the first record/upload doesn't stall on a silent multi-hundred-MB fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Sidebar config sections and right-side result panels now have independent open/close state, so collapsing a config card no longer hides its data. Adds a missing toggle for the ASR transcripts panel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wavekat-eason and others added 9 commits May 15, 2026 11:37

chore(audio-lab): make dev runs backend + frontend concurrently

0e18c69

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

wavekat-eason merged commit e2cb784 into main May 15, 2026
9 checks passed

wavekat-eason deleted the feat/asr branch May 15, 2026 00:20

github-actions Bot mentioned this pull request May 15, 2026

chore(main): release 0.0.15 #51

Closed
