feat(audio-lab): live ASR backend + transcript UI #50
Merged
Adds a sherpa-onnx ASR backend that fans out alongside the existing VAD and turn-detection pipelines. Each AsrConfig runs in its own worker thread (sherpa-onnx is synchronous and holds model state); a tokio task bridges the audio broadcast in, and a blocking_send loop bridges transcript events back to the websocket. WS surface: ListAsrBackends / SetAsrConfigs client messages, AsrBackends + Asr server messages. Asr events carry a `kind` field (ready, speech_started, speech_ended, partial, final, warning) with optional ts_ms / end_ms / text / confidence / message fields. M1 scope: backend only, no frontend yet. cargo check and clippy clean; existing tests still pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
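For reference, the `Asr` event shape described in this commit could be mirrored on the client as a TypeScript type plus a runtime guard for incoming JSON. This is a sketch under assumptions: `AsrEvent`, `ASR_EVENT_KINDS`, and `isAsrEvent` are hypothetical names, and the guard only checks the six kinds and the types of the optional fields listed in the commit, not which fields each kind populates (the commit does not spell that mapping out).

```typescript
// Hypothetical client-side mirror of the `Asr` server event.
// Field names follow the commit message; the wire envelope is an assumption.
type AsrEventKind =
  | "ready"
  | "speech_started"
  | "speech_ended"
  | "partial"
  | "final"
  | "warning";

interface AsrEvent {
  kind: AsrEventKind;
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

const ASR_EVENT_KINDS: readonly AsrEventKind[] = [
  "ready",
  "speech_started",
  "speech_ended",
  "partial",
  "final",
  "warning",
];

// Optional fields and the JS type each must have when present.
const OPTIONAL_FIELDS: ReadonlyArray<[string, "number" | "string"]> = [
  ["ts_ms", "number"],
  ["end_ms", "number"],
  ["text", "string"],
  ["confidence", "number"],
  ["message", "string"],
];

// Narrow an untrusted value to AsrEvent: `kind` must be one of the known
// discriminators, and every optional field is either absent or well-typed.
function isAsrEvent(value: unknown): value is AsrEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  if (typeof v.kind !== "string") return false;
  if ((ASR_EVENT_KINDS as readonly string[]).indexOf(v.kind) === -1) return false;
  return OPTIONAL_FIELDS.every(
    ([key, type]) => v[key] === undefined || typeof v[key] === type,
  );
}
```

Validating once at the socket boundary lets the rest of the UI switch on `kind` without re-checking field types.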
Frontend half of the ASR integration:
- New AsrConfigPanel mirrors TurnConfigPanel: backend + preset + label.
- New AsrTranscript card renders finals (with a [mm:ss.s–mm:ss.s] prefix) plus a dimmed trailing partial that is overwritten until the final lands. Footer shows last confidence, count of finals, and average segment duration. Shows "loading model…" until the backend's `ready` event arrives. Copy-all button concatenates final text to the clipboard.
- App.tsx wires list_asr_backends on connect, persists ASR configs to localStorage, pushes set_asr_configs on change and before start / load_file, and resets transcripts on each new session.
- websocket.ts: new AsrConfig / AsrEventKind types, asr_backends + asr server messages, list_asr_backends + set_asr_configs client messages. The log panel batches `partial` events (matching how `vad` is batched) and inlines finals / warnings.

cargo isn't touched; the backend is already merged on feat/asr-backend. npm run lint clean (no new warnings); npm run build clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
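The transcript and log behaviour in this commit (finals committed with a time-range prefix, a single trailing partial that each new partial overwrites, footer stats, and runs of partials collapsed into one log entry) can be sketched as pure functions. All names here (`applyAsrEvent`, `fmtTs`, `finalLine`, `footerStats`, `appendLog`, `LogEntry`) are hypothetical rather than the actual component code, and falling back to `ts_ms` when a final lacks `end_ms` is an assumption.

```typescript
// Hypothetical minimal event shape (subset of the `asr` server message).
interface AsrEvent {
  kind: "ready" | "speech_started" | "speech_ended" | "partial" | "final" | "warning";
  ts_ms?: number;
  end_ms?: number;
  text?: string;
  confidence?: number;
  message?: string;
}

interface FinalSegment { startMs: number; endMs: number; text: string; confidence?: number }

interface TranscriptState {
  ready: boolean;         // false => card shows "loading model…"
  finals: FinalSegment[]; // committed segments, in arrival order
  partial: string | null; // dimmed trailing text
}

const emptyTranscript: TranscriptState = { ready: false, finals: [], partial: null };

// Reducer: returns a new state; speech_started/ended and warnings don't change the transcript.
function applyAsrEvent(state: TranscriptState, ev: AsrEvent): TranscriptState {
  switch (ev.kind) {
    case "ready":
      return { ...state, ready: true };
    case "partial":
      // Each partial replaces the previous one instead of appending.
      return { ...state, partial: ev.text ?? "" };
    case "final": {
      const startMs = ev.ts_ms ?? 0;
      const seg = { startMs, endMs: ev.end_ms ?? startMs, text: ev.text ?? "", confidence: ev.confidence };
      // The final supersedes whatever partial was showing.
      return { ...state, partial: null, finals: [...state.finals, seg] };
    }
    default:
      return state;
  }
}

// Milliseconds -> "mm:ss.s" (truncated to tenths).
function fmtTs(ms: number): string {
  const tenths = Math.floor(ms / 100);
  const minutes = Math.floor(tenths / 600);
  const seconds = (tenths % 600) / 10;
  return `${String(minutes).padStart(2, "0")}:${seconds.toFixed(1).padStart(4, "0")}`;
}

function finalLine(seg: FinalSegment): string {
  return `[${fmtTs(seg.startMs)}–${fmtTs(seg.endMs)}] ${seg.text}`;
}

// Footer: last confidence, number of finals, average segment duration.
function footerStats(finals: FinalSegment[]) {
  const count = finals.length;
  const totalMs = finals.reduce((sum, s) => sum + (s.endMs - s.startMs), 0);
  return {
    lastConfidence: count > 0 ? finals[count - 1].confidence : undefined,
    count,
    avgSegmentMs: count > 0 ? totalMs / count : 0,
  };
}

type LogEntry =
  | { type: "asr_partial_batch"; count: number; lastText: string }
  | { type: "asr"; kind: string; text: string };

// Consecutive partials collapse into one entry; finals and warnings log inline.
function appendLog(log: LogEntry[], ev: AsrEvent): LogEntry[] {
  if (ev.kind === "partial") {
    const last = log[log.length - 1];
    if (last && last.type === "asr_partial_batch") {
      return [...log.slice(0, -1), { ...last, count: last.count + 1, lastText: ev.text ?? "" }];
    }
    return [...log, { type: "asr_partial_batch", count: 1, lastText: ev.text ?? "" }];
  }
  return [...log, { type: "asr", kind: ev.kind, text: ev.text ?? ev.message ?? "" }];
}
```

Keeping this logic outside React makes the overwrite-until-final rule easy to unit-test; a card would render `finals.map(finalLine)` followed by the dimmed `partial`.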
- New "ASR" subsection under Supported Backends with the sherpa-onnx preset table and a NOTE about the first-run model download (~75 MB to $HF_HOME).
- Mention "live transcripts" in the audio-lab What It Does list.
- Top-level README: include ASR in the audio-lab description and tool-layout blurb.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wraps the post-controls body in a flex container so all config panels (VAD, Turn, Pipeline, ASR) stack in a left aside (w-80 on lg+) and the waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a flex-1 main column. Matches the layout sketch in docs/05-plan-asr.md. Collapses to a single column on screens narrower than lg. npm run lint and npm run build clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar config panels used a 3-column grid that clipped titles and wrapped labels in the narrow aside. Stack cards vertically, let the title input fill remaining width, and widen the sidebar to 384px. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar sections (VAD, Turn, Pipeline, ASR) now default collapsed, and individual cards inside each section can be toggled too. Widen SelectContent dropdowns so long option labels aren't clipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each ASR config card shows whether its preset is cached in HF_HOME and offers a Preload button that downloads + loads the model so the first record/upload doesn't stall on a silent multi-hundred-MB fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sidebar config sections and right-side result panels now have independent open/close state, so collapsing a config card no longer hides its data. Adds a missing toggle for the ASR transcripts panel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
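One way to get the behaviour from the two collapse-related commits above (sidebar sections collapsed by default, per-card toggles, and config sections independent of result panels) is to keep separate open/close maps. This is a hypothetical sketch: `PanelState`, `SECTIONS`, and the toggle helpers are illustrative names, and defaulting result panels and individual cards to open is an assumption.

```typescript
// Hypothetical UI state: sidebar config sections and main-column result
// panels use separate maps, so collapsing one never hides the other.
type Section = "vad" | "turn" | "pipeline" | "asr";

interface PanelState {
  configOpen: Record<Section, boolean>; // sidebar sections
  resultOpen: Record<Section, boolean>; // main-column result panels
  cardOpen: Record<string, boolean>;    // per-card toggles, keyed by config id
}

const SECTIONS: readonly Section[] = ["vad", "turn", "pipeline", "asr"];

function initialPanelState(): PanelState {
  return {
    // Sidebar sections start collapsed.
    configOpen: { vad: false, turn: false, pipeline: false, asr: false },
    // Assumption: result panels start open.
    resultOpen: { vad: true, turn: true, pipeline: true, asr: true },
    cardOpen: {},
  };
}

function toggleConfig(state: PanelState, s: Section): PanelState {
  return { ...state, configOpen: { ...state.configOpen, [s]: !state.configOpen[s] } };
}

function toggleResult(state: PanelState, s: Section): PanelState {
  return { ...state, resultOpen: { ...state.resultOpen, [s]: !state.resultOpen[s] } };
}

// Assumption: a card with no stored entry counts as open.
function toggleCard(state: PanelState, id: string): PanelState {
  return { ...state, cardOpen: { ...state.cardOpen, [id]: !(state.cardOpen[id] ?? true) } };
}
```

Because each toggle touches only its own map, the ASR transcript panel stays visible while its config section is folded away.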
Summary
Implements the ASR plan end-to-end. Supersedes the original stacked PRs (#46, #47, #48 — all closed).
Backend
- `tools/audio-lab/backend/src/asr.rs` module: `AsrConfig`, `AsrServerEvent`, `run_asr_pipeline`. Each config gets a dedicated OS worker thread that owns a `SherpaOnnxAsr`; a tokio task bridges the audio broadcast in, and `blocking_send` bridges transcript events back onto a tokio mpsc.
- WS surface: `ListAsrBackends` / `SetAsrConfigs` (client) and `AsrBackends` / `Asr` (server). `Asr` carries a `kind` discriminator (`ready` / `speech_started` / `speech_ended` / `partial` / `final` / `warning`), with optional `ts_ms` / `end_ms` / `text` / `confidence` / `message` fields populated per kind.
- Hooked into both the `StartRecording` (live mic) and `LoadFile` (WAV upload) paths.
- Dependency: `wavekat-asr = "0.0.4"` with the `sherpa-onnx` feature.
Frontend
- `AsrConfigPanel` mirrors `TurnConfigPanel` (backend + preset dropdowns, editable label, add / clone / remove).
- `AsrTranscript` card per active ASR config: committed finals with a `[mm:ss.s–mm:ss.s]` prefix, a dimmed trailing partial that is overwritten until the final lands, and a footer with last confidence / count / average segment duration. Shows `loading model…` until the backend's `ready` event arrives.
- `websocket.ts` types, plus log-panel batching of `asr.partial` messages so the log doesn't drown in partials. Finals and warnings still log inline.
- `App.tsx`: `asrConfigs` persisted to `localStorage` (`lab-asr-configs`), pushed to the backend on change and before every start / `load_file`; transcripts reset on each new session.
- Layout: config panels (VAD, Turn, Pipeline, ASR) stack in a left aside (`w-80` on lg+); waveform / spectrum / timelines / ASR transcript / preprocessed sections fill a `flex-1` main column. Matches the layout sketch in `docs/05-plan-asr.md`. Single-column on narrower screens.
Docs
- `tools/audio-lab/README.md`: new "ASR" subsection with the sherpa-onnx preset table and a NOTE about the first-run ~75 MB HF model download. "Live transcripts" added to What It Does.
- `README.md`: ASR mentioned in the audio-lab one-liner and tool-layout blurb.
Out of scope (follow-up)
- … `VadTimeline` / `PipelineTimeline` at each `final`.
- … `Channel::Remote`).
Test plan
- `cargo check --workspace` (backend)
- `cargo clippy --workspace -- -D warnings` (backend, when M1 landed)
- `cargo test --workspace` (5 pre-existing tests still pass)
- `npm run lint` (no new warnings beyond the 7 pre-existing ones in `FrequencySpectrum` / `Waveform`)
- `npm run build` (clean)
- `make dev`, add an ASR config (sherpa-onnx · bilingual), record / load a WAV, confirm partials roll in and finals commit; toggle the preset between bilingual / en / zh and verify the model reloads.

🤖 Generated with Claude Code