feat: add training data pipeline and audio viewer #18
Merged
Add plan for LCCC-based turn detection dataset generation pipeline covering conversation-level splits, LLM text generation, and TTS synthesis. Also number existing doc files for consistent ordering. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move pipeline plan into training/smart-turn-zh/, add README and dataset survey for Mandarin conversational corpora. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Jupyter notebook for running FunASR Paraformer-zh on WAV files and saving word-level ASR results to JSONL. Includes Makefile for venv setup and JupyterLab, requirements.txt, and gitignore updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Save per-frame speech probabilities as .npy instead of pre-filtered segments
- Add 3-panel visualization (overview, zoom, histogram)
- Add tqdm progress bars, torchcodec/matplotlib/numpy deps
- Add make install target for re-installing into existing venv

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Output one JSON file per WAV into data/asr_results/ instead of appending all results to a single JSONL, matching the pattern used for VAD probs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
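A minimal sketch of the one-file-per-WAV layout described in this commit. The helper names (`result_path`, `save_result`), the use of the WAV stem as the output file name, and the shape of the `result` dict are assumptions for illustration; the notebook may name or structure things differently.

```python
import json
from pathlib import Path


def result_path(wav_path, out_dir="data/asr_results"):
    """Map a WAV path to its per-file JSON path (assumed naming: <stem>.json)."""
    return Path(out_dir) / (Path(wav_path).stem + ".json")


def save_result(wav_path, result, out_dir="data/asr_results"):
    """Write one ASR result dict to its own JSON file and return the path."""
    out = result_path(wav_path, out_dir)
    out.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Mandarin text readable in the output file.
    out.write_text(
        json.dumps(result, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out
```

Writing one file per input makes a rerun resumable: a WAV whose JSON already exists can be skipped without parsing a shared JSONL.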
Browser-based tool for inspecting WAV audio alongside VAD probabilities and ASR transcriptions. Includes waveform with LOD decimation, VAD hysteresis display, ASR transcript panel with search, minimap, playback, and keyboard shortcuts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
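For readers unfamiliar with the "VAD hysteresis display" mentioned here, the idea can be sketched as follows: a segment opens when the speech probability rises above an upper threshold and closes only when it drops below a lower one. The function name, the threshold values, and the 32 ms frame duration are assumptions for illustration, not values taken from the viewer.

```python
def hysteresis_segments(probs, on=0.5, off=0.35, frame_s=0.032):
    """Turn per-frame speech probabilities into (start_s, end_s) segments.

    A segment starts at the first frame with p >= on and ends at the first
    later frame with p < off. All defaults are illustrative assumptions.
    """
    segments = []
    start = None  # frame index where the current segment began, if any
    for i, p in enumerate(probs):
        if start is None and p >= on:
            start = i
        elif start is not None and p < off:
            segments.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        # Audio ended while still inside a speech segment.
        segments.append((start * frame_s, len(probs) * frame_s))
    return segments
```

Using two thresholds instead of one avoids rapid on/off flicker when the probability hovers near a single cut-off.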
- Per-character ASR timestamps with hover tooltips and click-to-seek
- Karaoke-style character highlighting during playback
- Sentence/character segment boundary lines on waveform
- Click sentence in transcript to zoom-to-fit
- Linear/dB waveform scale toggle
- Single multi-file open button (WAV+NPY+JSON at once)
- Scroll to pan, Cmd/Ctrl+scroll to zoom
- Ignore data/example/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add gain slider (default 10x, max 20x) with real-time GainNode
- Highlight gaps between char timestamps in red on waveform and label bar
- Alternating char backgrounds in label bar for visual separation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
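The gap highlighting in this commit amounts to comparing each character's end time with the next character's start time. A rough sketch, assuming each character is a dict with `start` and `end` in seconds (a hypothetical schema; the actual ASR JSON layout is not shown in this PR):

```python
def find_gaps(chars, min_gap=0.0):
    """Return (gap_start, gap_end) pairs between consecutive characters.

    `chars` is assumed to be sorted by time; `min_gap` is an illustrative
    threshold below which a gap is ignored.
    """
    gaps = []
    for prev, cur in zip(chars, chars[1:]):
        if cur["start"] - prev["end"] > min_gap:
            gaps.append((prev["end"], cur["start"]))
    return gaps
```

Each returned pair is a time range the viewer could shade on the waveform and the label bar.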
FFT Web Worker with 10s tile caching, mel/linear freq scale toggle (default mel), turbo colormap, and frequency axis labels. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
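To illustrate what the mel/linear toggle changes, the sketch below maps a spectrogram pixel row to the frequency it represents under either scale. It uses the common HTK-style mel formula, which is an assumption: the viewer may use a different mel variant, and the function names here are hypothetical.

```python
import math


def hz_to_mel(hz):
    """HTK-style mel conversion (assumed variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def row_to_hz(row, height, f_max, scale="mel"):
    """Frequency shown at pixel row `row` (0 = top) of a `height`-row image."""
    frac = 1.0 - row / (height - 1)  # 1.0 at the top row, 0.0 at the bottom
    if scale == "linear":
        return frac * f_max
    # On the mel scale, rows are evenly spaced in mel rather than in Hz.
    return mel_to_hz(frac * hz_to_mel(f_max))
```

With the mel scale, the middle row of an 8 kHz display sits well below 4 kHz, so the low-frequency region where most speech energy lies gets more vertical space.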
Move ASR panel to right side, add drag-to-resize handles between tracks and on ASR panel edge, persist layout in localStorage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Redesign CSS with design tokens, better dark theme, focus states, hover transitions, and layered surfaces. Add +/- zoom buttons to toolbar. Fix ASR nav button gap. Persist layout in localStorage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Highlight the sentence matching the cursor position (or nearest) when clicking tracks. Reduce scroll pan speed to 2% per tick. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
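The "matching or nearest" lookup can be sketched with a binary search over sentence start times. This assumes sentences are sorted, non-overlapping `(start, end)` pairs in seconds; the function name and data layout are illustrative, not the viewer's own.

```python
import bisect


def sentence_at(sentences, t):
    """Index of the sentence containing time t, else the nearest one."""
    if not sentences:
        return None
    starts = [s for s, _ in sentences]
    i = bisect.bisect_right(starts, t) - 1  # last sentence starting at or before t
    if i >= 0 and t <= sentences[i][1]:
        return i  # cursor falls inside sentence i
    # Otherwise compare the distance to the previous and the next sentence.
    best, best_d = None, float("inf")
    for j in (i, i + 1):
        if 0 <= j < len(sentences):
            s, e = sentences[j]
            d = s - t if t < s else t - e
            if d < best_d:
                best, best_d = j, d
    return best
```

Only the two neighbours of the cursor need checking, so the lookup stays logarithmic in the number of sentences.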
Hovering a sentence timestamp reveals a play button that plays audio for that sentence's time range and auto-stops at the end. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add pixel-perfect waveform rendering for views up to ~25s by computing per-pixel min/max from raw samples. Show current view time range in zoom controls toolbar. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
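The per-pixel min/max computation described here can be sketched as below: the visible sample range is split into one bucket per pixel column, and each column is drawn as a vertical line from the bucket's minimum to its maximum. The function name and signature are assumptions, and the viewer itself runs in the browser, so this Python version only illustrates the arithmetic.

```python
def minmax_columns(samples, start, end, width):
    """Return one (min, max) pair per pixel column for samples[start:end]."""
    span = end - start
    if span <= 0 or width <= 0:
        return []
    cols = []
    for x in range(width):
        lo = start + (x * span) // width
        hi = start + ((x + 1) * span) // width
        hi = max(hi, lo + 1)  # keep at least one sample in every column
        chunk = samples[lo:min(hi, end)]
        cols.append((min(chunk), max(chunk)))
    return cols
```

Reading raw samples for every column costs time proportional to the visible span, which is presumably why the commit limits this path to views of up to about 25 s.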
Shift+drag on waveform to select a loop range. Space plays the range and resets cursor to start when done. Click or Escape clears the selection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Display loop range duration in orange next to the view span label when a loop selection is active. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- training/smart-turn-zh/ with data pipeline plan, dataset research, VAD comparison, and notebooks for ASR (Paraformer-zh) and VAD (Silero)

Test plan
- make install and run the viewer locally
- … {/}), loop range, and zoom presets

🤖 Generated with Claude Code