feat: add training data pipeline and audio viewer#18

Merged
wavekat-eason merged 36 commits into main from feat/training-data-pipeline on May 10, 2026
@wavekat-eason
Contributor

Summary

  • Add training/smart-turn-zh/ with data pipeline plan, dataset research, VAD comparison, and notebooks for ASR (Paraformer-zh) and VAD (Silero)
  • Add React-based audio viewer with waveform, spectrogram, VAD, and ASR tracks; supports zoom/pan, segment playback, loop ranges, sentence/VAD-block navigation, shortcuts dialog, and Google Analytics
  • Reorganize top-level docs with numbered prefixes

Test plan

  • make install and run the viewer locally
  • Open a wav + ASR + VAD bundle and verify playback, navigation ({ / }), loop range, and zoom presets
  • Re-run notebooks 01-03 end-to-end on sample audio

🤖 Generated with Claude Code

wavekat-eason and others added 30 commits April 18, 2026 11:15
Add plan for LCCC-based turn detection dataset generation pipeline
covering conversation-level splits, LLM text generation, and TTS
synthesis. Also number existing doc files for consistent ordering.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move pipeline plan into training/smart-turn-zh/, add README and
dataset survey for Mandarin conversational corpora.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add Jupyter notebook for running FunASR Paraformer-zh on WAV files
and saving word-level ASR results to JSONL. Includes Makefile for
venv setup and JupyterLab, requirements.txt, and gitignore updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Save per-frame speech probabilities as .npy instead of pre-filtered segments
- Add 3-panel visualization (overview, zoom, histogram)
- Add tqdm progress bars, torchcodec/matplotlib/numpy deps
- Add make install target for re-installing into existing venv

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Output one JSON file per WAV into data/asr_results/ instead of
appending all results to a single JSONL, matching the pattern
used for VAD probs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
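
A minimal sketch of the one-JSON-per-WAV layout described in the commit above. The function name `write_per_file_results` and the payload fields are hypothetical; the notebook's actual schema is not shown in this PR. The sketch only illustrates naming each output after the WAV's stem so that ASR results line up with the per-file VAD outputs.

```python
import json
from pathlib import Path


def write_per_file_results(results, out_dir):
    """Write one JSON file per WAV, named after the WAV's stem.

    `results` maps a WAV path to a JSON-serializable payload
    (the payload shape here is an assumption, not the notebook's schema).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for wav_path, payload in results.items():
        target = out_dir / (Path(wav_path).stem + ".json")
        # ensure_ascii=False keeps Mandarin text readable in the file.
        target.write_text(
            json.dumps(payload, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )
        written.append(target)
    return written
```

One caveat of this naming scheme: two WAVs with the same stem in different directories would overwrite each other, so a real pipeline would need unique stems or a mirrored directory layout.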
Browser-based tool for inspecting WAV audio alongside VAD
probabilities and ASR transcriptions. Includes waveform with
LOD decimation, VAD hysteresis display, ASR transcript panel
with search, minimap, playback, and keyboard shortcuts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
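
For readers unfamiliar with the "VAD hysteresis display" mentioned above, here is a sketch of the general technique: per-frame speech probabilities are turned into segments using two thresholds, so a segment opens at a higher threshold and closes only once the probability drops below a lower one. The viewer itself runs in the browser; this Python version is language-agnostic illustration only, and the threshold values and frame duration are assumptions, not values taken from this PR.

```python
def hysteresis_segments(probs, frame_s, on=0.5, off=0.35):
    """Convert per-frame speech probabilities to (start_s, end_s) segments.

    A segment opens when a probability reaches `on` and closes at the
    first later frame whose probability falls below `off`.
    `on`, `off`, and `frame_s` are illustrative assumptions.
    """
    segments = []
    start = None  # index of the frame that opened the current segment
    for i, p in enumerate(probs):
        if start is None and p >= on:
            start = i
        elif start is not None and p < off:
            segments.append((start * frame_s, i * frame_s))
            start = None
    if start is not None:
        # Speech still active at the end of the file.
        segments.append((start * frame_s, len(probs) * frame_s))
    return segments
```

Because raw probabilities are stored rather than pre-filtered segments, thresholds like these can be adjusted at view time without re-running the VAD model.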
- Per-character ASR timestamps with hover tooltips and click-to-seek
- Karaoke-style character highlighting during playback
- Sentence/character segment boundary lines on waveform
- Click sentence in transcript to zoom-to-fit
- Linear/dB waveform scale toggle
- Single multi-file open button (WAV+NPY+JSON at once)
- Scroll to pan, Cmd/Ctrl+scroll to zoom
- Ignore data/example/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add gain slider (default 10x, max 20x) with real-time GainNode
- Highlight gaps between char timestamps in red on waveform and label bar
- Alternating char backgrounds in label bar for visual separation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FFT Web Worker with 10s tile caching, mel/linear freq scale
toggle (default mel), turbo colormap, and frequency axis labels.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
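
The mel/linear toggle above comes down to how spectrogram rows map to frequencies. The sketch below uses the common HTK-style mel formula; whether the viewer uses this variant or another (for example the Slaney form) is not stated in the PR, so treat the formula choice and the helper names as assumptions.

```python
import math


def hz_to_mel(hz):
    """HTK-style mel scale (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_row_frequencies(n_rows, f_min, f_max):
    """Center frequency in Hz for each of n_rows (>= 2) display rows,
    spaced evenly on the mel axis between f_min and f_max."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_rows - 1)
    return [mel_to_hz(lo + i * step) for i in range(n_rows)]
```

Evenly spaced mel rows devote more vertical space to low frequencies, which is usually where speech formants are easiest to read; that is a plausible reason for mel being the default.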
Move ASR panel to right side, add drag-to-resize handles between
tracks and on ASR panel edge, persist layout in localStorage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Redesign CSS with design tokens, better dark theme, focus states,
hover transitions, and layered surfaces. Add +/- zoom buttons to
toolbar. Fix ASR nav button gap. Persist layout in localStorage.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Highlight the sentence matching the cursor position (or nearest)
when clicking tracks. Reduce scroll pan speed to 2% per tick.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
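
The "matching the cursor position (or nearest)" rule above can be sketched as follows. The function name and the `(start, end)` tuple representation are hypothetical; the viewer's real data structures are not shown in this PR.

```python
def sentence_at(sentences, t):
    """Return the index of the sentence containing time t, or the nearest
    one if t falls in a gap. `sentences` is a list of (start_s, end_s).
    Returns None when the list is empty."""
    best, best_dist = None, float("inf")
    for i, (start, end) in enumerate(sentences):
        if start <= t <= end:
            return i  # cursor is inside this sentence
        # Distance from t to the closest edge of this sentence.
        dist = start - t if t < start else t - end
        if dist < best_dist:
            best, best_dist = i, dist
    return best
```

A linear scan is enough for a few hundred sentences; a binary search over start times would be the natural upgrade for very long transcripts.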
Hovering a sentence timestamp reveals a play button that plays
audio for that sentence's time range and auto-stops at the end.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add pixel-perfect waveform rendering for views up to ~25s by
computing per-pixel min/max from raw samples. Show current view
time range in zoom controls toolbar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
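
The per-pixel min/max idea in the commit above can be sketched like this: the visible sample range is split into one bucket per horizontal pixel, and each column is drawn as a vertical line from the bucket's minimum to its maximum. This is an illustration in Python, not the viewer's code; the function name and integer bucketing are assumptions.

```python
def min_max_per_pixel(samples, start, end, width):
    """Return a (min, max) pair for each of `width` pixel columns,
    covering samples[start:end]. Requires end > start and width >= 1."""
    span = end - start
    columns = []
    for x in range(width):
        lo = start + (x * span) // width
        hi = start + ((x + 1) * span) // width
        # When zoomed in past one sample per pixel, still read one sample.
        hi = max(hi, lo + 1)
        chunk = samples[lo:hi]
        columns.append((min(chunk), max(chunk)))
    return columns
```

Scanning raw samples costs time proportional to the visible span, which fits the commit's choice to use this path only for short views and fall back to decimated data for longer ones.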
wavekat-eason and others added 6 commits April 20, 2026 20:03
Shift+drag on waveform to select a loop range. Space plays the
range and resets cursor to start when done. Click or Escape
clears the selection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Display loop range duration in orange next to the view span
label when a loop selection is active.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wavekat-eason wavekat-eason merged commit a8a6b0a into main May 10, 2026
5 checks passed
@wavekat-eason wavekat-eason deleted the feat/training-data-pipeline branch May 10, 2026 22:54