Add torch_audiomentations to parakeet Dockerfile#8085
DER Benchmark: Dev v7 (REAL built-in embedding on GPU) vs Prod (HTTP diarizer)
This is the real benchmark: the dev pod is running the verified v7 image, with the built-in wespeaker model actually loading and extracting embeddings on GPU. No HTTP fallback.
Head-to-head
Key findings
Confirmed working on GPU
by AI for @beastoin
pyannote.audio imports torch_audiomentations via pyannote.audio.core.task, but it was missing from the --no-deps install list. Without it, get_builtin_embedding_model() silently returns None and all embedding requests fall back to the external HTTP diarizer, defeating the built-in embedding feature from #8082.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
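Because the failure described above is silent, it can help to probe the import chain directly inside the container. The following is a minimal sketch, not code from this PR: `probe_imports` is a hypothetical helper, and the module names passed to it are the ones mentioned in the commit messages.

```python
import importlib


def probe_imports(module_names):
    """Try importing each module; map name -> None on success, or the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or errors raised by native loaders
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    # Module names taken from the commit messages in this PR.
    report = probe_imports(["torch_audiomentations", "julius", "pyannote.audio"])
    for name, error in report.items():
        print(f"{name}: {'OK' if error is None else error}")
```

Running something like this in the image build or at pod startup would turn a silent fallback into a visible error line naming the first missing module.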
torch_audiomentations was installed --no-deps, so its own deps (julius, torch-pitch-shift) were skipped. Import chain: pyannote.audio → task.py → torch_audiomentations → julius → ModuleNotFoundError. torch and torchaudio are already in the NGC base image.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
pyannote.audio.core.task imports torch_audiomentations for training-time data augmentation. That pulls in julius and torch-pitch-shift, and torch-pitch-shift needs real torchaudio (incompatible with the NGC torch ABI). We only use pyannote Model + Inference for embedding extraction, never the training pipeline. Stub torch_audiomentations the same way we stub torchaudio: this satisfies the import with zero transitive dep issues. Removes torch_audiomentations, julius, and torch-pitch-shift from pip install since the stub replaces them.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three fixes verified working on dev GKE (L4 GPU):
1. torchaudio: install real package --no-deps, patch __init__.py to skip the C extension loader. wespeaker needs kaldi.fbank for mel filterbank features; the pure-Python compliance module works.
2. torch_audiomentations: expand stub with Identity, BaseWaveformTransform, Mix, from_dict (all symbols pyannote.audio.core.task imports).
3. pyannote telemetry: stub 5 no-op functions (the real module needs opentelemetry OTLP, which is unnecessary for inference-only usage).
Dev verification: pyannote import OK, kaldi.fbank OK, wespeaker model load OK, 256-dim embedding extraction OK on GPU.
Co-Authored-By: mon <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
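For readers unfamiliar with import stubbing, the idea behind fix 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not the stub shipped in the Dockerfile: it registers an in-memory module in `sys.modules` and puts all four names at the top level, whereas the real stub is a package on disk and pyannote may import some of these names from submodules. The class bodies are placeholders.

```python
import sys
import types


def install_torch_audiomentations_stub():
    """Register an in-memory stand-in exposing the four names listed above."""
    stub = types.ModuleType("torch_audiomentations")

    class BaseWaveformTransform:
        """Placeholder base class; accepts and ignores any arguments."""

        def __init__(self, *args, **kwargs):
            pass

    class Identity(BaseWaveformTransform):
        """Returns its input unchanged (assumed behaviour, for illustration)."""

        def __call__(self, samples, *args, **kwargs):
            return samples

    class Mix(BaseWaveformTransform):
        """Placeholder; never called on the inference-only path."""

    def from_dict(config):
        # Fail loudly if anything ever tries to build a real augmentation.
        raise NotImplementedError("augmentation is not available in this stub")

    stub.BaseWaveformTransform = BaseWaveformTransform
    stub.Identity = Identity
    stub.Mix = Mix
    stub.from_dict = from_dict
    sys.modules["torch_audiomentations"] = stub
    return stub
```

Once the stub is registered, `from torch_audiomentations import Identity` resolves without julius or torch-pitch-shift being installed. Raising in `from_dict` is a deliberate choice in this sketch: if a training path were ever exercised by mistake, it would fail loudly instead of silently skipping augmentation.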
Address Codex review findings:
- Pin pyannote.audio to <4.0 to prevent untested major version upgrades
- Expose torchaudio.functional module for non-16kHz audio resampling resilience
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
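The pin itself lives in the pip install line, but a runtime guard can catch an image that was built without it. Below is a minimal sketch with hypothetical helper names (`parse_major`, `satisfies_major_cap`). It only compares the leading major number and is not a full version-specifier parser.

```python
import re


def parse_major(version):
    """Return the leading major number of a version string such as '3.3.2'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1))


def satisfies_major_cap(version, cap_major=4):
    """True when the version's major number is strictly below cap_major ('<4.0')."""
    return parse_major(version) < cap_major
```

In a container, the installed version string could be read with `importlib.metadata.version("pyannote.audio")` and passed to `satisfies_major_cap` before the stubs are relied on.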
ccc0dc7 to 12bb90b
CP8 — Test Detail Table
All 19 tests pass in 0.23s. Dockerfile changes require Docker build + GPU for full verification (covered in CP9).
by AI for @beastoin
CP9A — Changed-Path Coverage Checklist (L1: Build + run container standalone)
This is a Dockerfile-only PR. L1 = build the Docker image and verify imports + model loading inside the container on GPU.
L1 Evidence
L1 Synthesis
All 5 changed paths (P1–P5) proven on dev GKE L4 GPU with the v7 image. Happy paths verified: torchaudio compliance.kaldi produces mel features, the torch_audiomentations stub satisfies pyannote imports, the telemetry stubs remove the opentelemetry dependency, the version pin resolves to 3.3.2, and end-to-end embedding extraction produces 256-dim vectors on CUDA. Non-happy path verified: fallback to the HTTP diarizer when the built-in model fails (unit tests). No untested paths.
by AI for @beastoin
CP9B — Level 2 Integrated Test Results
L2 Evidence: Dev GKE container + API client integration
The dev v7 image was tested end-to-end via the /v2/transcribe API with real LibriSpeech audio:
DER Benchmark (integrated test)
L2 Synthesis
All 5 changed paths (P1–P5) proven in integrated mode on dev GKE L4 GPU. The full transcription + diarization pipeline was exercised via the /v2/transcribe API with 5 multi-speaker LibriSpeech scenarios. The built-in wespeaker embedding produces DER on par with the prod HTTP diarizer (-0.1pp delta). The pod served 2174+ requests with 0 restarts and 0 errors. No untested paths.
by AI for @beastoin
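For context on the metric, DER is conventionally the sum of missed speech, false alarm, and speaker confusion time divided by total reference speech time. The sketch below shows that standard formula and a percentage-point delta. It is not the benchmark script used here, the function names are hypothetical, and the inputs in the usage lines are made-up illustration values, not results from this PR.

```python
def diarization_error_rate(missed, false_alarm, confusion, total_reference):
    """Standard DER: error time over total reference speech time (seconds in)."""
    if total_reference <= 0:
        raise ValueError("total_reference must be positive")
    return (missed + false_alarm + confusion) / total_reference


def delta_pp(candidate_der, baseline_der):
    """Difference between two DER fractions, in percentage points."""
    return (candidate_der - baseline_der) * 100.0


if __name__ == "__main__":
    # Illustrative durations only; not measurements from this benchmark.
    dev = diarization_error_rate(1.0, 0.5, 0.5, 20.0)
    prod = diarization_error_rate(1.0, 0.5, 0.6, 20.0)
    print(f"dev={dev:.3f} prod={prod:.3f} delta={delta_pp(dev, prod):+.1f}pp")
```

A negative delta means the candidate (here, the dev image) has the lower error rate.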
PR Ready for Merge — All Checkpoints Passed
Summary
Awaiting manager merge approval.
by AI for @beastoin
kodjima33 left a comment
Add torch_audiomentations to parakeet Dockerfile. Approve only (backend dependency/infra change, not a bug fix; Nik owns backend deploy).
Summary
Fixes parakeet Dockerfile so the built-in wespeaker speaker embedding model from PR #8082 actually activates in the NGC container. Without this, pyannote.audio fails to import and all embedding requests silently fall back to the external HTTP diarizer.
Closes #8081
Problem
PR #8082 code deployed cleanly but the built-in embedding was inactive. Three import chain failures in the NGC container:
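torch_audiomentations: `pyannote.audio.core.task` imports it for training-time augmentation; missing from container
torchaudio: wespeaker needs `kaldi.fbank` for mel filterbank features; the old stub didn't expose the compliance module
pyannote telemetry: needs opentelemetry OTLP, which is unnecessary for inference-only usage
Fix (verified on dev GKE L4 GPU)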
torchaudio: Install real package `--no-deps`, patch `__init__.py` to skip the C extension loader and expose `compliance` + `functional` modules. Keeps NGC torch ABI intact.
torch_audiomentations: Stub package with all symbols pyannote imports — `Identity`, `BaseWaveformTransform`, `Mix`, `from_dict`. Never called at inference time.
pyannote telemetry: Post-install stub with 5 no-op functions.
pyannote.audio pinned to <4.0: Prevents untested major version upgrades that could break stubs.
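The telemetry fix above writes a stub file after install. One way such a file could be generated is sketched below, under stated assumptions: `write_noop_module` is a hypothetical helper, and the module and function names in the usage lines are placeholders, not the five real pyannote telemetry function names (which this PR page does not list).

```python
import importlib
import sys
import tempfile
from pathlib import Path


def write_noop_module(directory, module_name, function_names):
    """Write <module_name>.py whose functions accept anything and return None."""
    lines = ['"""Auto-generated no-op stub."""', ""]
    for name in function_names:
        if not name.isidentifier():
            raise ValueError(f"not a valid function name: {name!r}")
        lines += [f"def {name}(*args, **kwargs):", "    return None", ""]
    path = Path(directory) / f"{module_name}.py"
    path.write_text("\n".join(lines), encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Placeholder names; substitute the functions the caller actually imports.
        write_noop_module(tmp, "telemetry_stub_demo", ["placeholder_a", "placeholder_b"])
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        demo = importlib.import_module("telemetry_stub_demo")
        print(demo.placeholder_a("ignored", key="value"))
```

Generating the file from a list keeps the stub in step with the set of names the importing code needs, which is the main thing that could drift on an upgrade within the pinned range.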
DER Benchmark (dev v7 vs prod HTTP diarizer)
Test evidence
Risk
by AI for @beastoin