Context
PR #8082 added built-in wespeaker speaker embedding for batch diarization. PR #8085 patches the Dockerfile with 3 stubs to work around NGC ABI incompatibility with torchaudio/torch_audiomentations/opentelemetry.
Research by @geni (PR #8085 comment) confirmed:
- No NVIDIA-published torchaudio wheel exists for NGC (known issue since 2023)
- No pyannote inference-only install mode (torch_audiomentations is a hard dep)
- pyannote 4.0 adds torchcodec as a hard dep (another C extension ABI crash), so the <4.0 pin is critical
- All NGC users use the same stub workaround or avoid torchaudio entirely
Proposal
Vendor the wespeaker inference path (~60 lines) to eliminate pyannote.audio + ~25 transitive deps + all 3 Dockerfile stubs:
- torchaudio.compliance.kaldi.fbank: pure Python, already working in NGC via the patched init
- ResNet34 forward pass: load weights from HuggingFace, run inference directly
- L2 normalize output → 256-dim embedding
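A minimal sketch of how the three steps above could fit together, not the actual vendored code. The names `embed`, `l2_normalize`, and `cosine` are hypothetical. The feature extractor and network are passed in as plain callables so the sketch runs without torch; in the real path they would presumably be torchaudio.compliance.kaldi.fbank and the ResNet34 forward pass.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> Vector:
    """Scale vec to unit Euclidean length (eps guards against a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def embed(
    waveform: Sequence[float],
    fbank_fn: Callable[[Sequence[float]], List[Vector]],
    model_fn: Callable[[List[Vector]], Vector],
) -> Vector:
    """Hypothetical three-step path: features -> network -> unit-length embedding."""
    features = fbank_fn(waveform)  # stand-in for torchaudio.compliance.kaldi.fbank
    raw = model_fn(features)       # stand-in for the ResNet34 forward pass
    return l2_normalize(raw)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """For unit-length vectors, cosine similarity reduces to the dot product."""
    return sum(x * y for x, y in zip(a, b))
```

Because every embedding leaves `embed` with unit length, downstream speaker comparison can use a plain dot product, which is why the normalization step sits inside the vendored path rather than at the call site.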
This removes:
- pyannote.audio, pyannote.core, pyannote.database, pyannote.pipeline
- speechbrain, asteroid-filterbanks, einops, torch_audiomentations (stub)
- tensorboardX, hf_transfer, semver
- All 3 Dockerfile stubs (torchaudio patch, torch_audiomentations stub, telemetry stub)
- The <4.0 version pin (no longer needed)
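One way to confirm the removals took effect inside the built image is a small stdlib-only check, sketched here under assumptions. The helper `still_installed` is hypothetical, and the import names in `REMOVED` are guesses derived from the package names (for example `asteroid_filterbanks` for asteroid-filterbanks).

```python
import importlib.util
from typing import Iterable, List

# Assumed import names for the packages listed above; adjust if they differ.
REMOVED = [
    "pyannote.audio", "pyannote.core", "pyannote.database", "pyannote.pipeline",
    "speechbrain", "asteroid_filterbanks", "einops", "torch_audiomentations",
    "tensorboardX", "hf_transfer", "semver",
]


def still_installed(names: Iterable[str]) -> List[str]:
    """Return the subset of module names that can still be located."""
    found = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package (e.g. "pyannote") raises instead of returning None.
            spec = None
        if spec is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    leftovers = still_installed(REMOVED)
    print("still importable:", leftovers or "none")
```

Run inside the container, an empty result would suggest that neither the real packages nor the stubs remain on the import path.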
Files
- backend/parakeet/transcribe.py:33-65: current pyannote import + get_builtin_embedding_model()
- backend/parakeet/Dockerfile: current stubs
Acceptance criteria