uqio added a commit that referenced this pull request on May 4, 2026
# This is the 1st commit message: update # This is the commit message #2: update
…ccuracy

v0.1.0 ships:
- `diarization::segment`: speaker segmentation (pyannote/segmentation-3.0). Bundled by default (~6 MB, MIT) via `SegmentModel::bundled()`.
- `diarization::embed`: speaker fingerprint (WeSpeaker ResNet34 ONNX + kaldi fbank). Caller-fetched (27 MB, exceeds the crates.io 10 MB cap).
- `diarization::plda`: pyannote/speaker-diarization-community-1 PLDA whitening. Bundled by default (CC-BY-4.0) via `PldaTransform::new()`.
- `diarization::cluster` + `pipeline`: pyannote `cluster_vbx` primitives (PLDA → AHC → VBx → centroid → cosine → Hungarian → reconstruct).
- `diarization::offline::OwnedDiarizationPipeline`: owned-audio batch entrypoint.
- `diarization::streaming::StreamingOfflineDiarizer`: voice-range-driven streaming entrypoint with the same per-fixture DER as offline.
Summary
Initial release of `diarization`, a Rust port of pyannote.audio's speaker-diarization pipeline, restructured around a Sans-I/O design (push PCM → get spans). Targets pyannote-equivalent accuracy on the captured `community-1` fixtures.

DER on the six captured fixtures via the streaming-offline path:
Pipeline
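As a rough picture of the "push voice ranges eagerly, defer the global step to finalize" shape that this section describes, here is a toy sketch. `ToyStreamingDiarizer` and everything inside it are hypothetical: the per-range "feature" is just mean absolute amplitude and the "clustering" is a threshold at the global mean, standing in for the real segmentation, embedding, and `cluster_vbx` stages. It is not the crate's `StreamingOfflineDiarizer`.

```rust
/// Toy stand-in for a Sans-I/O streaming diarizer (hypothetical, not the crate's type).
/// It performs no I/O: the caller pushes samples and later asks for labelled spans.
pub struct ToyStreamingDiarizer {
    /// (start, end, feature) per pushed range; the feature is computed eagerly.
    ranges: Vec<(usize, usize, f32)>,
}

impl ToyStreamingDiarizer {
    pub fn new() -> Self {
        Self { ranges: Vec::new() }
    }

    /// Eager per-range work. A real pipeline would run the heavy models here;
    /// this toy only computes the mean absolute amplitude of the range.
    pub fn push_voice_range(&mut self, start: usize, samples: &[f32]) {
        if samples.is_empty() {
            return;
        }
        let feature = samples.iter().map(|s| s.abs()).sum::<f32>() / samples.len() as f32;
        self.ranges.push((start, start + samples.len(), feature));
    }

    /// Deferred global step: needs every range, so it can only run at the end.
    /// Returns (start, end, speaker) with speaker 1 for above-average features.
    pub fn finalize(self) -> Vec<(usize, usize, usize)> {
        if self.ranges.is_empty() {
            return Vec::new();
        }
        let mean = self.ranges.iter().map(|r| r.2).sum::<f32>() / self.ranges.len() as f32;
        self.ranges
            .into_iter()
            .map(|(start, end, feature)| (start, end, usize::from(feature > mean)))
            .collect()
    }
}
```

The point of the split is that `push_voice_range` can do its work as soon as a range arrives, while anything that depends on all ranges at once waits for `finalize`.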
Two public entrypoints, both running the full pyannote `cluster_vbx` flow:
- `offline::OwnedDiarizationPipeline`: owned-audio batch path. Caller passes the entire 16 kHz mono PCM at once.
- `streaming::StreamingOfflineDiarizer`: voice-range-driven streaming path. Caller drives a VAD externally and pushes one voice range at a time; heavy stages run eagerly per range, and global clustering is deferred to `finalize`. Same DER as the offline path.

Bundled model artifacts
- `models/segmentation-3.0.onnx` (~6 MB, MIT): embedded by default via `SegmentModel::bundled()`. Off-switch: `default-features = false` for callers shipping a fine-tuned variant.
- `models/plda/*.bin` (~530 KB total, CC-BY-4.0): PLDA whitening weights from `pyannote/speaker-diarization-community-1`, loaded by `PldaTransform::new()`.
- … `scripts/download-embed-model.sh`.
- `.crate` tarball: ~6.4 MiB. License SPDX: `(MIT OR Apache-2.0) AND MIT AND CC-BY-4.0`.

Public API shape
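This section describes builder-style input structs with private fields, accessors, and `with_*` overrides. A minimal sketch of that shape is shown below. `SketchInput` is a hypothetical type that only mirrors the `OfflineInput::new(...).with_threshold(...).with_fa(...)` call chain; its field types, accessor names, and default values (0.5 and 0.1) are placeholders assumed for illustration, not the crate's definitions or the `community-1` hyperparameters.

```rust
/// Hypothetical builder-style input struct (not the crate's `OfflineInput`).
/// Fields are private, so callers read them only through accessors.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchInput<'a> {
    samples: &'a [f32],
    threshold: f64,
    fa: f64,
}

impl<'a> SketchInput<'a> {
    /// Starts from defaults; the numbers here are placeholders, not real hyperparameters.
    pub fn new(samples: &'a [f32]) -> Self {
        Self { samples, threshold: 0.5, fa: 0.1 }
    }

    /// Consuming setter, so calls can be chained off `new`.
    pub fn with_threshold(mut self, threshold: f64) -> Self {
        self.threshold = threshold;
        self
    }

    /// Consuming setter for the second tunable.
    pub fn with_fa(mut self, fa: f64) -> Self {
        self.fa = fa;
        self
    }

    // Read-only accessors in place of public fields.
    pub fn samples(&self) -> &[f32] {
        self.samples
    }
    pub fn threshold(&self) -> f64 {
        self.threshold
    }
    pub fn fa(&self) -> f64 {
        self.fa
    }
}
```

Keeping fields private means new hyperparameters can be added later with a default and a new `with_*` method, without breaking callers that construct the struct through `new`.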
- `SegmentModel` / `EmbedModel`: `from_file` / `from_memory` / `bundled` constructors with options-builder.
- `PldaTransform::new()`: loads embedded PLDA weights.
- `OwnedDiarizationPipeline::run(&mut seg, &mut emb, &plda, &samples)`: owned audio.
- `StreamingOfflineDiarizer::push_voice_range(...)` + `.finalize(&plda)`: VAD-driven streaming.
- … (`diarize_offline`, `assign_embeddings`, `reconstruct`, `count_pyannote`) take builder-style input structs (`OfflineInput::new(...).with_threshold(...).with_fa(...)`).

All public structs use accessor patterns (no public fields). Hyperparameters default to `community-1` values; override via `with_*` builders.

SIMD policy
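The policy in this section rests on the fact that floating-point addition is not associative, so a SIMD kernel that sums in lanes can return a different value than a strictly sequential loop. The sketch below illustrates that with two hypothetical functions, `dot_scalar` and `dot_lanes4`; they are not the crate's `ops::scalar` or `ops::dot`, and the 4-lane layout is an assumption. The inputs suggested in the note afterwards are deliberately exaggerated so the difference is a whole unit rather than the last-bit (ulp) drift the policy is actually guarding against.

```rust
/// Strictly left-to-right accumulation: one fixed summation order.
pub fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut sum = 0.0f32;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Four independent accumulators reduced pairwise at the end, which is the
/// summation order a 4-lane SIMD kernel would typically produce.
pub fn dot_lanes4(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f32; 4];
    let chunks = a.len() / 4;
    for c in 0..chunks {
        for lane in 0..4 {
            let i = c * 4 + lane;
            acc[lane] += a[i] * b[i];
        }
    }
    // Horizontal reduction, then a scalar tail for leftover elements.
    let mut sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for i in chunks * 4..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// A discrete decision fed by a continuous score.
pub fn above_cut(score: f32, cut: f32) -> bool {
    score > cut
}
```

With `a = [1.0e8, 1.0, -1.0e8, 1.0]` and `b` all ones, `dot_scalar` returns `1.0` while `dot_lanes4` returns `0.0`, so `above_cut(_, 0.5)` gives opposite answers. Pinning decision-feeding code to one summation order removes that source of disagreement between architectures.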
- … `ops::differential_tests`).
- … `ops::scalar` directly on every architecture: they feed discrete decisions where ulp drift could flip a partition.
- … `ops::dot` / `ops::axpy`: continuous/iterative paths where ulp drift smooths instead of flipping decisions.
- `SP_ALIVE_THRESHOLD` rejects pathological VBx outputs that could land within SIMD ulp drift of the alive-cluster cut.
- `nalgebra` / `matrixmultiply` GEMMs in VBx are uncontrolled; end-to-end cross-architecture determinism is not claimed for T>200 inputs but is empirically validated under SDE-emulated AVX2 + AVX-512 in CI.

Testing
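This section mentions a tolerant, Hungarian-permuted per-frame match. Speaker labels are arbitrary, so two outputs can agree perfectly while naming the speakers differently; a comparison therefore has to pick the best relabelling first. The sketch below shows that idea with a brute-force search over label permutations, which is a stand-in for the Hungarian algorithm and only practical for a handful of speakers. `best_permuted_agreement` is a hypothetical helper, and how the crate's `reconstruct::parity_tests` actually scores frames or sets its tolerance is not shown here.

```rust
/// Fraction of frames that agree after relabelling `hyp` with the permutation
/// of `0..k` that maximizes agreement with `reference`.
/// Brute force over all k! permutations (a substitute for Hungarian matching).
pub fn best_permuted_agreement(reference: &[usize], hyp: &[usize], k: usize) -> f64 {
    assert_eq!(reference.len(), hyp.len());
    assert!(reference.iter().chain(hyp).all(|&s| s < k));
    if reference.is_empty() {
        return 1.0;
    }
    // overlap[h][r] = number of frames where hyp says h and reference says r.
    let mut overlap = vec![vec![0usize; k]; k];
    for (&r, &h) in reference.iter().zip(hyp) {
        overlap[h][r] += 1;
    }
    let mut perm: Vec<usize> = (0..k).collect();
    let mut best = 0usize;
    search(&overlap, &mut perm, 0, &mut best);
    best as f64 / reference.len() as f64
}

/// Enumerates permutations in place by swapping, tracking the best match count.
fn search(overlap: &[Vec<usize>], perm: &mut Vec<usize>, depth: usize, best: &mut usize) {
    let k = perm.len();
    if depth == k {
        // perm[h] is the reference label assigned to hypothesis label h.
        let matched: usize = (0..k).map(|h| overlap[h][perm[h]]).sum();
        *best = (*best).max(matched);
        return;
    }
    for i in depth..k {
        perm.swap(depth, i);
        search(overlap, perm, depth + 1, best);
        perm.swap(depth, i);
    }
}
```

A tolerant test would then assert that the returned fraction is at least some chosen bound instead of requiring bit-exact equality of the two label sequences.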
- … `#[ignore]`d due to GEMM-roundoff drift; covered by the tolerant Hungarian-permuted per-frame match in `reconstruct::parity_tests`).
- `tests/parity/run.sh` measures DER against pyannote captures; results table above.

Test plan
- `cargo test --lib` (355 tests; 10 `#[ignore]`d for documented reasons)
- `cargo clippy --lib --tests --features 'ort bundled-segmentation'` clean
- `cargo build --examples --features ort` clean
- `RUSTFLAGS=-Dwarnings cargo check --no-default-features --lib` (used by SDE CI lanes)
- `bash tests/parity/run.sh tests/parity/fixtures/<fixture>/clip_16k.wav` for each of the six fixtures; DER table above.

🤖 Generated with Claude Code