v0.1.0: Sans-I/O speaker diarization with pyannote-equivalent accuracy by uqio · Pull Request #2 · Findit-AI/diarization

uqio · 2026-05-02T00:28:24Z

Summary

Initial release of diarization — a Rust port of pyannote.audio's speaker-diarization pipeline, restructured around a Sans-I/O design (push PCM → get spans). Targets pyannote-equivalent accuracy on the captured community-1 fixtures.

DER on the six captured fixtures via the streaming-offline path:

Fixture	DER
01_dialogue	0.37 %
02_pyannote_sample	0 %
03_dual_speaker	0 %
04_three_speaker	0 %
05_four_speaker	0 %
06_long_recording (16 min)	0.19 %
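
For readers unfamiliar with the metric, the percentages above follow the conventional diarization error rate: missed speech plus false alarm plus speaker confusion, divided by total reference speech. The sketch below only illustrates that formula. `DerComponents` and `der_percent` are hypothetical names, not part of this crate, the numbers in `main` are made up, and the parity harness may apply details (collar, overlap handling) that are not modeled here.

```rust
/// Error durations in seconds accumulated over one recording.
/// Hypothetical type for illustration; not part of the crate's API.
#[derive(Debug, Clone, Copy)]
struct DerComponents {
    missed: f64,
    false_alarm: f64,
    confusion: f64,
    reference_speech: f64,
}

/// Conventional DER definition, returned as a percentage.
/// Returns `None` when there is no reference speech to normalize by.
fn der_percent(c: DerComponents) -> Option<f64> {
    if c.reference_speech <= 0.0 {
        return None;
    }
    Some(100.0 * (c.missed + c.false_alarm + c.confusion) / c.reference_speech)
}

fn main() {
    // Made-up durations, chosen only to exercise the formula.
    let c = DerComponents {
        missed: 0.5,
        false_alarm: 0.25,
        confusion: 0.25,
        reference_speech: 200.0,
    };
    println!("{:?}", der_percent(c));
}
```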

Pipeline

audio → segmentation → embedding → PLDA → AHC → VBx → centroid → cosine → Hungarian → reconstruct → RTTM

Two public entrypoints, both running the full pyannote cluster_vbx flow:

offline::OwnedDiarizationPipeline — owned-audio batch path. Caller passes the entire 16 kHz mono PCM at once.
streaming::StreamingOfflineDiarizer — voice-range-driven streaming path. Caller drives a VAD externally and pushes one voice range at a time; heavy stages run eagerly per range, global clustering deferred to finalize. Same DER as the offline path.
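
The "eager per range, global step deferred to finalize" shape of the streaming path can be sketched with a toy type. This is not the crate's implementation: `ToyStreamingDiarizer`, `RangeSummary`, and the mean-amplitude "feature" are stand-ins invented here for the real segmentation, embedding, and clustering stages. The point is only that the type performs no I/O: the caller pushes PCM ranges in and gets labeled spans out.

```rust
/// One voice range after the eager per-range stage (hypothetical type).
struct RangeSummary {
    start_sample: usize,
    len: usize,
    /// Stand-in for a real speaker embedding.
    feature: f32,
}

/// Toy Sans-I/O diarizer: no file or device access, only pushed samples.
#[derive(Default)]
struct ToyStreamingDiarizer {
    ranges: Vec<RangeSummary>,
}

impl ToyStreamingDiarizer {
    /// Eager stage: summarize the range immediately and keep only the summary.
    fn push_voice_range(&mut self, start_sample: usize, samples: &[f32]) {
        if samples.is_empty() {
            return;
        }
        let feature = samples.iter().map(|s| s.abs()).sum::<f32>() / samples.len() as f32;
        self.ranges.push(RangeSummary { start_sample, len: samples.len(), feature });
    }

    /// Deferred global stage: needs every range, so it runs once at the end.
    /// Returns (start_sample, end_sample, label) with a two-way toy labeling.
    fn finalize(self) -> Vec<(usize, usize, usize)> {
        if self.ranges.is_empty() {
            return Vec::new();
        }
        let mean = self.ranges.iter().map(|r| r.feature).sum::<f32>() / self.ranges.len() as f32;
        self.ranges
            .iter()
            .map(|r| (r.start_sample, r.start_sample + r.len, usize::from(r.feature > mean)))
            .collect()
    }
}

fn main() {
    let mut d = ToyStreamingDiarizer::default();
    d.push_voice_range(0, &[0.1, -0.1]);
    d.push_voice_range(16_000, &[0.9, -0.9, 0.9]);
    println!("{:?}", d.finalize());
}
```

Deferring the global step is what lets a streaming caller reach the same result as a batch caller: the per-range work does not depend on ranges that have not arrived yet, while the labeling sees all of them.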

Bundled model artifacts

models/segmentation-3.0.onnx (~6 MB, MIT) — embedded by default via SegmentModel::bundled(). Off-switch: default-features = false for callers shipping a fine-tuned variant.
models/plda/*.bin (~530 KB total, CC-BY-4.0) — PLDA whitening weights from pyannote/speaker-diarization-community-1, loaded by PldaTransform::new().
WeSpeaker ResNet34-LM embedding ONNX (~27 MB) — not bundled, exceeds crates.io's 10 MB cap. Fetch via scripts/download-embed-model.sh.

.crate tarball: ~6.4 MiB. License SPDX: (MIT OR Apache-2.0) AND MIT AND CC-BY-4.0.

Public API shape

SegmentModel / EmbedModel — from_file / from_memory / bundled constructors with options-builder.
PldaTransform::new() — loads embedded PLDA weights.
OwnedDiarizationPipeline::run(&mut seg, &mut emb, &plda, &samples) — owned audio.
StreamingOfflineDiarizer::push_voice_range(...) + .finalize(&plda) — VAD-driven streaming.
Algorithm-level entrypoints (diarize_offline, assign_embeddings, reconstruct, count_pyannote) take builder-style input structs (OfflineInput::new(...).with_threshold(...).with_fa(...)).

All public structs use accessor patterns (no public fields). Hyperparameters default to community-1 values; override via with_* builders.
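
As an illustration of the builder-plus-accessor convention described above, here is a minimal sketch. `ToyInput` is a hypothetical stand-in for structs such as `OfflineInput`; its fields and default values are placeholders chosen for the example, not the community-1 hyperparameters or the crate's actual signatures.

```rust
/// Hypothetical input struct. Fields are private, so callers go through
/// `with_*` builders to set values and accessors to read them.
#[derive(Debug, Clone)]
pub struct ToyInput<'a> {
    embeddings: &'a [f32],
    threshold: f64,
    fa: f64,
}

impl<'a> ToyInput<'a> {
    /// Required data goes through `new`; everything else starts at a default.
    /// The defaults here are placeholders, not the crate's values.
    pub fn new(embeddings: &'a [f32]) -> Self {
        Self { embeddings, threshold: 0.5, fa: 0.1 }
    }

    /// Builder-style override: consumes and returns `self` so calls chain.
    pub fn with_threshold(mut self, threshold: f64) -> Self {
        self.threshold = threshold;
        self
    }

    /// Builder-style override for the second placeholder hyperparameter.
    pub fn with_fa(mut self, fa: f64) -> Self {
        self.fa = fa;
        self
    }

    /// Read-only accessors replace public fields.
    pub fn embeddings(&self) -> &'a [f32] {
        self.embeddings
    }

    pub fn threshold(&self) -> f64 {
        self.threshold
    }

    pub fn fa(&self) -> f64 {
        self.fa
    }
}

fn main() {
    let data = [1.0f32, 2.0];
    let input = ToyInput::new(&data).with_threshold(0.6);
    println!("threshold={} fa={}", input.threshold(), input.fa());
}
```

Keeping fields private means new hyperparameters can be added later with a default and a `with_*` method without breaking existing callers.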

SIMD policy

NEON ≡ scalar bit-exact on aarch64 at the primitive level (verified by ops::differential_tests).
AHC pdist and Hungarian-feeding cosine use ops::scalar directly on every architecture — they feed discrete decisions where ulp drift could flip a partition.
VBx EM, centroid sums, embed aggregation use SIMD via ops::dot / ops::axpy — continuous/iterative paths where ulp drift smooths instead of flipping decisions.
A guard band on SP_ALIVE_THRESHOLD rejects pathological VBx outputs that could land within SIMD ulp drift of the alive-cluster cut.
nalgebra/matrixmultiply GEMMs in VBx are uncontrolled; cross-arch determinism end-to-end is not claimed for T>200 inputs but is empirically validated under SDE-emulated AVX2 + AVX-512 in CI.
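
The reason discrete decisions are kept on one fixed accumulation order can be shown with a deliberately exaggerated example. Floating-point addition is not associative, so a lane-wise reduction can differ from a left-to-right one. The two functions below are illustrative only: the PR does not show how `ops::dot` accumulates, and real drift is typically a few ulps rather than the whole unit produced by these hand-picked inputs.

```rust
/// Left-to-right dot product with a single accumulator.
fn dot_sequential(a: &[f32], b: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        acc += x * y;
    }
    acc
}

/// Two-accumulator dot product, mimicking a 2-lane SIMD reduction:
/// even indices go to lane 0, odd indices to lane 1, lanes summed at the end.
fn dot_two_lanes(a: &[f32], b: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 2];
    for (i, (x, y)) in a.iter().zip(b).enumerate() {
        lanes[i % 2] += x * y;
    }
    lanes[0] + lanes[1]
}

fn main() {
    // Hand-picked so that 1e8 + 1.0 rounds back to 1e8 in f32.
    let a = [1e8f32, 1.0, -1e8, 1.0];
    let b = [1.0f32; 4];

    let seq = dot_sequential(&a, &b); // ((1e8 + 1) - 1e8) + 1
    let simd_like = dot_two_lanes(&a, &b); // (1e8 - 1e8) + (1 + 1)
    println!("sequential = {seq}, two lanes = {simd_like}");

    // A threshold between the two results flips the discrete decision.
    let threshold = 1.5f32;
    println!("merge? {} vs {}", seq > threshold, simd_like > threshold);
}
```

In an iterative, continuous computation a discrepancy like this tends to wash out over later updates. When it feeds a comparison against a cut, as in a merge threshold or an assignment, it can change the partition.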

Testing

355 in-tree lib tests, including bit-exact pyannote parity for PLDA, AHC, VBx, centroid, count tensor, reconstruct, RTTM on all six captured fixtures (06_long_recording strict pipeline parity is #[ignore]d due to GEMM-roundoff drift; covered by the tolerant Hungarian-permuted per-frame match in reconstruct::parity_tests).
CI matrix: ASan, Miri (SB+TB), AVX2 SDE, AVX-512 SDE.
Streaming parity harness at tests/parity/run.sh measures DER against pyannote captures; results table above.
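
Speaker labels are arbitrary, so a per-frame comparison against a reference has to be made under the best relabeling of speakers. The sketch below illustrates that idea with a brute-force search over permutations rather than the Hungarian algorithm, which is only practical for a handful of speakers. `best_permuted_agreement` and `permute` are hypothetical helpers, not the code in `reconstruct::parity_tests`, and they assume exactly one label per frame.

```rust
/// Visit every permutation of `perm[k..]` in place (simple recursive swap scheme).
fn permute<F: FnMut(&[usize])>(perm: &mut [usize], k: usize, visit: &mut F) {
    if k == perm.len() {
        visit(&*perm);
        return;
    }
    for i in k..perm.len() {
        perm.swap(k, i);
        permute(perm, k + 1, visit);
        perm.swap(k, i); // restore before trying the next candidate
    }
}

/// Fraction of frames where `hyp` agrees with `reference` under the best
/// relabeling of hypothesis speakers.
/// Panics if the lengths differ or a hypothesis label is >= `num_speakers`.
fn best_permuted_agreement(reference: &[usize], hyp: &[usize], num_speakers: usize) -> f64 {
    assert_eq!(reference.len(), hyp.len());
    if reference.is_empty() {
        return 1.0;
    }
    let mut perm: Vec<usize> = (0..num_speakers).collect();
    let mut best = 0usize;
    permute(&mut perm, 0, &mut |p: &[usize]| {
        // p[h] is the reference label that hypothesis speaker h is mapped to.
        let hits = reference
            .iter()
            .zip(hyp)
            .filter(|(r, h)| **r == p[**h])
            .count();
        best = best.max(hits);
    });
    best as f64 / reference.len() as f64
}

fn main() {
    let reference = [0, 0, 1, 1, 2];
    let swapped = [1, 1, 0, 0, 2]; // same partition, speakers 0 and 1 renamed
    println!("{}", best_permuted_agreement(&reference, &swapped, 3));
}
```

A tolerant test can then assert that the agreement stays above a chosen fraction instead of requiring identical output.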

Test plan

cargo test --lib (355 tests; 10 #[ignore]d for documented reasons)
cargo clippy --lib --tests --features 'ort bundled-segmentation' clean
cargo build --examples --features ort clean
RUSTFLAGS=-Dwarnings cargo check --no-default-features --lib (used by SDE CI lanes)
bash tests/parity/run.sh tests/parity/fixtures/<fixture>/clip_16k.wav for each of the six fixtures; DER table above.

🤖 Generated with Claude Code

Copilot

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

# This is the 1st commit message: update # This is the commit message #2: update

…ccuracy

v0.1.0 ships:
- diarization::segment — speaker segmentation (pyannote/segmentation-3.0). Bundled by default (~6 MB, MIT) via SegmentModel::bundled().
- diarization::embed — speaker fingerprint (WeSpeaker ResNet34 ONNX + kaldi fbank). Caller-fetched (27 MB, exceeds crates.io 10 MB cap).
- diarization::plda — pyannote/speaker-diarization-community-1 PLDA whitening. Bundled by default (CC-BY-4.0) via PldaTransform::new().
- diarization::cluster + pipeline — pyannote cluster_vbx primitives (PLDA → AHC → VBx → centroid → cosine → Hungarian → reconstruct).
- diarization::offline::OwnedDiarizationPipeline — owned-audio batch entrypoint.
- diarization::streaming::StreamingOfflineDiarizer — voice-range-driven streaming entrypoint with the same per-fixture DER as offline.

codecov · 2026-05-06T10:17:23Z

Codecov Report

❌ Patch coverage is 53.26913% with 922 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/embed/model.rs	0.00%	212 Missing ⚠️
src/offline/owned.rs	25.96%	134 Missing ⚠️
src/offline/algo.rs	73.00%	61 Missing ⚠️
src/embed/embedder.rs	17.74%	51 Missing ⚠️
src/embed/fbank.rs	53.53%	46 Missing ⚠️
src/embed/options.rs	0.00%	42 Missing ⚠️
src/ops/arch/x86_avx512/pdist_euclidean.rs	0.00%	42 Missing ⚠️
src/ops/arch/neon/pdist_euclidean.rs	0.00%	39 Missing ⚠️
src/ops/spill.rs	73.33%	36 Missing ⚠️
src/aggregate/count.rs	82.78%	26 Missing ⚠️
... and 17 more

al8n requested a review from Copilot May 2, 2026 00:28

Copilot AI reviewed May 2, 2026

uqio changed the title from "0.1.0" to "v0.1.0: Sans-I/O speaker diarization with pyannote-equivalent accuracy" May 2, 2026

uqio added a commit that referenced this pull request May 4, 2026

# This is a combination of 2 commits.

f7def15

# This is the 1st commit message: update # This is the commit message #2: update

uqio force-pushed the 0.1.0 branch from 733cab5 to 583c48d May 4, 2026 22:17

al8n requested a review from Copilot May 6, 2026 09:30

Copilot AI reviewed May 6, 2026

uqio force-pushed the 0.1.0 branch from 2836fec to e7b22f0 May 6, 2026 09:41

uqio merged commit 07a684f into main May 6, 2026
62 checks passed

uqio deleted the 0.1.0 branch May 6, 2026 09:41
