diarization

Sans-I/O speaker diarization with pyannote-equivalent accuracy.


Quick start

The segmentation model and PLDA weights ship inside the crate; the only artifact you must bring yourself is the WeSpeaker ResNet34-LM embedding ONNX (~26 MB, above the crates.io 10 MB hard limit, so it cannot be bundled). Fetch it from the FinDIT-Studio/dia-models HuggingFace bundle. Both commands below pin a specific HF commit and verify the SHA-256 before installing, so a republished or truncated upstream model surfaces as a hard failure rather than silently altering diarization output.

# Pinned upstream revision + expected SHA-256 of the FP32 single-file ONNX.
DIA_EMBED_MODEL_REV="38168b544a562dec24d49e63786c16e80782eeaf"
DIA_EMBED_MODEL_SHA256="4c15c6be4235318d092c9d347e00c68ba476136d6172f675f76ad6b0c2661f01"
mkdir -p models
TMP="$(mktemp "${TMPDIR:-/tmp}/wespeaker_resnet34_lm.XXXXXXXXXX")"
# Option A (run one of A or B, not both): huggingface_hub CLI (caching, retries, optional auth).
hf download \
  --revision "$DIA_EMBED_MODEL_REV" \
  --local-dir "$(dirname "$TMP")" \
  --local-dir-use-symlinks False \
  FinDIT-Studio/dia-models wespeaker_resnet34_lm.onnx
mv "$(dirname "$TMP")/wespeaker_resnet34_lm.onnx" "$TMP"
# Option B (alternative to A): plain curl, no extra tooling.
curl --fail --location \
  --output "$TMP" \
  "https://huggingface.co/FinDIT-Studio/dia-models/resolve/${DIA_EMBED_MODEL_REV}/wespeaker_resnet34_lm.onnx"
# Then verify and install:
ACTUAL="$(shasum -a 256 "$TMP" | awk '{print $1}')"
if [ "$ACTUAL" != "$DIA_EMBED_MODEL_SHA256" ]; then
  echo "SHA-256 mismatch: expected $DIA_EMBED_MODEL_SHA256, got $ACTUAL" >&2
  rm -f "$TMP"; exit 1
fi
mv "$TMP" models/wespeaker_resnet34_lm.onnx

(Workspace developers can also run ./scripts/download-embed-model.sh, which wraps the same revision + SHA. The script is omitted from the published crate tarball, so the inline commands above are the source of truth for crates.io users.)
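For later audits (e.g. in CI), the same digest check can be replayed against the installed file. A small sketch of that, wrapped in a helper function (the function name is ours, not part of the crate; it relies only on shasum's standard --check mode):

```shell
# Re-verify an installed model against a pinned digest (e.g. in CI).
# shasum --check reads "<sha256-hex>  <path>" lines from stdin; the two-space
# separator between hash and path is part of the checksum-file format.
verify_model() { # usage: verify_model <sha256-hex> <path>
  echo "$1  $2" | shasum -a 256 --check - >/dev/null 2>&1
}
```

For example, `verify_model "$DIA_EMBED_MODEL_SHA256" models/wespeaker_resnet34_lm.onnx` exits non-zero on any mismatch or read error.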

Then run an end-to-end example. The simplest needs only the ort feature:

cargo run --release --features ort --example run_owned_pipeline -- \
  path/to/clip_16k.wav > hyp.rttm
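The hypothesis file uses the standard RTTM format, where each SPEAKER line carries the onset (field 4), duration (field 5), and speaker label (field 8). As a quick sanity check, here is a sketch that totals speech time per speaker; it is shown against a tiny synthetic hyp.rttm so it runs standalone, and the field positions follow the RTTM spec, not anything crate-specific:

```shell
# Synthetic RTTM sample (fields: type file chan onset dur ortho stype name conf slat).
cat > hyp.rttm <<'EOF'
SPEAKER clip_16k 1 0.00 1.50 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER clip_16k 1 2.00 0.50 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER clip_16k 1 1.40 0.80 <NA> <NA> SPEAKER_01 <NA> <NA>
EOF
# Total speech seconds per speaker (field 5 = duration, field 8 = label).
awk '$1 == "SPEAKER" { total[$8] += $5 }
     END { for (spk in total) printf "%s %.2f\n", spk, total[spk] }' hyp.rttm
```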

For the streaming pipeline (uses silero-vad to detect voice ranges on the fly), enable the matching feature:

cargo run --release --features ort,silero-vad --example run_streaming_pipeline -- \
  path/to/clip.wav

DIA_EMBED_MODEL_PATH overrides the default models/wespeaker_resnet34_lm.onnx location if you keep the model elsewhere.
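Wrapper scripts can mirror the same override-or-default resolution. This is a sketch of the documented behavior, not the crate's actual lookup code:

```shell
# Resolve the embedding model path the way the crate documents it:
# DIA_EMBED_MODEL_PATH wins when set, otherwise the default location.
embed_model_path() {
  echo "${DIA_EMBED_MODEL_PATH:-models/wespeaker_resnet34_lm.onnx}"
}
```

For instance, `DIA_EMBED_MODEL_PATH=/opt/models/wespeaker.onnx embed_model_path` (a hypothetical path) prints the override instead of the default.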

Cargo features

| Feature | Default | What it enables |
| --- | --- | --- |
| ort | yes | The ONNX-runtime-backed SegmentModel and EmbedModel types. |
| bundled-segmentation | yes | Embeds models/segmentation-3.0.onnx (~6 MB) into the binary. Exposes SegmentModel::bundled(). Implies ort. Disable to ship a fine-tuned segmentation model separately. |
| tch | no | TorchScript embedding backend (libtorch ≈600 MB). Bit-exact pyannote on heavy-overlap fixtures where ONNX→ORT diverges. |
| silero-vad | no | Path-dep on the sister silero crate; only used by examples/run_streaming_pipeline.rs. |
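In Cargo.toml terms, typical combinations look like this (version numbers are placeholders; pick one line per dependency):

```toml
[dependencies]
# Default: ort backend + bundled segmentation model.
diarization = "0.1"

# Fine-tuned segmentation shipped separately (no bundled model bytes):
# diarization = { version = "0.1", default-features = false, features = ["ort"] }

# Bit-exact pyannote parity via the TorchScript backend (requires libtorch):
# diarization = { version = "0.1", features = ["tch"] }
```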

The PLDA parity test runs as part of the regular test suite — no feature flag required:

cargo test plda::parity_tests

It auto-skips when tests/parity/fixtures/01_dialogue/*.npz is absent (the fixtures are checked in for this repo; a fresh checkout from a model-only mirror would have to regenerate them via the Phase-0 capture script).

License

diarization is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details. Bundled third-party model attributions and source licenses are documented in THIRD_PARTY_NOTICES.md.

Copyright (c) 2026 FinDIT studio authors.

Bundled-model attributions propagate to downstream binaries

diarization embeds two third-party model artifacts into every compiled binary via include_bytes!:

| File | License | Source |
| --- | --- | --- |
| models/segmentation-3.0.onnx (bundled when the bundled-segmentation feature is on; default) | MIT | pyannote/segmentation-3.0 |
| models/plda/*.bin | CC-BY-4.0 | pyannote/speaker-diarization-community-1 |

The full SPDX expression is therefore (MIT OR Apache-2.0) AND MIT AND CC-BY-4.0. When you redistribute a binary that depends on diarization, reproduce the attributions from NOTICE somewhere a recipient can find — for instance, in your application's "About" or third-party-licenses page. Full provenance: models/SOURCE.md (segmentation), models/plda/SOURCE.md (PLDA).

To opt out of the segmentation bundling (e.g. to ship a fine-tuned variant), disable default features: diarization = { version = "...", default-features = false, features = ["ort"] }. You then load via SegmentModel::from_file / from_memory directly.
