diarization

Sans-I/O speaker diarization with pyannote-equivalent accuracy.


Quick start

The segmentation model and PLDA weights ship inside the crate; the only artifact you must bring yourself is the WeSpeaker ResNet34-LM embedding ONNX (~26 MB, above the crates.io 10 MB hard limit, so it cannot be bundled). Fetch it from the FinDIT-Studio/dia-models HuggingFace bundle. Both commands below pin a specific HF commit and verify the SHA-256 before installing, so a republished or truncated upstream model surfaces as a hard failure rather than silently altering diarization output.

# Pinned upstream revision + expected SHA-256 of the FP32 single-file ONNX.
DIA_EMBED_MODEL_REV="38168b544a562dec24d49e63786c16e80782eeaf"
DIA_EMBED_MODEL_SHA256="4c15c6be4235318d092c9d347e00c68ba476136d6172f675f76ad6b0c2661f01"
mkdir -p models
TMP="$(mktemp "${TMPDIR:-/tmp}/wespeaker_resnet34_lm.XXXXXXXXXX")"
# Option A (run one of A or B, not both): huggingface_hub CLI (caching, retries, optional auth).
hf download \
  --revision "$DIA_EMBED_MODEL_REV" \
  --local-dir "$(dirname "$TMP")" \
  --local-dir-use-symlinks False \
  FinDIT-Studio/dia-models wespeaker_resnet34_lm.onnx
mv "$(dirname "$TMP")/wespeaker_resnet34_lm.onnx" "$TMP"
# Option B (alternative to A): plain curl, no extra tooling.
curl --fail --location \
  --output "$TMP" \
  "https://huggingface.co/FinDIT-Studio/dia-models/resolve/${DIA_EMBED_MODEL_REV}/wespeaker_resnet34_lm.onnx"
# Then verify and install:
ACTUAL="$(shasum -a 256 "$TMP" | awk '{print $1}')"
if [ "$ACTUAL" != "$DIA_EMBED_MODEL_SHA256" ]; then
  echo "SHA-256 mismatch: expected $DIA_EMBED_MODEL_SHA256, got $ACTUAL" >&2
  rm -f "$TMP"; exit 1
fi
mv "$TMP" models/wespeaker_resnet34_lm.onnx

(Workspace developers can also run ./scripts/download-embed-model.sh, which wraps the same revision + SHA. The script is omitted from the published crate tarball, so the inline commands above are the source of truth for crates.io users.)
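For later audits (e.g. in CI), the same digest check can be replayed against the installed file. A small sketch of that, wrapped in a helper function (the function name is ours, not part of the crate; it relies only on shasum's standard --check mode):

```shell
# Re-verify an installed model against a pinned digest (e.g. in CI).
# shasum --check reads "<sha256-hex>  <path>" lines from stdin; the two-space
# separator between hash and path is part of the checksum-file format.
verify_model() { # usage: verify_model <sha256-hex> <path>
  echo "$1  $2" | shasum -a 256 --check - >/dev/null 2>&1
}
```

For example, `verify_model "$DIA_EMBED_MODEL_SHA256" models/wespeaker_resnet34_lm.onnx` exits non-zero on any mismatch or read error.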

Then run an end-to-end example. The simplest needs only the ort feature:

cargo run --release --features ort --example run_owned_pipeline -- \
  path/to/clip_16k.wav > hyp.rttm
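The hypothesis file uses the standard RTTM format, where each SPEAKER line carries the onset (field 4), duration (field 5), and speaker label (field 8). As a quick sanity check, here is a sketch that totals speech time per speaker; it is shown against a tiny synthetic hyp.rttm so it runs standalone, and the field positions follow the RTTM spec, not anything crate-specific:

```shell
# Synthetic RTTM sample (fields: type file chan onset dur ortho stype name conf slat).
cat > hyp.rttm <<'EOF'
SPEAKER clip_16k 1 0.00 1.50 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER clip_16k 1 2.00 0.50 <NA> <NA> SPEAKER_00 <NA> <NA>
SPEAKER clip_16k 1 1.40 0.80 <NA> <NA> SPEAKER_01 <NA> <NA>
EOF
# Total speech seconds per speaker (field 5 = duration, field 8 = label).
awk '$1 == "SPEAKER" { total[$8] += $5 }
     END { for (spk in total) printf "%s %.2f\n", spk, total[spk] }' hyp.rttm
```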

For the streaming pipeline (uses silero-vad to detect voice ranges on the fly), enable the matching feature:

cargo run --release --features ort,silero-vad --example run_streaming_pipeline -- \
  path/to/clip.wav

DIA_EMBED_MODEL_PATH overrides the default models/wespeaker_resnet34_lm.onnx location if you keep the model elsewhere.
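Wrapper scripts can mirror the same override-or-default resolution. This is a sketch of the documented behavior, not the crate's actual lookup code:

```shell
# Resolve the embedding model path the way the crate documents it:
# DIA_EMBED_MODEL_PATH wins when set, otherwise the default location.
embed_model_path() {
  echo "${DIA_EMBED_MODEL_PATH:-models/wespeaker_resnet34_lm.onnx}"
}
```

For instance, `DIA_EMBED_MODEL_PATH=/opt/models/wespeaker.onnx embed_model_path` (a hypothetical path) prints the override instead of the default.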

Cargo features

| Feature | Default | What it enables |
| --- | --- | --- |
| ort | yes | The ONNX-runtime-backed SegmentModel and EmbedModel types. |
| bundled-segmentation | yes | Embeds models/segmentation-3.0.onnx (~6 MB) into the binary. Exposes SegmentModel::bundled(). Implies ort. Disable to ship a fine-tuned segmentation model separately. |
| tch | no | TorchScript embedding backend (libtorch ≈600 MB). Bit-exact pyannote on heavy-overlap fixtures where ONNX→ORT diverges. |
| silero-vad | no | Path-dep on the sister silero crate; only used by examples/run_streaming_pipeline.rs. |
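In Cargo.toml terms, typical combinations look like this (version numbers are placeholders; pick one line per dependency):

```toml
[dependencies]
# Default: ort backend + bundled segmentation model.
diarization = "0.1"

# Fine-tuned segmentation shipped separately (no bundled model bytes):
# diarization = { version = "0.1", default-features = false, features = ["ort"] }

# Bit-exact pyannote parity via the TorchScript backend (requires libtorch):
# diarization = { version = "0.1", features = ["tch"] }
```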

The PLDA parity test runs as part of the regular test suite — no feature flag required:

cargo test plda::parity_tests

It auto-skips when tests/parity/fixtures/01_dialogue/*.npz is absent (the fixtures are checked in for this repo; a fresh checkout from a model-only mirror would have to regenerate them via the Phase-0 capture script).

License

diarization is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details. Bundled third-party model attributions and source licenses are documented in THIRD_PARTY_NOTICES.md.

Copyright (c) 2026 FinDIT studio authors.

Bundled-model attributions propagate to downstream binaries

diarization embeds two third-party model artifacts into every compiled binary via include_bytes!:

| File | License | Source |
| --- | --- | --- |
| models/segmentation-3.0.onnx (bundled when the bundled-segmentation feature is on; default) | MIT | pyannote/segmentation-3.0 |
| models/plda/*.bin | CC-BY-4.0 | pyannote/speaker-diarization-community-1 |

The full SPDX expression is therefore (MIT OR Apache-2.0) AND MIT AND CC-BY-4.0. When you redistribute a binary that depends on diarization, reproduce the attributions from NOTICE somewhere a recipient can find — for instance, in your application's "About" or third-party-licenses page. Full provenance: models/SOURCE.md (segmentation), models/plda/SOURCE.md (PLDA).

To opt out of the segmentation bundling (e.g. to ship a fine-tuned variant), disable default features: diarization = { version = "...", default-features = false, features = ["ort"] }. You then load via SegmentModel::from_file / from_memory directly.
