Safe Rust bindings for whisper.cpp speech-to-text inference.
- Always-bundled build. `whispercpp-sys` cmake-builds a vendored, patched whisper.cpp; there is no pkg-config / system-install path. The patched source lives on a fork branch with each fix as a reviewable commit (see Memory safety below).
- Panic-free safe surface. Every FFI call is wrapped in a C++ exception-catching shim, every fallible setter returns `WhisperError`, and every accessor short-circuits on poisoned state.
- `Send + Sync` `Context`; per-`Context` `State` is `Send`. Concurrent inference is serialized through a per-`Context` mutex, so per-call leak budgets are structural, not documentary.
- Backend matrix. Metal, CoreML, Vulkan, OpenCL, CUDA, ROCm (HIP), oneAPI (SYCL), Moore Threads (MUSA), OpenVINO, OpenBLAS, all opt-in via Cargo features.
- DTW token timestamps. Built-in token-level timing via DTW over the configured alignment heads (`AlignmentHeadsPreset`), with safe per-token availability through `Token::t_dtw() -> Option<i64>`. See DTW timestamps.
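The concurrency contract above can be sketched with a stand-in type (hypothetical, not the crate's real `Context`): the context is shared across threads, and every inference call goes through one per-context mutex, so calls are serialized.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the crate's Context: a shared handle whose
// inference path is guarded by one per-context mutex.
struct FakeContext {
    infer_lock: Mutex<u32>, // guards the (fake) inference state
}

fn run_inferences(n: u32) -> u32 {
    let ctx = Arc::new(FakeContext { infer_lock: Mutex::new(0) });
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || {
                // Only one thread is ever inside this block at a time.
                let mut calls = ctx.infer_lock.lock().unwrap();
                *calls += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *ctx.infer_lock.lock().unwrap();
    total
}

fn main() {
    println!("serialized inference calls: {}", run_inferences(4));
}
```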
```toml
[dependencies]
whispercpp = "0.2"
```

The default build is plain CPU. Opt into accelerators per-target:

```toml
# macOS Apple Silicon
[target.'cfg(all(target_os = "macos", target_arch = "aarch64"))'.dependencies]
whispercpp = { version = "0.2", features = ["metal", "coreml"] }

# Linux + NVIDIA
[target.'cfg(all(target_os = "linux", target_arch = "x86_64"))'.dependencies]
whispercpp = { version = "0.2", features = ["cuda"] }
```

A working end-to-end example lives at `whispercpp/examples/smoke.rs`.
All backend features chain to the matching `whispercpp-sys` feature, which toggles the corresponding ggml / whisper CMake flag.
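This forwarding pattern typically looks like the following in a feature table (a sketch; the crate's real `Cargo.toml` is authoritative):

```toml
# Illustrative feature-forwarding sketch: each whispercpp feature simply
# enables the same-named whispercpp-sys feature.
[features]
metal  = ["whispercpp-sys/metal"]
coreml = ["whispercpp-sys/coreml"]
cuda   = ["whispercpp-sys/cuda"]
```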
| Feature | Backend | Platforms |
|---|---|---|
| `metal` | Metal GPU | Apple |
| `coreml` | CoreML / ANE encoder | Apple (with `.mlmodelc`) |
| `vulkan` | Vulkan compute | Linux / Windows / Android / MoltenVK on macOS |
| `opencl` | OpenCL (mobile / Adreno) | Linux / Android |
| `cuda` | NVIDIA CUDA | Linux / Windows |
| `hipblas` | AMD ROCm / HIP | Linux |
| `sycl` | Intel oneAPI / Arc | Linux / Windows |
| `musa` | Moore Threads MUSA | Linux |
| `openvino` | Intel OpenVINO encoder | Linux / Windows |
| `openblas` | OpenBLAS CPU | Any |
| `serde` | `Serialize` / `Deserialize` for `Lang` (lowercase ISO-639-1) | — |
GPU backends require the corresponding vendor SDK (CUDA Toolkit, ROCm, oneAPI, etc.) installed at link time. CI exercises the bundled CPU path on Linux/macOS/Windows and Metal+CoreML on macOS.
Token-level timestamps via DTW over the decoder's
cross-attention weights. Enable at Context construction:
```rust
use whispercpp::{Context, ContextParams, AlignmentHeadsPreset};

let ctx = Context::new(
    "ggml-large-v3-turbo.bin",
    ContextParams::new()
        .with_use_gpu(true)
        .with_dtw_token_timestamps(true)
        .with_dtw_aheads_preset(AlignmentHeadsPreset::LargeV3Turbo),
)?;
```

Match `AlignmentHeadsPreset` to your model: the safe API ships every standard checkpoint preset (`TinyEn` through `LargeV3Turbo`). Mismatched presets produce noisy timings without erroring. The DTW scratch size is bounds-checked by `required_dtw_mem_size_for`, and a model is rejected at load if its `n_text_ctx` exceeds `SUPPORTED_DTW_N_TEXT_CTX`.
After `state.full(&params, &samples)`, read per-token DTW timing as `Option<i64>` (centiseconds):

```rust
for i in 0..state.n_segments() {
    let seg = state.segment(i).unwrap();
    for j in 0..seg.n_tokens() {
        let token = seg.token(j).unwrap();
        match token.t_dtw() {
            Some(t) => println!("token={} t_dtw={:.2}s", token.id(), t as f64 / 100.0),
            None => { /* DTW unavailable for this token */ }
        }
    }
}
```

`None` covers four cases:

- DTW was not enabled at construction;
- the token is a non-text token (special / timestamp);
- the segment skipped DTW because `Params::set_audio_ctx` was overridden too small;
- the audio window was too short for the median-filter pass.

The underlying C-side patch (`whispercpp-sys`: dtw `t_dtw` sentinel init) initialises `t_dtw = -1` before every DTW pass so the sentinel uniquely identifies "unavailable"; `Some(0)` is a valid timestamp (a token at audio offset 0), not the sentinel.
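The sentinel convention can be written down as a pure function (a hypothetical helper, not the crate's code; the real accessor is `Token::t_dtw()`, this is just the rule it applies):

```rust
// Hypothetical sketch of the -1 sentinel mapping described above.
fn map_t_dtw(raw: i64) -> Option<i64> {
    if raw < 0 {
        None // sentinel: DTW unavailable for this token
    } else {
        Some(raw) // valid centisecond timestamp, including 0
    }
}

fn main() {
    assert_eq!(map_t_dtw(-1), None);
    assert_eq!(map_t_dtw(0), Some(0)); // offset 0 is a timestamp, not the sentinel
    assert_eq!(map_t_dtw(123), Some(123));
    println!("sentinel mapping ok");
}
```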
Constraints (enforced at `Context::new`):

| Constraint | What it does |
|---|---|
| `dtw` + `flash_attn` | Rejected. whisper.cpp silently disables DTW under flash-attn; the wrapper refuses the combination explicitly. |
| `dtw` + custom `n_text_ctx` > 448 | Rejected. The DTW scratch arena is sized for standard Whisper checkpoints; non-standard models with a larger text context would overflow it. |
| `dtw_mem_size` | Clamped to `[MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE]`, then raised to the per-preset minimum from `required_dtw_mem_size_for`. |
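The `dtw_mem_size` rule can be sketched as follows; the `MIN`/`MAX` constants here are illustrative stand-ins, not the crate's real values:

```rust
// Illustrative constants; the crate's MIN_DTW_MEM_SIZE / MAX_DTW_MEM_SIZE
// are authoritative.
const MIN_DTW_MEM_SIZE: usize = 1 << 20;
const MAX_DTW_MEM_SIZE: usize = 1 << 28;

// Clamp the requested size into the global window, then raise it to the
// per-preset minimum (as required_dtw_mem_size_for would report).
fn effective_dtw_mem_size(requested: usize, preset_min: usize) -> usize {
    requested
        .clamp(MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE)
        .max(preset_min)
}

fn main() {
    assert_eq!(effective_dtw_mem_size(0, 0), MIN_DTW_MEM_SIZE);
    assert_eq!(effective_dtw_mem_size(usize::MAX, 0), MAX_DTW_MEM_SIZE);
    assert_eq!(effective_dtw_mem_size(1 << 21, 1 << 22), 1 << 22);
    println!("clamp rule ok");
}
```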
Native abort paths inside the DTW helper (allocation failures, invalid windows, decoder errors) are all converted to `WhisperError::StateLost` via the existing exception shim; no `abort()` is reachable from safe Rust through this surface.
whisper.cpp is a binary parser of attacker-controllable model files plus a substantial C++ inference path. The vendored submodule is pinned to our fork branch (`Findit-AI/whisper.cpp@rust`), which carries fixes for upstream issues reachable from safe Rust:

- `whisper_kv_cache_free` made idempotent (closes a multi-decoder OOM double-free of a ggml backend buffer).
- `whisper_init_state` / `whisper_init_with_params_no_state` / `whisper_vad_init_with_params` wrapped in RAII so a throw mid-init releases the partial allocation rather than leaking the `whisper_context` / `whisper_state`.
- Tensor headers fully validated: `n_dims ∈ [0, 4]`, name length bounded, `ttype < GGML_TYPE_COUNT`, per-dimension positivity, and a 64-bit overflow check on `nelements`.
- Hparams validated against generous-but-bounded ranges; a minimum `n_text_ctx` is enforced so the decode batch can hold the worst-case prompt.
- Special-token ids verified to fit `n_vocab` after the multilingual shift (closes a corrupt-vocab out-of-bounds access into `logits[]`).
- File / buffer loaders throw on partial reads (peek-based EOF detection, so a clean end-of-tensor-list still terminates).
- Tensor-name set tracking rejects models that satisfy the loaded-count check by repeating one name.
- `ggml_log_set` installed once per process via `std::atomic` so concurrent `create_state` + `State::full` don't race on ggml's static logger globals.
- `vocab.num_languages()` synthesis null-checks `whisper_lang_str` (closes `std::string(nullptr)` UB).
- The abort callback is wired through every sched-based graph compute so cancellation interrupts the long-running encoder / decoder paths, not just the gaps between them.
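The tensor-header checks in the list above amount to the following shape (a sketch; constants and the error type are illustrative, not the crate's real definitions):

```rust
// Illustrative bounds; the real loader uses ggml's own constants.
const GGML_TYPE_COUNT: i32 = 39;
const MAX_NAME_LEN: usize = 64;

// Validate one tensor header: n_dims in [0, 4], bounded name, ttype in
// range, every dimension positive, and a checked 64-bit element count.
fn validate_tensor_header(
    n_dims: i32,
    dims: &[i64],
    name: &str,
    ttype: i32,
) -> Result<i64, &'static str> {
    if !(0..=4).contains(&n_dims) {
        return Err("n_dims out of [0, 4]");
    }
    if name.len() > MAX_NAME_LEN {
        return Err("tensor name too long");
    }
    if !(0..GGML_TYPE_COUNT).contains(&ttype) {
        return Err("ttype out of range");
    }
    let mut nelements: i64 = 1;
    for &d in &dims[..n_dims as usize] {
        if d <= 0 {
            return Err("non-positive dimension");
        }
        // 64-bit overflow check on the running element count.
        nelements = nelements.checked_mul(d).ok_or("nelements overflow")?;
    }
    Ok(nelements)
}

fn main() {
    assert_eq!(validate_tensor_header(2, &[3, 4], "w", 0), Ok(12));
    assert!(validate_tensor_header(5, &[1; 5], "w", 0).is_err());
    assert!(validate_tensor_header(2, &[i64::MAX, 2], "w", 0).is_err());
    println!("tensor header checks ok");
}
```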
A C++ exception-catching shim layer (`whispercpp_shim.cpp`) sits between the safe Rust API and every throwing entry point. The bindgen allowlist is enumerated symbol-by-symbol: only no-throw raw `whisper_*` functions are exposed; every throwing function goes through a `whispercpp_*` shim that catches the exception and surfaces its class as a sentinel (`WhisperError::ConstructorLost`, `StateLost`, etc.).
`build.rs` includes a canary that scans the linked source for the required patch markers and hard-fails the build if any are missing. For design details, the per-finding analysis lives in the fork branch's commit history.
| Crate | Purpose |
|---|---|
| `whispercpp` | Safe Rust API (`Context`, `State`, `Params`, `Lang`, `WhisperError`). End-user dependency. |
| `whispercpp-sys` | Bindgen output + `build.rs` (cmake build, link directives) + the C++ exception-catching shim. |

End users should depend on `whispercpp`. `whispercpp-sys` is re-exported as `whispercpp::sys` for callers who need a raw escape hatch (review every use carefully: only no-throw symbols are exposed, but it is `unsafe` regardless).
CI runs on `ubuntu-latest`, `macos-latest`, and `windows-latest`. Sanitizer (ASan + UBSan) and Miri jobs gate the `unsafe` boundary on every PR. The MSRV is pinned in `Cargo.toml` and enforced via `rust-version`.
`whispercpp` is distributed under the terms of both the MIT license and the Apache License (Version 2.0). See `LICENSE-APACHE` and `LICENSE-MIT` for details.
Copyright (c) 2026 FinDIT Studio authors.