
whispercpp

Safe Rust bindings for whisper.cpp speech-to-text inference.


  • Always-bundled build. whispercpp-sys builds a vendored, patched whisper.cpp via CMake; there is no pkg-config / system-install path. The patched source lives on a fork branch with each fix as a reviewable commit (see Memory safety below).
  • Panic-free safe surface. Every FFI call is wrapped in a C++ exception-catching shim, every fallible setter returns WhisperError, every accessor short-circuits on poisoned state.
  • Send + Sync Context; per-Context State is Send. Concurrent inference is serialized through a per-Context mutex so per-call leak budgets are structural, not documentary.
  • Backend matrix. Metal, CoreML, Vulkan, OpenCL, CUDA, ROCm (HIP), oneAPI (SYCL), Moore Threads (MUSA), OpenVINO, OpenBLAS — all opt-in via Cargo features.
  • DTW token timestamps. Built-in token-level timing via DTW over the configured alignment heads (AlignmentHeadsPreset), with safe per-token availability through Token::t_dtw() -> Option<i64>. See DTW timestamps.
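The per-Context mutex behind the Send + Sync claim is the standard serialize-at-the-boundary pattern. A minimal, self-contained sketch with a stand-in Context (a run counter instead of real inference state; not the crate's actual internals):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Stand-in Context: real inference state replaced by a run counter.
// Illustrates the serialization pattern only, not the crate's internals.
struct Context {
    inference_lock: Mutex<u32>,
}

impl Context {
    fn new() -> Self {
        Context { inference_lock: Mutex::new(0) }
    }

    // Holds the lock for the whole call, so concurrent callers queue up
    // instead of entering the native inference path simultaneously.
    fn full(&self) {
        let mut runs = self.inference_lock.lock().unwrap();
        *runs += 1;
    }
}

fn run_concurrent(n: u32) -> u32 {
    let ctx = Arc::new(Context::new());
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let ctx = Arc::clone(&ctx);
            thread::spawn(move || ctx.full())
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let runs = *ctx.inference_lock.lock().unwrap();
    runs
}

fn main() {
    // Eight threads share one Context; all eight calls run, one at a time.
    assert_eq!(run_concurrent(8), 8);
}
```

Because the lock is taken inside the call rather than documented as a caller obligation, a leak bound per call holds no matter how the Context is shared.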

Installation

[dependencies]
whispercpp = "0.2"

The default build is plain CPU. Opt into accelerators per-target:

# macOS Apple Silicon
[target.'cfg(all(target_os = "macos", target_arch = "aarch64"))'.dependencies]
whispercpp = { version = "0.2", features = ["metal", "coreml"] }

# Linux + NVIDIA
[target.'cfg(all(target_os = "linux", target_arch = "x86_64"))'.dependencies]
whispercpp = { version = "0.2", features = ["cuda"] }

Examples

A working end-to-end example lives at whispercpp/examples/smoke.rs.

Backends

All backend features chain to the matching whispercpp-sys feature, which toggles the corresponding ggml / whisper CMake flag.
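This feature forwarding is the usual Cargo pattern. A sketch of what the whispercpp side of the chain could look like (illustrative manifest fragment and flag mapping, not the crate's exact contents):

```toml
# whispercpp/Cargo.toml (illustrative)
[features]
metal  = ["whispercpp-sys/metal"]    # sys crate sets the Metal CMake flag
coreml = ["whispercpp-sys/coreml"]   # sys crate sets the CoreML CMake flag
cuda   = ["whispercpp-sys/cuda"]     # sys crate sets the CUDA CMake flag
```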

Feature    Backend                    Platforms
metal      Metal GPU                  Apple
coreml     CoreML / ANE encoder       Apple (with .mlmodelc)
vulkan     Vulkan compute             Linux / Windows / Android / MoltenVK on macOS
opencl     OpenCL (mobile / Adreno)   Linux / Android
cuda       NVIDIA CUDA                Linux / Windows
hipblas    AMD ROCm / HIP             Linux
sycl       Intel oneAPI / Arc         Linux / Windows
musa       Moore Threads MUSA         Linux
openvino   Intel OpenVINO encoder     Linux / Windows
openblas   OpenBLAS CPU               Any
serde      Serialize / Deserialize for Lang (lowercase ISO-639-1)   Any (not a backend)

GPU backends require the corresponding vendor SDK (CUDA Toolkit, ROCm, oneAPI, etc.) installed at link time. CI exercises the bundled CPU path on Linux/macOS/Windows and Metal+CoreML on macOS.

DTW timestamps

Token-level timestamps via DTW over the decoder's cross-attention weights. Enable at Context construction:

use whispercpp::{Context, ContextParams, AlignmentHeadsPreset};

let ctx = Context::new(
    "ggml-large-v3-turbo.bin",
    ContextParams::new()
        .with_use_gpu(true)
        .with_dtw_token_timestamps(true)
        .with_dtw_aheads_preset(AlignmentHeadsPreset::LargeV3Turbo),
)?;

Match AlignmentHeadsPreset to your model: the safe API ships a preset for every standard checkpoint (TinyEn through LargeV3Turbo). A mismatched preset produces noisy timings without erroring. Memory bounds are checked by required_dtw_mem_size_for, and the model is rejected at load if its n_text_ctx exceeds SUPPORTED_DTW_N_TEXT_CTX.

After state.full(&params, &samples), read per-token DTW timing as Option<i64> (centiseconds):

for i in 0..state.n_segments() {
    let seg = state.segment(i).unwrap();
    for j in 0..seg.n_tokens() {
        let token = seg.token(j).unwrap();
        match token.t_dtw() {
            Some(t) => println!("token={} t_dtw={:.2}s",
                token.id(), t as f64 / 100.0),
            None    => /* DTW unavailable for this token */ (),
        }
    }
}

None covers four cases: DTW not enabled at construction, non-text token (special / timestamp), per-segment DTW skip because Params::set_audio_ctx was overridden too small, or audio window too short for the median-filter pass. The underlying C-side patch (whispercpp-sys: dtw t_dtw sentinel init) initialises t_dtw = -1 before every DTW pass so the sentinel uniquely identifies "unavailable" — Some(0) is a valid timestamp (token at audio offset 0), not the sentinel.
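With that sentinel convention, mapping the raw C-side value to the safe Option<i64> reduces to a one-line predicate. A sketch (the crate's actual accessor may be implemented differently):

```rust
// The C side initialises t_dtw = -1 before every DTW pass, so -1 uniquely
// means "unavailable", while 0 is a real timestamp at audio offset 0.
fn t_dtw_from_raw(raw: i64) -> Option<i64> {
    (raw >= 0).then_some(raw)
}

fn main() {
    assert_eq!(t_dtw_from_raw(-1), None);       // sentinel: unavailable
    assert_eq!(t_dtw_from_raw(0), Some(0));     // valid: token at offset 0
    assert_eq!(t_dtw_from_raw(150), Some(150)); // 1.50 s, in centiseconds
}
```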

Constraints (enforced at Context::new):

Constraint                       What it does
dtw + flash_attn                 Rejected. whisper.cpp silently disables DTW under flash-attn; the wrapper refuses the combination explicitly.
dtw + custom n_text_ctx > 448    Rejected. The DTW scratch arena is sized for standard Whisper checkpoints; non-standard models with a larger text context would overflow it.
dtw_mem_size                     Clamped to [MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE], then raised to the per-preset minimum from required_dtw_mem_size_for.
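The dtw_mem_size rule composes two steps: clamp into the global bounds, then raise to the per-preset floor. A sketch with hypothetical byte values (the real constants and required_dtw_mem_size_for live in the crate):

```rust
// Hypothetical bounds; the crate exports the real MIN/MAX constants.
const MIN_DTW_MEM_SIZE: usize = 1 << 20; // 1 MiB (illustrative)
const MAX_DTW_MEM_SIZE: usize = 1 << 30; // 1 GiB (illustrative)

// Hypothetical per-preset minimum; the real value comes from
// required_dtw_mem_size_for(preset).
fn preset_min_mem_size() -> usize {
    8 << 20 // 8 MiB (illustrative)
}

fn effective_dtw_mem_size(requested: usize) -> usize {
    requested
        .clamp(MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE) // step 1: global bounds
        .max(preset_min_mem_size())                // step 2: per-preset floor
}

fn main() {
    assert_eq!(effective_dtw_mem_size(0), 8 << 20);          // raised to preset min
    assert_eq!(effective_dtw_mem_size(64 << 20), 64 << 20);  // already in range
    assert_eq!(effective_dtw_mem_size(usize::MAX), 1 << 30); // clamped to max
}
```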

Native abort paths inside the DTW helper (allocation failures, invalid windows, decoder errors) are all converted to WhisperError::StateLost via the existing exception shim — no abort() is reachable from safe Rust through this surface.

Memory safety

whisper.cpp is a binary parser of attacker-controllable model files plus a substantial C++ inference path. The vendored submodule is pinned to our fork branch (Findit-AI/whisper.cpp@rust), which carries fixes for upstream issues reachable from safe Rust:

  • whisper_kv_cache_free made idempotent (closes a multi-decoder OOM double-free of a ggml backend buffer).
  • whisper_init_state / whisper_init_with_params_no_state / whisper_vad_init_with_params wrapped in RAII so a throw mid-init releases the partial allocation rather than leaking the whisper_context / whisper_state.
  • Tensor headers fully validated: n_dims ∈ [0, 4], name length bounded, ttype < GGML_TYPE_COUNT, per-dim positivity, 64-bit overflow check on nelements.
  • Hparams validated against generous-but-bounded ranges; min n_text_ctx enforced so the decode batch can hold the worst-case prompt.
  • Special-token ids verified to fit n_vocab after the multilingual shift (closes a corrupt-vocab OOB into logits[]).
  • File / buffer loaders throw on partial reads (peek-based EOF detection so clean end-of-tensor-list still terminates).
  • Tensor-name set tracking rejects models that satisfy the loaded-count check by repeating one name.
  • ggml_log_set installed once per process via std::atomic so concurrent create_state + State::full don't race on ggml's static logger globals.
  • vocab.num_languages() synthesis null-checks whisper_lang_str (closes std::string(nullptr) UB).
  • The abort callback is wired through every sched-based graph compute so cancellation interrupts the long-running encoder / decoder paths, not just the gaps between them.
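The tensor-header validation in the list above is pure bounds logic. A self-contained sketch in Rust (the real checks live in the patched C++ loader; the struct, name bound, and type count here are illustrative):

```rust
const GGML_TYPE_COUNT: u32 = 39; // illustrative; tracks upstream ggml
const MAX_NAME_LEN: usize = 64;  // illustrative bound

struct TensorHeader {
    n_dims: i32,
    name_len: usize,
    ttype: u32,
    dims: [i64; 4],
}

fn validate(h: &TensorHeader) -> bool {
    // n_dims must lie in [0, 4]
    if !(0..=4).contains(&h.n_dims) {
        return false;
    }
    // name length bounded
    if h.name_len > MAX_NAME_LEN {
        return false;
    }
    // tensor type must be a known ggml type
    if h.ttype >= GGML_TYPE_COUNT {
        return false;
    }
    // per-dim positivity
    let used = &h.dims[..h.n_dims as usize];
    if used.iter().any(|&d| d <= 0) {
        return false;
    }
    // 64-bit overflow check on nelements
    let mut n: i64 = 1;
    for &d in used {
        n = match n.checked_mul(d) {
            Some(v) => v,
            None => return false,
        };
    }
    true
}

fn main() {
    let ok = TensorHeader { n_dims: 2, name_len: 10, ttype: 0, dims: [80, 384, 1, 1] };
    assert!(validate(&ok));
    let overflow = TensorHeader { n_dims: 2, name_len: 10, ttype: 0, dims: [i64::MAX, 2, 1, 1] };
    assert!(!validate(&overflow)); // nelements would overflow i64
}
```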

A C++ exception-catching shim layer (whispercpp_shim.cpp) sits between the safe Rust API and every throwing entry point. The bindgen allowlist is enumerated symbol-by-symbol — only no-throw raw whisper_* functions are exposed; every throwing function goes through a whispercpp_* shim that catches and surfaces the exception class as a sentinel (WhisperError::ConstructorLost, StateLost, etc.).

build.rs includes a canary that scans the linked source for the required patch markers and hard-fails the build if any are missing.
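The canary amounts to a substring scan over the vendored sources. A minimal sketch (the marker strings here are hypothetical; the real build.rs uses its own marker set and fails the build via panic! when one is absent):

```rust
// Hypothetical patch markers embedded as comments in the patched sources.
const PATCH_MARKERS: &[&str] = &[
    "WCPP_PATCH_KV_CACHE_IDEMPOTENT",
    "WCPP_PATCH_DTW_SENTINEL_INIT",
];

// Returns every marker not found in the given source text.
fn missing_markers<'a>(source: &str, markers: &[&'a str]) -> Vec<&'a str> {
    markers.iter().copied().filter(|m| !source.contains(*m)).collect()
}

fn main() {
    let patched = "/* WCPP_PATCH_KV_CACHE_IDEMPOTENT */ /* WCPP_PATCH_DTW_SENTINEL_INIT */";
    assert!(missing_markers(patched, PATCH_MARKERS).is_empty());

    // A stale checkout missing one patch would hard-fail the build here.
    let stale = "/* WCPP_PATCH_KV_CACHE_IDEMPOTENT */";
    assert_eq!(
        missing_markers(stale, PATCH_MARKERS),
        vec!["WCPP_PATCH_DTW_SENTINEL_INIT"]
    );
}
```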

For the design details, the per-finding analysis lives on the fork branch's commit history.

Crate structure

Crate           Purpose
whispercpp      Safe Rust API (Context, State, Params, Lang, WhisperError). End-user dependency.
whispercpp-sys  Bindgen output + build.rs (cmake build, link directives) + the C++ exception-catching shim.

End users should depend on whispercpp. whispercpp-sys is re-exported as whispercpp::sys for callers who need a raw escape hatch (review every use carefully: only no-throw symbols are exposed, but it is unsafe regardless).

Supported platforms

CI runs on ubuntu-latest, macos-latest, and windows-latest. Sanitizer (ASan + UBSan) and Miri jobs gate the unsafe boundary on every PR. MSRV is pinned in Cargo.toml and enforced via rust-version.

License

whispercpp is distributed under the terms of both the MIT license and the Apache License (Version 2.0).

See LICENSE-APACHE and LICENSE-MIT for details.

Copyright (c) 2026 FinDIT Studio authors.