75 changes: 72 additions & 3 deletions README.md
@@ -32,24 +32,29 @@ Safe Rust bindings for [whisper.cpp][whisper-cpp] speech-to-text inference.
- **Backend matrix.** Metal, CoreML, Vulkan, OpenCL, CUDA, ROCm
(HIP), oneAPI (SYCL), Moore Threads (MUSA), OpenVINO, OpenBLAS —
all opt-in via Cargo features.
- **DTW token timestamps.** Built-in token-level timing via DTW
over the configured alignment heads (`AlignmentHeadsPreset`),
with safe per-token availability through
`Token::t_dtw() -> Option<i64>`. See
[DTW timestamps](#dtw-timestamps).

## Installation

```toml
[dependencies]
whispercpp = "0.1"
whispercpp = "0.2"
```

The default build is plain CPU. Opt into accelerators per-target:

```toml
# macOS Apple Silicon
[target.'cfg(all(target_os = "macos", target_arch = "aarch64"))'.dependencies]
whispercpp = { version = "0.1", features = ["metal", "coreml"] }
whispercpp = { version = "0.2", features = ["metal", "coreml"] }

# Linux + NVIDIA
[target.'cfg(all(target_os = "linux", target_arch = "x86_64"))'.dependencies]
whispercpp = { version = "0.1", features = ["cuda"] }
whispercpp = { version = "0.2", features = ["cuda"] }
```

## Examples
@@ -80,6 +85,70 @@ GPU backends require the corresponding vendor SDK (CUDA Toolkit,
ROCm, oneAPI, etc.) installed at link time. CI exercises the
bundled CPU path on Linux/macOS/Windows and Metal+CoreML on macOS.

## DTW timestamps

Token-level timestamps via DTW over the decoder's
cross-attention weights. Enable at `Context` construction:

```rust
use whispercpp::{Context, ContextParams, AlignmentHeadsPreset};

let ctx = Context::new(
    "ggml-large-v3-turbo.bin",
    ContextParams::new()
        .with_use_gpu(true)
        .with_dtw_token_timestamps(true)
        .with_dtw_aheads_preset(AlignmentHeadsPreset::LargeV3Turbo),
)?;
```

Match `AlignmentHeadsPreset` to your model: the safe API
ships every standard checkpoint preset (`TinyEn` through
`LargeV3Turbo`). A mismatched preset produces noisy timings
without raising an error. Presets are bound-checked by
`required_dtw_mem_size_for`, and a model is rejected at load
if its `n_text_ctx` exceeds `SUPPORTED_DTW_N_TEXT_CTX`.

After `state.full(&params, &samples)`, read per-token DTW
timing as `Option<i64>` (centiseconds):

```rust
for i in 0..state.n_segments() {
    let seg = state.segment(i).unwrap();
    for j in 0..seg.n_tokens() {
        let token = seg.token(j).unwrap();
        match token.t_dtw() {
            Some(t) => println!("token={} t_dtw={:.2}s",
                                token.id(), t as f64 / 100.0),
            None => /* DTW unavailable for this token */ (),
        }
    }
}
```

`None` covers four cases: DTW was not enabled at construction;
the token is non-text (special / timestamp); DTW was skipped
for the segment because the `Params::set_audio_ctx` override
was too small; or the audio window was too short for the
median-filter pass. The underlying C-side patch
(`whispercpp-sys: dtw t_dtw sentinel init`) initialises
`t_dtw = -1` before every DTW pass, so the sentinel uniquely
identifies "unavailable": `Some(0)` is a valid timestamp
(a token at audio offset 0), not the sentinel.
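
The sentinel handling and the centisecond unit can be illustrated with a small standalone sketch. The helper names `t_dtw_from_raw` and `dtw_duration` are hypothetical and not part of the crate; treating every negative raw value as "unavailable" is an assumption that goes slightly beyond the `-1` sentinel described above.

```rust
use std::time::Duration;

/// Hypothetical helper: map a raw C-side `t_dtw` value to the
/// `Option<i64>` shape that `Token::t_dtw()` exposes.
/// Assumption: any negative value (including the `-1` sentinel)
/// means "DTW timing unavailable"; `0` is a real timestamp.
fn t_dtw_from_raw(raw: i64) -> Option<i64> {
    if raw < 0 { None } else { Some(raw) }
}

/// Hypothetical helper: convert centiseconds to a `Duration`.
/// Returns `None` when the timing is unavailable, negative, or
/// too large to express in milliseconds as a `u64`.
fn dtw_duration(t_dtw: Option<i64>) -> Option<Duration> {
    let centis = u64::try_from(t_dtw?).ok()?;
    // 1 centisecond = 10 milliseconds.
    centis.checked_mul(10).map(Duration::from_millis)
}
```

Keeping the value as an `Option` until the last step means a token at offset 0 and a token with no timing stay distinguishable all the way to the caller.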

Constraints (enforced at `Context::new`):

| Constraint | What it does |
|---|---|
| `dtw + flash_attn` | Rejected. Whisper.cpp silently disables DTW under flash-attn; the wrapper refuses the combination explicitly. |
| `dtw + custom n_text_ctx > 448` | Rejected. The DTW scratch arena is sized for standard whisper checkpoints; non-standard models with larger text context would overflow it. |
| `dtw_mem_size` | Clamped to `[MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE]`, then raised to the per-preset minimum from `required_dtw_mem_size_for`. |
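
The `dtw_mem_size` row describes a two-step adjustment: clamp first, then raise to the preset's minimum. A minimal sketch of that order of operations follows. The constant values are placeholders chosen for illustration, since this README does not state the real `MIN_DTW_MEM_SIZE` / `MAX_DTW_MEM_SIZE`, and `effective_dtw_mem_size` is a hypothetical name.

```rust
// Placeholder bounds for illustration only; the crate's real
// MIN_DTW_MEM_SIZE / MAX_DTW_MEM_SIZE values are not shown here.
const MIN_DTW_MEM_SIZE: usize = 16 * 1024 * 1024;
const MAX_DTW_MEM_SIZE: usize = 512 * 1024 * 1024;

/// Hypothetical helper mirroring the table row: clamp the request
/// into `[MIN, MAX]`, then raise it to the per-preset minimum
/// (the value `required_dtw_mem_size_for` would supply).
fn effective_dtw_mem_size(requested: usize, preset_min: usize) -> usize {
    requested
        .clamp(MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE)
        .max(preset_min)
}
```

Because the raise happens after the clamp, a preset minimum larger than the upper bound would win in this sketch; whether the crate allows that case is not stated in this README.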

Native abort paths inside the DTW helper
(allocation failures, invalid windows, decoder errors) are
all converted to `WhisperError::StateLost` via the existing
exception shim — no `abort()` is reachable from safe Rust
through this surface.

## Memory safety

`whisper.cpp` is a binary parser of attacker-controllable model files
2 changes: 1 addition & 1 deletion whispercpp-sys/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "whispercpp-sys"
version = "0.1.0"
version = "0.2.0"
edition.workspace = true
rust-version.workspace = true
license.workspace = true
155 changes: 123 additions & 32 deletions whispercpp-sys/build.rs
@@ -116,51 +116,142 @@ fn bundled_build() {
/// upstream AND someone manually replacing the submodule with
/// a different tree.
fn verify_patched_source(whisper_src: &Path) {
let target = whisper_src.join("src").join("whisper.cpp");
let body = match std::fs::read_to_string(&target) {
Ok(b) => b,
Err(e) => panic!(
"whispercpp-sys: failed to read {} for patch verification: {e}",
target.display()
),
};

// Sentinels chosen from the highest-leverage patches —
// the ones whose absence would re-introduce the
// double-free / null-deref / leak hazards the Rust
// wrapper assumes are closed.
const REQUIRED_MARKERS: &[&str] = &[
"whispercpp-sys: kv_cache_free idempotent fix",
"whispercpp-sys: read_safe zero-init",
"whispercpp-sys: init_state RAII entry",
"whispercpp-sys: init_context RAII entry",
"whispercpp-sys: tensor header validation (model_load)",
"whispercpp-sys: ggml_log_set once-per-process",
"whispercpp-sys: hparams validation",
"whispercpp-sys: lang_str null guard",
"whispercpp-sys: special-token bounds check",
"whispercpp-sys: path_model assignment guard",
"whispercpp-sys: sched abort callback wiring",
"whispercpp-sys: vad_init RAII guard",
// double-free / null-deref / leak / native-abort hazards
// the Rust wrapper assumes are closed. Each entry is
// `(file_relative_to_whisper_src, expected_marker)`; the
// build hard-fails if any are absent.
//
// We split across both `src/whisper.cpp` and
// `ggml/src/ggml.c` because some safety patches sit in
// each. The ggml patch (OOM-safe `ggml_init`) is what
// turns the DTW scratch-allocation OOM path from
// `abort()`-uncatchable into a `WhisperError::StateLost`
// recovery — without it the wrapper's `dtw scratch
// alloc-fail throws` patch is dead code.
const REQUIRED_MARKERS: &[(&str, &str)] = &[
(
"src/whisper.cpp",
"whispercpp-sys: kv_cache_free idempotent fix",
),
("src/whisper.cpp", "whispercpp-sys: read_safe zero-init"),
("src/whisper.cpp", "whispercpp-sys: init_state RAII entry"),
("src/whisper.cpp", "whispercpp-sys: init_context RAII entry"),
(
"src/whisper.cpp",
"whispercpp-sys: tensor header validation (model_load)",
),
(
"src/whisper.cpp",
"whispercpp-sys: ggml_log_set once-per-process",
),
("src/whisper.cpp", "whispercpp-sys: hparams validation"),
("src/whisper.cpp", "whispercpp-sys: lang_str null guard"),
(
"src/whisper.cpp",
"whispercpp-sys: special-token bounds check",
),
(
"src/whisper.cpp",
"whispercpp-sys: path_model assignment guard",
),
(
"src/whisper.cpp",
"whispercpp-sys: sched abort callback wiring",
),
("src/whisper.cpp", "whispercpp-sys: vad_init RAII guard"),
("src/whisper.cpp", "whispercpp-sys: dtw scratch RAII guard"),
(
"src/whisper.cpp",
"whispercpp-sys: dtw scratch alloc-fail throws",
),
(
"src/whisper.cpp",
"whispercpp-sys: dtw token assignment bounded",
),
(
"src/whisper.cpp",
"whispercpp-sys: dtw short-window medfilt clamp",
),
(
"src/whisper.cpp",
"whispercpp-sys: dtw audio_ctx override guard",
),
(
"src/whisper.cpp",
"whispercpp-sys: ggml_init throw-on-null wrapper",
),
(
"src/whisper.cpp",
"whispercpp-sys: dtw decode failure throws",
),
("src/whisper.cpp", "whispercpp-sys: kv buffer null throws"),
(
"src/whisper.cpp",
"whispercpp-sys: dtw backtrace impossible-case throws",
),
(
"src/whisper.cpp",
"whispercpp-sys: dtw aheads_cross_QKs invariants throw",
),
(
"src/whisper.cpp",
"whispercpp-sys: token_to_str sparse-vocab no-throw",
),
(
"src/whisper.cpp",
"whispercpp-sys: hparams head divisibility check",
),
(
"src/whisper.cpp",
"whispercpp-sys: dtw backend compute throws",
),
("src/whisper.cpp", "whispercpp-sys: dtw t_dtw sentinel init"),
(
"ggml/src/ggml.c",
"whispercpp-sys: ggml_init OOM-safe context alloc",
),
];

let missing: Vec<&str> = REQUIRED_MARKERS
.iter()
.copied()
.filter(|m| !body.contains(m))
.collect();
// Read each referenced file once, then check every
// marker that points at it. Group markers by file so we
// don't re-read the same source on every iteration.
use std::collections::HashMap;
let mut by_file: HashMap<&str, Vec<&str>> = HashMap::new();
for (file, marker) in REQUIRED_MARKERS {
by_file.entry(*file).or_default().push(*marker);
}

let mut missing: Vec<(&str, &str)> = Vec::new();
for (rel, markers) in &by_file {
let target = whisper_src.join(rel);
let body = match std::fs::read_to_string(&target) {
Ok(b) => b,
Err(e) => panic!(
"whispercpp-sys: failed to read {} for patch verification: {e}",
target.display()
),
};
for m in markers {
if !body.contains(*m) {
missing.push((*rel, *m));
}
}
}

if !missing.is_empty() {
panic!(
"whispercpp-sys: the linked whisper.cpp source at {} is missing the rust-branch patches \
"whispercpp-sys: the linked whisper.cpp source under {} is missing rust-branch patches \
(required marker{} absent: {:?}).\n\n\
The Rust safety surface depends on these patches; building against unpatched upstream \
reintroduces multi-decoder double-free / use-after-free / null-deref classes.\n\n\
reintroduces multi-decoder double-free / use-after-free / null-deref / native-abort \
classes.\n\n\
Fix: ensure the submodule tracks `Findit-AI/whisper.cpp` branch `rust`. Run\n \
git submodule update --init --recursive\n\
from the repo root. If you intentionally pointed at a different source, add equivalent \
patches and the matching marker comments before retrying.",
target.display(),
whisper_src.display(),
if missing.len() == 1 { "" } else { "s" },
missing,
);
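
The group-then-scan logic in this hunk can be exercised in isolation. The sketch below is a standalone variant under stated assumptions, not the build script's code: `missing_markers` is a hypothetical name, file contents are passed in memory instead of being read from disk, an absent file counts as missing all of its markers (the hunk panics on a read failure instead), and a `BTreeMap` replaces the `HashMap` so the reported order is the same on every run.

```rust
use std::collections::BTreeMap;

/// Return every `(file, marker)` pair whose marker does not occur
/// in that file's text. `sources` maps a relative path to the
/// file's contents.
fn missing_markers<'a>(
    required: &[(&'a str, &'a str)],
    sources: &BTreeMap<String, String>,
) -> Vec<(&'a str, &'a str)> {
    // Group markers by file so each body is looked up once.
    // BTreeMap iterates in sorted key order, which keeps the
    // resulting report stable between runs.
    let mut by_file: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &(file, marker) in required {
        by_file.entry(file).or_default().push(marker);
    }

    let mut missing = Vec::new();
    for (file, markers) in &by_file {
        // Assumption: a file that is not in `sources` is treated
        // as empty, so all of its markers are reported.
        let body = sources.get(*file).map(String::as_str).unwrap_or("");
        for &marker in markers {
            if !body.contains(marker) {
                missing.push((*file, marker));
            }
        }
    }
    missing
}
```

Sorted iteration matters only for the wording of the failure message; the set of missing markers is the same with either map type.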
2 changes: 1 addition & 1 deletion whispercpp-sys/whisper.cpp
Submodule whisper.cpp updated 2 files
+45 −4 ggml/src/ggml.c
+461 −38 src/whisper.cpp
4 changes: 2 additions & 2 deletions whispercpp/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "whispercpp"
version = "0.1.0"
version = "0.2.0"
edition.workspace = true
rust-version.workspace = true
license.workspace = true
@@ -76,7 +76,7 @@ openblas = ["whispercpp-sys/openblas"] # OpenBLAS
# `../whispercpp-sys/`. All `unsafe extern "C"` declarations
# live there; this crate only ever calls them behind safe
# wrappers.
whispercpp-sys = { version = "0.1", path = "../whispercpp-sys", default-features = false }
whispercpp-sys = { version = "0.2", path = "../whispercpp-sys", default-features = false }
# Public error type. `thiserror` keeps things light.
thiserror = { version = "2", default-features = false }
# Inline small strings (≤23 bytes) for error payloads — paths,
12 changes: 0 additions & 12 deletions whispercpp/TODO.md
@@ -100,18 +100,6 @@ requires more design than a 1:1 port.
Symbols: `whisper_set_log_callback`, `set_debug_mode`,
`whisper_log_callback`.

### DTW token timestamps

Whispery uses wav2vec2 forced alignment for word-level timing.
whisper.cpp's DTW path is a parallel mechanism with its own
configuration (`dtw_aheads`, `dtw_n_top`, `dtw_mem_size`). Wrapping
it would invite confusion about which timestamping path is
authoritative.

Symbols: `whisper_full_params::dtw_token_timestamps` (true at
construction, but `Params::set_dtw_*` and `dtw_aheads` array are
not exposed), `whisper_aheads`, `whisper_full_get_token_dtw_t0_*`.

### Buffer-load constructors

We support `Context::new(path, params)` only. Loading from an