Adds the safe wrapper layer for whisper.cpp's DTW
(Dynamic Time Warping) token-level timestamp path,
plus the matching native-side hardening that makes the
feature reachable from safe Rust without exposing
abort paths under resource pressure or malformed input.
`whispercpp::ContextParams` gains:
- `with_dtw_token_timestamps(bool)` — enable the DTW
pass at context construction.
- `with_dtw_aheads_preset(AlignmentHeadsPreset)` —
pick the alignment-head set; one variant per
shipping whisper checkpoint (`TinyEn` through
`LargeV3Turbo`). `None` disables DTW even when the
flag is on.
- `with_dtw_mem_size(usize)` — override the DTW
scratch arena. Clamped to
`[MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE]` and raised
to the per-preset minimum from
`required_dtw_mem_size_for(preset)`.
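The sizing rule for `with_dtw_mem_size` can be sketched as below. This is a minimal illustration, not the crate's code: `effective_dtw_mem_size` is a hypothetical helper, sizes are `u64` so the 4 GiB bound is representable on any target, and the per-preset minimum is passed in rather than looked up.

```rust
/// One mebibyte in bytes.
const MIB: u64 = 1024 * 1024;
/// Bounds as stated in this description (128 MiB and 4 GiB).
const MIN_DTW_MEM_SIZE: u64 = 128 * MIB;
const MAX_DTW_MEM_SIZE: u64 = 4096 * MIB;

/// Hypothetical helper: clamp the request into the global bounds, then
/// raise it to the per-preset minimum if it is still too small.
/// Assumes `preset_min` is already at or below `MAX_DTW_MEM_SIZE`.
fn effective_dtw_mem_size(requested: u64, preset_min: u64) -> u64 {
    requested
        .clamp(MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE)
        .max(preset_min)
}
```

Under this sketch a request of 0 bytes with a 278 MiB preset minimum resolves to 278 MiB, and an oversized request saturates at the upper bound.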
`whispercpp::Token` gains:
- `t_dtw() -> Option<i64>` — DTW-derived timestamp in
centiseconds. `None` covers DTW-disabled contexts,
non-text tokens, and per-segment skips (`audio_ctx`
mismatch, short-window medfilt). The native side sets
a `-1` sentinel before each DTW pass, which the
accessor maps to `None`.
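A rough sketch of the sentinel mapping and of turning centiseconds into a `Duration`. The function names and the `T_DTW_UNSET` constant are hypothetical; only the `-1` sentinel and the centisecond unit come from this description.

```rust
use std::time::Duration;

/// Sentinel the native side writes before each DTW pass (per this PR).
const T_DTW_UNSET: i64 = -1;

/// Hypothetical mapping from the raw native field to the safe accessor's
/// return value: the sentinel becomes `None`, anything else is kept.
fn t_dtw_from_raw(raw: i64) -> Option<i64> {
    if raw == T_DTW_UNSET {
        None
    } else {
        Some(raw)
    }
}

/// Convert a centisecond timestamp to a `Duration`.
/// Negative inputs yield `None` instead of wrapping.
fn centiseconds_to_duration(cs: i64) -> Option<Duration> {
    u64::try_from(cs)
        .ok()
        .map(|cs| Duration::from_millis(cs.saturating_mul(10)))
}
```

Chaining the two with `and_then` keeps "DTW skipped" (`None`) distinct from "DTW computed at audio offset 0" (`Some(Duration::ZERO)`).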
New public constants / functions:
- `DEFAULT_DTW_MEM_SIZE`, `MIN_DTW_MEM_SIZE`,
`MAX_DTW_MEM_SIZE` — scratch arena bounds (128 MiB,
128 MiB, 4 GiB).
- `SUPPORTED_DTW_N_TEXT_CTX` (= 448) — the standard
whisper text-context window the wrapper budgets
for; non-standard models with larger `n_text_ctx`
are refused at `Context::new` when DTW is on.
- `required_dtw_mem_size_for(AlignmentHeadsPreset)`
— per-preset scratch requirement (callers can
pre-size budgets).
Configuration rejections enforced at `Context::new`:
- DTW + `flash_attn` (whisper.cpp silently disables
DTW under flash-attn; refused explicitly so the
Rust API contract isn't violated).
- DTW + custom `n_text_ctx > 448` (the scratch arena
is sized for standard checkpoints).
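The two rejection rules can be modelled as a small validation function. This is a sketch under assumptions: `DtwConfig`, `DtwConfigError`, and `validate_dtw_config` are hypothetical names, and in the real crate `n_text_ctx` comes from the loaded model rather than from a caller-supplied field.

```rust
/// Text-context window the wrapper budgets for (value from this PR).
const SUPPORTED_DTW_N_TEXT_CTX: u32 = 448;

/// Hypothetical error type for the two rejection rules.
#[derive(Debug, PartialEq, Eq)]
enum DtwConfigError {
    FlashAttnEnabled,
    TextCtxTooLarge { n_text_ctx: u32 },
}

/// Hypothetical flattened view of the relevant settings.
#[derive(Clone, Copy)]
struct DtwConfig {
    dtw_token_timestamps: bool,
    flash_attn: bool,
    n_text_ctx: u32,
}

/// Reject combinations that would otherwise fail silently or overrun
/// the scratch budget. Configurations without DTW always pass.
fn validate_dtw_config(cfg: DtwConfig) -> Result<(), DtwConfigError> {
    if !cfg.dtw_token_timestamps {
        return Ok(());
    }
    if cfg.flash_attn {
        return Err(DtwConfigError::FlashAttnEnabled);
    }
    if cfg.n_text_ctx > SUPPORTED_DTW_N_TEXT_CTX {
        return Err(DtwConfigError::TextCtxTooLarge {
            n_text_ctx: cfg.n_text_ctx,
        });
    }
    Ok(())
}
```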
Native-side patches (in submodule `Findit-AI/whisper.cpp@rust`)
Sixteen native-side patches harden every assertion /
abort / silent-skip path reachable from safe Rust
through the DTW surface, plus several adjacent paths
the audit surfaced:
DTW-specific:
1. DTW scratch RAII guard — `gctx` wrapped in a
`unique_ptr` so any throw between
`ggml_init` and the explicit `ggml_free` releases
the arena via stack unwinding. Closes a
~`dtw_mem_size` (default 128 MiB) leak per failed
decode.
2. DTW scratch alloc-fail throws — explicit
`throw std::bad_alloc()` on `ggml_init` NULL,
replacing a silent return that left every
`Token::t_dtw` at zero with no error signal.
3. DTW token assignment bounded — replaces the nested
iterator walk over `result_all` with a flat list
of text-token pointers + bounded index walk.
Closes a past-the-end iterator deref on the last
token of the last segment.
4. DTW short-window medfilt clamp — adapts the
hardcoded `medfilt_width=7` down to the largest
odd value strictly less than `n_audio_tokens`,
skipping the median filter entirely for
`n_audio_tokens <= 1`. Closes a
`WHISPER_ASSERT(filter_width < a->ne[2])` abort
on residual short segments.
5. DTW `audio_ctx` override guard — replaces
`WHISPER_ASSERT(n_frames <= n_audio_ctx * 2)`
with a recoverable WARN+return when callers
override `Params::set_audio_ctx` smaller than the
chunk requires.
6. DTW backtrace impossible-case throws —
replaces `WHISPER_ASSERT(0)` in the impossible
branch of the lattice state machine with a
`std::runtime_error` throw.
7. DTW `aheads_cross_QKs` invariants throw —
post-decode null / dimension checks throw rather
than abort.
8. DTW backend compute throws — checks
`ggml_backend_init_by_type` for NULL and
`ggml_backend_graph_compute` for non-success,
throwing on either.
9. DTW decode failure throws — replaces a
`WHISPER_ASSERT(0)` after a failed
`whisper_decode_internal` pass with
`std::bad_alloc`.
10. DTW `t_dtw` sentinel init — sets every text
token's `t_dtw = -1` BEFORE any skip path, so
the safe wrapper's `Option<i64>` accessor can
distinguish "DTW skipped" from "DTW computed
at audio offset 0".
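The width rule from patch 4 is easy to get off by one, so here is a sketch of it in Rust. The patch itself is C++; `clamped_medfilt_width` is a hypothetical name, and the cap of 7 is the hardcoded `medfilt_width` mentioned above.

```rust
/// Default median-filter width named in patch 4.
const MEDFILT_WIDTH: usize = 7;

/// Returns the filter width to use for a window of `n_audio_tokens`,
/// or `None` when the median filter should be skipped entirely.
fn clamped_medfilt_width(n_audio_tokens: usize) -> Option<usize> {
    if n_audio_tokens <= 1 {
        return None; // nothing to filter
    }
    // Largest value strictly less than the window length.
    let below = n_audio_tokens - 1;
    // Step down to the nearest odd value (below >= 1, so this never underflows).
    let largest_odd = if below % 2 == 1 { below } else { below - 1 };
    Some(largest_odd.min(MEDFILT_WIDTH))
}
```

Every returned width is odd and strictly smaller than the window, which is the condition the original `WHISPER_ASSERT(filter_width < a->ne[2])` checked.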
Adjacent paths surfaced by the audit:
11. `ggml_init` OOM-safe context alloc (in
`ggml/src/ggml.c`) — replaces `GGML_MALLOC` +
`GGML_ASSERT(ctx->mem_buffer != NULL)` with
plain `malloc` + null-handling, so OOM returns
NULL instead of `abort()`-ing.
12. `whispercpp_ggml_init_or_throw` wrapper — every
unchecked `ggml_init` call site in whisper.cpp
(8 sites: model_load, graph builders for
conv/encoder/cross/decoder/vad, vad_init,
bench) goes through the wrapper, which throws
`std::bad_alloc` on NULL. Closes 8 SIGSEGV
paths reachable from safe Rust.
13. KV buffer null throws — replaces
`WHISPER_ASSERT(!!kv_pad.buffer)` /
`WHISPER_ASSERT(!!kv_self.buffer)` in the
encoder / decoder graph builders with throws.
14. `token_to_str` sparse-vocab no-throw — replaces
`id_to_token.at(token)` with `.find()` returning
NULL on miss. Closes a
`std::out_of_range` UB across `extern "C"` for
sparse-vocab models where `hparams.n_vocab`
exceeds the actually-loaded vocab table.
15. Hparams head divisibility check — rejects
`n_audio_state % n_audio_head != 0` and
`n_text_state % n_text_head != 0` at load time.
Closes a `GGML_ASSERT` on shape mismatch during
encoder graph build.
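The load-time check from patch 15 amounts to two modulo tests. A Rust sketch follows; the patch itself lives in C++, the names here are hypothetical, and rejecting a zero head count is an extra assumption added to avoid dividing by zero.

```rust
/// Hypothetical error type naming which pair failed the check.
#[derive(Debug, PartialEq, Eq)]
enum HparamsError {
    AudioHeadMismatch,
    TextHeadMismatch,
}

/// Reject hyperparameters whose state width is not a multiple of the
/// head count, since the per-head dimension would not be an integer.
fn check_head_divisibility(
    n_audio_state: u32,
    n_audio_head: u32,
    n_text_state: u32,
    n_text_head: u32,
) -> Result<(), HparamsError> {
    if n_audio_head == 0 || n_audio_state % n_audio_head != 0 {
        return Err(HparamsError::AudioHeadMismatch);
    }
    if n_text_head == 0 || n_text_state % n_text_head != 0 {
        return Err(HparamsError::TextHeadMismatch);
    }
    Ok(())
}
```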
`whispercpp-sys/build.rs::verify_patched_source`
scans the linked submodule for every patch's
sentinel marker (29 markers across `src/whisper.cpp`
and `ggml/src/ggml.c`) and hard-fails the build if
any are missing. The shape changed from a single
flat marker list to `(file, marker)` tuples since
some patches now sit in `ggml.c` alongside the
existing `whisper.cpp` ones.
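The `(file, marker)` shape can be illustrated with a small in-memory scan. This is only a sketch: `missing_markers` is a hypothetical function, the real `build.rs` reads files from the submodule, and the marker strings in the usage note are placeholders rather than the actual sentinels.

```rust
use std::collections::HashMap;

/// Return every `(file, marker)` pair whose marker is absent from the
/// corresponding source text. A file that is not present at all counts
/// as missing all of its markers.
fn missing_markers<'a>(
    sources: &HashMap<&str, &str>,
    required: &[(&'a str, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    let mut missing = Vec::new();
    for &(file, marker) in required {
        let present = sources
            .get(file)
            .map_or(false, |text| text.contains(marker));
        if !present {
            missing.push((file, marker));
        }
    }
    missing
}
```

A build script following this shape would fail the build whenever the returned list is non-empty, for example when `("ggml/src/ggml.c", "placeholder-marker")` is required but the text of `ggml.c` does not contain it.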
28 unit tests cover the new public API surface:
- `AlignmentHeadsPreset` enum bijection (every
variant maps to a distinct C enum value).
- Per-preset alignment head counts pinned against
whisper.cpp's `g_aheads_*` tables.
- Per-preset scratch budget pins (`LargeV2` =
278 MiB, `SmallEn` = 230 MiB, `MediumEn` =
218 MiB).
- `clamp_dtw_mem_size` boundary cases (0, MIN-1,
MIN, MIN+1, MAX-1, MAX, MAX+1, `usize::MAX`).
- `with_dtw_mem_size` setter clamping (0, max).
- `Context::new` rejection of DTW + `flash_attn`
(in two setter orders + a positive
no-effect-config case).
- `Token::t_dtw()` sentinel mapping (`-1` → `None`,
`0` → `Some(0)`).
- `SUPPORTED_DTW_N_TEXT_CTX` constant pin.
Fault-injection tests for OOM scenarios (allocator
override, model with non-divisible head dims,
sparse vocab) need crafted GGUF fixtures and
`LD_PRELOAD` malloc-fail harnesses, deferred to
follow-up infrastructure work — the structural
contracts are now correct (failures propagate
through C++ unwinding to the FFI shim, never abort
from safe Rust).
- README adds a `## DTW timestamps` section: enable-
at-construction example, per-token reader pattern,
enumerated `None` cases, rejection rules table.
- `whispercpp/TODO.md` removes the "DTW token
timestamps" entry from the deliberate-omissions
list.
- GitHub issue #6 (intentional omissions) updated to
drop the DTW section.
Both `whispercpp` and `whispercpp-sys` bumped to
0.2.0; the dependency declaration in
`whispercpp/Cargo.toml` was updated to `^0.2`.
CI's `cargo clippy --workspace --all-targets` failed under
`RUSTFLAGS=-Dwarnings` on:
```
error: this assertion has a constant value
    --> whispercpp/src/context.rs:1289:5
     |
1289 |     assert!(MIN_DTW_MEM_SIZE <= MAX_DTW_MEM_SIZE);
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
Both sides are `const usize`, so the comparison is computed
at compile time. clippy's `assertions_on_constants` lint
wants a `const { ... }` block to make the compile-time
evaluation explicit.
Wrapping the assertion in `const { ... }` preserves the
semantics (the order invariant pin) and satisfies clippy
without needing `#[allow]`.
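For reference, the wrapped form looks roughly like this. The constants carry illustrative values and `dtw_mem_bounds` is a hypothetical function standing in for the test body; inline `const` blocks need Rust 1.79 or later.

```rust
// Illustrative values, not necessarily the crate's.
pub const MIN_DTW_MEM_SIZE: usize = 128 * 1024 * 1024;
pub const MAX_DTW_MEM_SIZE: usize = 1024 * 1024 * 1024;

/// Hypothetical function containing the invariant pin.
pub fn dtw_mem_bounds() -> (usize, usize) {
    // Evaluated at compile time: a violated invariant becomes a build
    // error rather than a runtime panic.
    const { assert!(MIN_DTW_MEM_SIZE <= MAX_DTW_MEM_SIZE) };
    (MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE)
}
```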
Pull request overview
This PR adds a safe Rust wrapper surface for whisper.cpp’s DTW (Dynamic Time Warping) token-level timestamp path, alongside build-time verification that the bundled native sources include the required hardening patches.
Changes:
- Introduces DTW configuration on `ContextParams` (enable flag, alignment-head preset, and bounded scratch-memory sizing) plus load-time validation for unsupported DTW configurations.
- Exposes DTW-derived per-token timestamps via `Token::t_dtw() -> Option<i64>` and documents the two timestamp sources (`t0`/`t1` vs `t_dtw`).
- Strengthens `whispercpp-sys` patch verification to check markers across multiple native files and bumps both crates to `0.2.0`.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `whispercpp/TODO.md` | Removes DTW timestamps from the “intentional omissions” list now that the feature is supported. |
| `whispercpp/src/state.rs` | Adds `t_dtw` field/accessor on `Token` plus unit tests for projection and sentinel mapping. |
| `whispercpp/src/lib.rs` | Re-exports the new DTW-related API (preset enum, constants, sizing helper). |
| `whispercpp/src/context.rs` | Implements DTW configuration, memory sizing/clamping, and `Context::new` validation/rejections; adds extensive unit tests. |
| `whispercpp/Cargo.toml` | Bumps `whispercpp` version to 0.2.0 and updates `whispercpp-sys` dependency to 0.2. |
| `whispercpp-sys/Cargo.toml` | Bumps `whispercpp-sys` version to 0.2.0. |
| `whispercpp-sys/build.rs` | Updates patch-marker verification to validate markers across `src/whisper.cpp` and `ggml/src/ggml.c`. |
| `README.md` | Updates published crate version to 0.2 and documents DTW timestamps usage/constraints. |
The constant `MAX_DTW_MEM_SIZE = 4 * 1024 * 1024 * 1024` evaluates to `2^32`. On 32-bit targets `usize::MAX` is `2^32 - 1`, so the const-eval overflows and the crate fails to compile (`error: this arithmetic operation will overflow`).

The crate's CI matrix only covers 64-bit targets (macos-latest, ubuntu-latest, windows-latest, aarch64-linux), so the issue didn't surface, but downstream users on i686 / armv7 / wasm32 etc. would hit it on first build.

Split via `cfg(target_pointer_width)`:

* 64-bit: 4 GiB (unchanged). That is roughly 15× `required_dtw_mem_size_for(LargeV2) = 278 MiB`, so a `usize::MAX` slip still saturates short of `ggml_init`'s internal arena-math overflow.
* 32-bit / 16-bit: 1 GiB. Still ~3.7× the per-preset worst case, so the safety property (saturate above the realistic peak to dodge the `ggml_init` overflow) is preserved. `usize::MAX = 2^32 - 1` on 32-bit gives the cap 75% headroom under the type's max, so `MAX + 1` arithmetic in tests and `clamp_dtw_mem_size` doesn't overflow.

`required_dtw_mem_size_for` already clamps its output against `MAX_DTW_MEM_SIZE`, so the per-preset minimums (all ≤ 278 MiB) fit comfortably within the smaller 32-bit cap.

No test changes needed: existing tests use `MAX_DTW_MEM_SIZE - 1` and `MAX_DTW_MEM_SIZE + 1`, both of which now fit in `usize` on every supported pointer width.

Found by Copilot's review on PR #7.
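The split might look like the sketch below. It assumes the non-64-bit branch is expressed as `not(target_pointer_width = "64")`; the actual attribute spelling in the crate may differ, and the body of `clamp_dtw_mem_size` shown here is a guess at its behaviour rather than a copy.

```rust
pub const MIN_DTW_MEM_SIZE: usize = 128 * 1024 * 1024;

// 64-bit targets keep the original 4 GiB cap.
#[cfg(target_pointer_width = "64")]
pub const MAX_DTW_MEM_SIZE: usize = 4 * 1024 * 1024 * 1024;

// Narrower targets get 1 GiB so the constant fits in `usize`
// (assumed spelling of the fallback branch).
#[cfg(not(target_pointer_width = "64"))]
pub const MAX_DTW_MEM_SIZE: usize = 1024 * 1024 * 1024;

/// Assumed shape of the clamp helper: saturate into the global bounds.
pub fn clamp_dtw_mem_size(requested: usize) -> usize {
    requested.clamp(MIN_DTW_MEM_SIZE, MAX_DTW_MEM_SIZE)
}
```

With either cap, `MAX_DTW_MEM_SIZE + 1` stays representable on 32-bit and 64-bit targets, which is what the boundary tests rely on.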
Audit scope
A comprehensive audit of `WHISPER_ASSERT` / `GGML_ASSERT` / `GGML_ABORT` sites in the linked submodule classified each site. The audit's classification is documented in the round-9 commit message in the per-round commit history (preserved on the submodule's `rust` branch). After the audit and the previous patches, every assertion path reachable from safe Rust under runtime conditions is now a throw that the existing exception shim converts to a Rust `WhisperError`. Remaining sites guard programming-error invariants.