+Safe Rust bindings for [whisper.cpp][whisper-cpp] speech-to-text inference.
+
+- **Always-bundled build.** `whispercpp-sys` cmake-builds a vendored,
+ patched whisper.cpp; there is no pkg-config / system-install path.
+ The patched source lives on a fork branch with each fix as a
+ reviewable commit (see [Memory safety](#memory-safety) below).
+- **Panic-free safe surface.** Every throwing FFI entry point is
+ wrapped in a C++ exception-catching shim, every fallible setter
+ returns `WhisperError`, and every accessor short-circuits on
+ poisoned state.
+- **`Send + Sync`** `Context`; per-`Context` `State` is `Send`.
+ Concurrent inference is serialized through a per-`Context` mutex
+ so per-call leak budgets are structural, not documentary.
+- **Backend matrix.** Metal, CoreML, Vulkan, OpenCL, CUDA, ROCm
+ (HIP), oneAPI (SYCL), Moore Threads (MUSA), OpenVINO, OpenBLAS —
+ all opt-in via Cargo features.
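
The per-`Context` serialization mentioned above can be pictured with a small sketch. `SketchContext` and `run_serialized` are hypothetical names used only for illustration; the crate's real `Context` internals are not shown in this document and may differ.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for a context that owns one inference lock.
struct SketchContext {
    inference_lock: Mutex<()>,
}

impl SketchContext {
    fn new() -> Self {
        SketchContext { inference_lock: Mutex::new(()) }
    }

    /// Runs `job` while holding the per-context lock, so at most one
    /// job per context executes at any time.
    fn run_serialized<T>(&self, job: impl FnOnce() -> T) -> T {
        // A poisoned lock guards no data here, so recover the guard
        // instead of panicking.
        let _guard = self
            .inference_lock
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner());
        job()
    }
}
```

Because the lock lives in the context rather than in caller code, two threads sharing one context cannot overlap their jobs, while separate contexts stay independent.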
+
## Installation
```toml
[dependencies]
-template_rs = "0.1"
+whispercpp = "0.1"
+```
+
+The default build is plain CPU. Opt into accelerators per-target:
+
+```toml
+# macOS Apple Silicon
+[target.'cfg(all(target_os = "macos", target_arch = "aarch64"))'.dependencies]
+whispercpp = { version = "0.1", features = ["metal", "coreml"] }
+
+# Linux + NVIDIA
+[target.'cfg(all(target_os = "linux", target_arch = "x86_64"))'.dependencies]
+whispercpp = { version = "0.1", features = ["cuda"] }
```
-## Features
-- [x] Create a Rust open-source repo fast
+## Examples
+
+A working end-to-end example lives at
+[`whispercpp/examples/smoke.rs`](whispercpp/examples/smoke.rs).
+
+## Backends
+
+All backend features chain to the matching `whispercpp-sys` feature,
+which toggles the corresponding ggml / whisper CMake flag.
+
+| Feature | Backend | Platforms |
+|------------|--------------------------------------|--------------------------|
+| `metal` | Metal GPU | Apple |
+| `coreml` | CoreML / ANE encoder | Apple (with `.mlmodelc`) |
+| `vulkan` | Vulkan compute | Linux / Windows / Android / MoltenVK on macOS |
+| `opencl` | OpenCL (mobile / Adreno) | Linux / Android |
+| `cuda` | NVIDIA CUDA | Linux / Windows |
+| `hipblas` | AMD ROCm / HIP | Linux |
+| `sycl` | Intel oneAPI / Arc | Linux / Windows |
+| `musa` | Moore Threads MUSA | Linux |
+| `openvino` | Intel OpenVINO encoder | Linux / Windows |
+| `openblas` | OpenBLAS CPU | Any |
+| `serde` | `Serialize` / `Deserialize` for `Lang` (lowercase ISO-639-1) | — |
+
+GPU backends require the corresponding vendor SDK (CUDA Toolkit,
+ROCm, oneAPI, etc.) installed at build and link time. CI exercises the
+bundled CPU path on Linux/macOS/Windows and Metal+CoreML on macOS.
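
As a rough illustration of the feature-to-flag mapping, the sketch below lists the CMake defines that `whispercpp-sys/build.rs` switches on per backend feature. The function `cmake_defines` and its string-keyed lookup are assumptions made for this example; the build script itself tests `cfg!(feature = "...")` directly.

```rust
/// Illustrative lookup: CMake defines turned on by one backend feature.
/// Unknown or non-backend features (such as `serde`) map to no defines.
fn cmake_defines(feature: &str) -> &'static [(&'static str, &'static str)] {
    match feature {
        "metal" => &[
            ("GGML_METAL", "ON"),
            ("GGML_METAL_NDEBUG", "ON"),
            ("GGML_METAL_EMBED_LIBRARY", "ON"),
        ],
        "coreml" => &[
            ("WHISPER_COREML", "ON"),
            ("WHISPER_COREML_ALLOW_FALLBACK", "ON"),
        ],
        "openblas" => &[("GGML_BLAS", "ON"), ("GGML_BLAS_VENDOR", "OpenBLAS")],
        "cuda" => &[("GGML_CUDA", "ON")],
        "hipblas" => &[("GGML_HIP", "ON")],
        "sycl" => &[("GGML_SYCL", "ON")],
        "musa" => &[("GGML_MUSA", "ON")],
        "vulkan" => &[("GGML_VULKAN", "ON")],
        "opencl" => &[("GGML_OPENCL", "ON")],
        "openvino" => &[("WHISPER_OPENVINO", "ON")],
        _ => &[],
    }
}
```

Note that the `hipblas` feature keeps its older name while setting the newer `GGML_HIP` define.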
+
+## Memory safety
+
+`whisper.cpp` is a binary parser of attacker-controllable model files
+plus a substantial C++ inference path. The vendored submodule is
+pinned to our fork branch
+([`Findit-AI/whisper.cpp@rust`][fork-rust-branch]), which carries
+fixes for upstream issues reachable from safe Rust:
+
+- `whisper_kv_cache_free` made idempotent (closes a multi-decoder
+ OOM double-free of a ggml backend buffer).
+- `whisper_init_state` / `whisper_init_with_params_no_state` /
+ `whisper_vad_init_with_params` wrapped in RAII so a throw mid-init
+ releases the partial allocation rather than leaking the
+ whisper_context / whisper_state.
+- Tensor headers fully validated: `n_dims ∈ [0, 4]`, name length
+ bounded, `ttype < GGML_TYPE_COUNT`, per-dim positivity, 64-bit
+ overflow check on `nelements`.
+- Hparams validated against generous-but-bounded ranges; min
+ `n_text_ctx` enforced so the decode batch can hold the
+ worst-case prompt.
+- Special-token ids verified to fit `n_vocab` after the
+ multilingual shift (closes a corrupt-vocab OOB into `logits[]`).
+- File / buffer loaders throw on partial reads (peek-based EOF
+ detection so clean end-of-tensor-list still terminates).
+- Tensor-name set tracking rejects models that satisfy the
+ loaded-count check by repeating one name.
+- `ggml_log_set` installed once per process via `std::atomic`
+ so concurrent `create_state` + `State::full` don't race on
+ ggml's static logger globals.
+- `vocab.num_languages()` synthesis null-checks
+ `whisper_lang_str` (closes `std::string(nullptr)` UB).
+- The abort callback is wired through every sched-based graph
+ compute so cancellation interrupts the long-running encoder /
+ decoder paths, not just the gaps between them.
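
To make the tensor-header checks in the list above concrete, here is a Rust rendering of the same idea. The real fix lives in the fork's C++ source; `validate_header`, `HeaderError`, `MAX_NAME_LEN`, and the value of `TYPE_COUNT` are placeholders assumed for this sketch, not values taken from whisper.cpp.

```rust
const MAX_DIMS: usize = 4;
const MAX_NAME_LEN: usize = 64; // hypothetical bound on the tensor name
const TYPE_COUNT: u32 = 40; // placeholder standing in for GGML_TYPE_COUNT

#[derive(Debug, PartialEq, Eq)]
enum HeaderError {
    TooManyDims,
    NameTooLong,
    UnknownType,
    NonPositiveDim,
    ElementCountOverflow,
}

/// Validates an untrusted tensor header and returns its element count.
fn validate_header(dims: &[i64], name_len: usize, ttype: u32) -> Result<u64, HeaderError> {
    if dims.len() > MAX_DIMS {
        return Err(HeaderError::TooManyDims);
    }
    if name_len > MAX_NAME_LEN {
        return Err(HeaderError::NameTooLong);
    }
    if ttype >= TYPE_COUNT {
        return Err(HeaderError::UnknownType);
    }
    let mut nelements: u64 = 1;
    for &dim in dims {
        if dim <= 0 {
            return Err(HeaderError::NonPositiveDim);
        }
        // checked_mul reports 64-bit overflow instead of wrapping.
        nelements = nelements
            .checked_mul(dim as u64)
            .ok_or(HeaderError::ElementCountOverflow)?;
    }
    Ok(nelements)
}
```

Rejecting the header before any allocation means a corrupt file cannot request a buffer sized from a wrapped-around product.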
+
+A C++ exception-catching shim layer (`whispercpp_shim.cpp`) sits
+between the safe Rust API and every throwing entry point. The
+bindgen allowlist is enumerated symbol-by-symbol — only no-throw
+raw `whisper_*` functions are exposed; every throwing function
+goes through a `whispercpp_*` shim that catches and surfaces the
+exception class as a sentinel (`WhisperError::ConstructorLost`,
+`StateLost`, etc.).
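
On the Rust side, a sentinel returned by a shim could be decoded along these lines. This is only a sketch: `ShimError`, `check_shim_status`, and the numeric codes are hypothetical, and the actual `WhisperError` variants and shim return values are not shown in this document.

```rust
/// Hypothetical error type mirroring the sentinel classes named above.
#[derive(Debug, PartialEq, Eq)]
enum ShimError {
    ConstructorLost,
    StateLost,
    Other(i32),
}

// Hypothetical sentinel values; the real shim may use different codes.
const SHIM_OK: i32 = 0;
const SHIM_CONSTRUCTOR_LOST: i32 = -100;
const SHIM_STATE_LOST: i32 = -101;

/// Translates a shim status code into a Rust `Result`.
fn check_shim_status(code: i32) -> Result<(), ShimError> {
    match code {
        SHIM_OK => Ok(()),
        SHIM_CONSTRUCTOR_LOST => Err(ShimError::ConstructorLost),
        SHIM_STATE_LOST => Err(ShimError::StateLost),
        other => Err(ShimError::Other(other)),
    }
}
```

Keeping the translation in one function means no caller ever sees a raw integer, and an unknown code still surfaces as an error rather than being ignored.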
+
+`build.rs` includes a canary that scans the linked source for the
+required patch markers and hard-fails the build if any are missing.
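
The canary boils down to a substring scan. The sketch below shows that core step in isolation; `missing_markers` is a name chosen for this example, while the marker strings in the usage note come from the build script's `REQUIRED_MARKERS` list.

```rust
/// Returns every marker in `required` that does not occur in `source`.
fn missing_markers<'a>(source: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|marker| !source.contains(*marker))
        .collect()
}
```

A build script can read `src/whisper.cpp` into a string, call this with markers such as `"whispercpp-sys: kv_cache_free idempotent fix"`, and panic with the returned list if it is not empty.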
+
+For design details, see the per-finding analysis in the fork
+branch's commit history.
+
+## Crate structure
+
+| Crate | Purpose |
+|------------------|-------------------------------------------------------------------------------------------------|
+| `whispercpp` | Safe Rust API (`Context`, `State`, `Params`, `Lang`, `WhisperError`). End-user dependency. |
+| `whispercpp-sys` | Bindgen output + `build.rs` (cmake build, link directives) + the C++ exception-catching shim. |
+
+End users should depend on `whispercpp`. `whispercpp-sys` is
+re-exported as `whispercpp::sys` for callers who need a raw
+escape hatch (review every use carefully — only no-throw symbols
+are exposed but it's `unsafe` regardless).
+
+## Supported platforms
+
+CI runs on `ubuntu-latest`, `macos-latest`, and `windows-latest`.
+Sanitizer (ASan + UBSan) and Miri jobs gate the `unsafe` boundary
+on every PR. MSRV is pinned in `Cargo.toml` and enforced via
+`rust-version`.
-#### License
+## License
-`template-rs` is under the terms of both the MIT license and the
+`whispercpp` is under the terms of both the MIT license and the
Apache License (Version 2.0).
See [LICENSE-APACHE](LICENSE-APACHE), [LICENSE-MIT](LICENSE-MIT) for details.
-Copyright (c) 2021 Al Liu.
+Copyright (c) 2026 FinDIT Studio authors.
-[Github-url]: https://github.com/al8n/template-rs/
-[CI-url]: https://github.com/al8n/template-rs/actions/workflows/ci.yml
-[doc-url]: https://docs.rs/template-rs
-[crates-url]: https://crates.io/crates/template-rs
-[codecov-url]: https://app.codecov.io/gh/al8n/template-rs/
-[zh-cn-url]: https://github.com/al8n/template-rs/tree/main/README-zh_CN.md
+[whisper-cpp]: https://github.com/ggerganov/whisper.cpp
+[fork-rust-branch]: https://github.com/Findit-AI/whisper.cpp/tree/rust
+[Github-url]: https://github.com/findit-ai/whispercpp/
+[CI-url]: https://github.com/findit-ai/whispercpp/actions/workflows/ci.yml
+[doc-url]: https://docs.rs/whispercpp
+[crates-url]: https://crates.io/crates/whispercpp
+[codecov-url]: https://app.codecov.io/gh/findit-ai/whispercpp/
diff --git a/benches/foo.rs b/benches/foo.rs
deleted file mode 100644
index f328e4d..0000000
--- a/benches/foo.rs
+++ /dev/null
@@ -1 +0,0 @@
-fn main() {}
diff --git a/ci/miri_sb.sh b/ci/miri_sb.sh
deleted file mode 100755
index cc3c6e0..0000000
--- a/ci/miri_sb.sh
+++ /dev/null
@@ -1,38 +0,0 @@
-#!/bin/bash
-set -e
-
-if [ -z "$1" ]; then
- echo "Error: TARGET is not provided"
- exit 1
-fi
-
-TARGET="$1"
-
-# Install cross-compilation toolchain on Linux
-if [ "$(uname)" = "Linux" ]; then
- case "$TARGET" in
- aarch64-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-aarch64-linux-gnu
- ;;
- i686-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-multilib
- ;;
- powerpc64-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-powerpc64-linux-gnu
- ;;
- s390x-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-s390x-linux-gnu
- ;;
- riscv64gc-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-riscv64-linux-gnu
- ;;
- esac
-fi
-
-rustup toolchain install nightly --component miri
-rustup override set nightly
-cargo miri setup
-
-export MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-disable-isolation -Zmiri-symbolic-alignment-check"
-
-cargo miri test --all-targets --target "$TARGET"
diff --git a/ci/miri_tb.sh b/ci/miri_tb.sh
deleted file mode 100755
index 5d374c7..0000000
--- a/ci/miri_tb.sh
+++ /dev/null
@@ -1,38 +0,0 @@
-#!/bin/bash
-set -e
-
-if [ -z "$1" ]; then
- echo "Error: TARGET is not provided"
- exit 1
-fi
-
-TARGET="$1"
-
-# Install cross-compilation toolchain on Linux
-if [ "$(uname)" = "Linux" ]; then
- case "$TARGET" in
- aarch64-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-aarch64-linux-gnu
- ;;
- i686-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-multilib
- ;;
- powerpc64-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-powerpc64-linux-gnu
- ;;
- s390x-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-s390x-linux-gnu
- ;;
- riscv64gc-unknown-linux-gnu)
- sudo apt-get update && sudo apt-get install -y gcc-riscv64-linux-gnu
- ;;
- esac
-fi
-
-rustup toolchain install nightly --component miri
-rustup override set nightly
-cargo miri setup
-
-export MIRIFLAGS="-Zmiri-strict-provenance -Zmiri-disable-isolation -Zmiri-symbolic-alignment-check -Zmiri-tree-borrows"
-
-cargo miri test --all-targets --target "$TARGET"
diff --git a/ci/sanitizer.sh b/ci/sanitizer.sh
deleted file mode 100755
index 4ff6819..0000000
--- a/ci/sanitizer.sh
+++ /dev/null
@@ -1,22 +0,0 @@
-#!/bin/bash
-set -ex
-
-export ASAN_OPTIONS="detect_odr_violation=0 detect_leaks=0"
-
-TARGET="x86_64-unknown-linux-gnu"
-
-# Run address sanitizer
-RUSTFLAGS="-Z sanitizer=address" \
-cargo test --tests --target "$TARGET" --all-features
-
-# Run leak sanitizer
-RUSTFLAGS="-Z sanitizer=leak" \
-cargo test --tests --target "$TARGET" --all-features
-
-# Run memory sanitizer (requires -Zbuild-std for instrumented std)
-RUSTFLAGS="-Z sanitizer=memory" \
-cargo -Zbuild-std test --tests --target "$TARGET" --all-features
-
-# Run thread sanitizer (requires -Zbuild-std for instrumented std)
-RUSTFLAGS="-Z sanitizer=thread" \
-cargo -Zbuild-std test --tests --target "$TARGET" --all-features
diff --git a/examples/foo.rs b/examples/foo.rs
deleted file mode 100644
index f328e4d..0000000
--- a/examples/foo.rs
+++ /dev/null
@@ -1 +0,0 @@
-fn main() {}
diff --git a/src/lib.rs b/src/lib.rs
deleted file mode 100644
index 0a58390..0000000
--- a/src/lib.rs
+++ /dev/null
@@ -1,11 +0,0 @@
-//! A template for creating Rust open-source repo on GitHub
-#![cfg_attr(not(feature = "std"), no_std)]
-#![cfg_attr(docsrs, feature(doc_cfg))]
-#![cfg_attr(docsrs, allow(unused_attributes))]
-#![deny(missing_docs)]
-
-#[cfg(all(not(feature = "std"), feature = "alloc"))]
-extern crate alloc as std;
-
-#[cfg(feature = "std")]
-extern crate std;
diff --git a/tests/foo.rs b/tests/foo.rs
deleted file mode 100644
index 8b13789..0000000
--- a/tests/foo.rs
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/whispercpp-sys/.gitignore b/whispercpp-sys/.gitignore
new file mode 100644
index 0000000..e998183
--- /dev/null
+++ b/whispercpp-sys/.gitignore
@@ -0,0 +1,2 @@
+# cmake-rs build output for whisper.cpp.
+target/
diff --git a/whispercpp-sys/Cargo.toml b/whispercpp-sys/Cargo.toml
new file mode 100644
index 0000000..eedc6e2
--- /dev/null
+++ b/whispercpp-sys/Cargo.toml
@@ -0,0 +1,67 @@
+[package]
+name = "whispercpp-sys"
+version = "0.1.0"
+edition.workspace = true
+rust-version.workspace = true
+license.workspace = true
+repository.workspace = true
+readme = "README.md"
+description = "Low-level Rust FFI bindings to whisper.cpp. Cmake-builds a patched fork that closes upstream OOM / UB / leak hazards reachable from safe Rust."
+keywords = ["whisper", "ffi", "bindings", "asr", "speech-to-text"]
+categories = ["external-ffi-bindings", "multimedia::audio", "science"]
+links = "whisper"
+
+[lib]
+name = "whispercpp_sys"
+path = "src/lib.rs"
+
+[features]
+# Default to a CPU build — no GPU dep pulled. The `whispercpp`
+# safe wrapper turns Metal/CoreML on by default via its own
+# feature defaults; this crate stays minimal so servers / CI
+# runners that build `--no-default-features` get a fast
+# plain-CPU compile.
+#
+# whisper.cpp is **always** built from the vendored submodule
+# (`whisper.cpp/`, pinned to a patched fork branch). There is
+# no pkg-config / system-install path: routing safe Rust code
+# through a stock libwhisper would silently lose the
+# memory-safety guarantees the bundled patches provide.
+default = []
+
+# Backend feature flags map 1:1 to whisper.cpp/ggml CMake
+# options. Each feature toggles the matching `-DGGML_*=ON`
+# (or `-DWHISPER_*=ON`) flag plus the link directives for the
+# static library cmake produces and any system framework /
+# library it depends on.
+#
+# Apple-only:
+metal = [] # GGML_METAL: Metal GPU encoder/decoder
+coreml = [] # WHISPER_COREML: CoreML encoder dispatch (ANE)
+# Cross-platform GPU:
+vulkan = [] # GGML_VULKAN: Vulkan compute (Linux/Win/Android)
+opencl = [] # GGML_OPENCL: OpenCL (mobile GPUs, Adreno)
+# Vendor-specific GPU:
+cuda = [] # GGML_CUDA: NVIDIA CUDA
+hipblas = [] # GGML_HIP: AMD ROCm/HIP (formerly GGML_HIPBLAS)
+sycl = [] # GGML_SYCL: Intel oneAPI / Arc GPUs
+musa = [] # GGML_MUSA: Moore Threads MUSA
+# Encoder accelerators (similar role to CoreML, but other vendors):
+openvino = [] # WHISPER_OPENVINO: Intel OpenVINO encoder
+# CPU-side BLAS:
+openblas = [] # GGML_BLAS=ON, GGML_BLAS_VENDOR=OpenBLAS
+
+[build-dependencies]
+# `cmake` drives the bundled whisper.cpp build.
+cmake = "0.1"
+# `bindgen` generates Rust FFI for whisper.h + the shim
+# header. Output lands in `OUT_DIR/generated.rs` so the
+# source tree stays read-only for cargo vendor / Nix builds.
+bindgen = "0.72"
+# `cc` compiles `whispercpp_shim.cpp` into a tiny static lib
+# that catches C++ exceptions before they unwind across the
+# `extern "C"` boundary into Rust.
+cc = "1"
+
+[lints]
+workspace = true
diff --git a/whispercpp-sys/README.md b/whispercpp-sys/README.md
new file mode 120000
index 0000000..32d46ee
--- /dev/null
+++ b/whispercpp-sys/README.md
@@ -0,0 +1 @@
+../README.md
\ No newline at end of file
diff --git a/whispercpp-sys/build.rs b/whispercpp-sys/build.rs
new file mode 100644
index 0000000..37a3088
--- /dev/null
+++ b/whispercpp-sys/build.rs
@@ -0,0 +1,553 @@
+//! Build script for the whisper.cpp FFI bindings.
+//!
+//! Compiles the vendored `whisper.cpp/` git submodule via
+//! cmake-rs, links static. Feature flags translate to
+//! `-DGGML_METAL=ON` etc. Output is a static `libwhisper.a`
+//! plus the ggml satellite libraries that whisper.cpp's
+//! CMakeLists produces.
+//!
+//! There is no pkg-config / system-install path: the bundled
+//! source is patched in `OUT_DIR/whisper-src/` to close
+//! several upstream memory-safety bugs,
+//! and routing safe-Rust code through a stock libwhisper
+//! would silently drop those guarantees.
+//!
+//! Bindgen runs against the resolved header set so the Rust
+//! FFI matches the linked library's ABI. Output goes to
+//! `OUT_DIR/generated.rs` (the build script must NOT mutate
+//! the source tree).
+//!
+//! Bootstrap behaviour: when the submodule is missing this
+//! script emits clear `cargo:warning=`s rather than panicking,
+//! so `cargo check` still resolves the API. The actual link
+//! step fails downstream, by design.
+
+use std::{
+ env,
+ path::{Path, PathBuf},
+};
+
+fn main() {
+ println!("cargo:rerun-if-changed=build.rs");
+ println!("cargo:rerun-if-changed=wrapper.h");
+
+ bundled_build();
+}
+
+// ─── Bundled path ────────────────────────────────────────────
+
+fn bundled_build() {
+ let crate_dir = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
+ // The vendored submodule pinned via `.gitmodules` to the
+ // `Findit-AI/whisper.cpp` fork's `rust` branch — which
+ // carries our memory-safety patches as committed history
+ // — is the SOLE source of truth for the whisper.cpp build.
+ //
+ // No `WHISPER_CPP_DIR` override: the Rust safety surface
+ // (e.g. `State::full`'s free-on-sentinel path) relies on
+ // the fork's idempotent `whisper_kv_cache_free` and other
+ // patches being present in the linked binary. A pristine
+ // upstream checkout shares the same ABI but lacks those
+ // patches, so an env-var override would silently
+ // reintroduce the double-free / use-after-free class the
+ // wrapper closes. Users who need a different source must
+ // edit `.gitmodules` (reviewable) rather than flip an env
+ // var.
+ let whisper_src = crate_dir.join("whisper.cpp");
+
+ if !whisper_src.join("CMakeLists.txt").is_file() {
+ println!(
+ "cargo:warning=whisper.cpp source not found at {:?}.",
+ whisper_src
+ );
+ println!("cargo:warning=Run `git submodule update --init --recursive` from the repo root.");
+ println!(
+ "cargo:warning=Skipping cmake + bindgen for now; link step will fail until the source is available."
+ );
+ return;
+ }
+
+ // Verify the linked source carries our patches. Cheap
+ // canary: scan for every sentinel comment in the required
+ // patch set. If any is absent, the build hard-fails, because
+ // Rust safety assumptions in the wrapper depend on this.
+ verify_patched_source(&whisper_src);
+
+ // Tell cargo to rerun build.rs when files in the submodule
+ // change so `git submodule update` picks up automatically.
+ for top in ["CMakeLists.txt", "cmake", "include", "src", "ggml"] {
+ let p = whisper_src.join(top);
+ if p.exists() {
+ println!("cargo:rerun-if-changed={}", p.display());
+ }
+ }
+
+ let dst = build_whisper_cpp(&whisper_src);
+ let bundled_includes = vec![
+ whisper_src.join("include"),
+ whisper_src.join("ggml").join("include"),
+ ];
+ // Build the shim BEFORE emitting whisper.cpp's link
+ // directives. GNU ld resolves left-to-right; the shim
+ // depends on `whisper_*` symbols so it must appear first
+ // in the link list. cc::Build emits its `link-lib` line
+ // immediately on `compile`.
+ build_shim(&bundled_includes);
+ emit_bundled_link_directives(&dst);
+ let bundled_args: Vec<String> = bundled_includes
+ .iter()
+ .map(|p| format!("-I{}", p.display()))
+ .collect();
+ generate_bindings_with_args(&bundled_args);
+}
+
+/// Hard-fail the build if the linked whisper.cpp source is
+/// missing the rust-branch patch set. The Rust wrapper's
+/// memory-safety guarantees (e.g. `State::full`'s
+/// free-on-sentinel path, which relies on the fork's
+/// idempotent `whisper_kv_cache_free`) are unsound against a
+/// pristine upstream tree even though the ABI is identical.
+///
+/// Strategy: scan `src/whisper.cpp` for one or more sentinel
+/// comments inserted by the rust-branch patches. If any
+/// expected marker is missing the build refuses to proceed.
+///
+/// This catches both `git submodule update` against unpatched
+/// upstream AND someone manually replacing the submodule with
+/// a different tree.
+fn verify_patched_source(whisper_src: &Path) {
+ let target = whisper_src.join("src").join("whisper.cpp");
+ let body = match std::fs::read_to_string(&target) {
+ Ok(b) => b,
+ Err(e) => panic!(
+ "whispercpp-sys: failed to read {} for patch verification: {e}",
+ target.display()
+ ),
+ };
+
+ // Sentinels chosen from the highest-leverage patches —
+ // the ones whose absence would re-introduce the
+ // double-free / null-deref / leak hazards the Rust
+ // wrapper assumes are closed.
+ const REQUIRED_MARKERS: &[&str] = &[
+ "whispercpp-sys: kv_cache_free idempotent fix",
+ "whispercpp-sys: read_safe zero-init",
+ "whispercpp-sys: init_state RAII entry",
+ "whispercpp-sys: init_context RAII entry",
+ "whispercpp-sys: tensor header validation (model_load)",
+ "whispercpp-sys: ggml_log_set once-per-process",
+ "whispercpp-sys: hparams validation",
+ "whispercpp-sys: lang_str null guard",
+ "whispercpp-sys: special-token bounds check",
+ "whispercpp-sys: path_model assignment guard",
+ "whispercpp-sys: sched abort callback wiring",
+ "whispercpp-sys: vad_init RAII guard",
+ ];
+
+ let missing: Vec<&str> = REQUIRED_MARKERS
+ .iter()
+ .copied()
+ .filter(|m| !body.contains(m))
+ .collect();
+
+ if !missing.is_empty() {
+ panic!(
+ "whispercpp-sys: the linked whisper.cpp source at {} is missing the rust-branch patches \
+ (required marker{} absent: {:?}).\n\n\
+ The Rust safety surface depends on these patches; building against unpatched upstream \
+ reintroduces multi-decoder double-free / use-after-free / null-deref classes.\n\n\
+ Fix: ensure the submodule tracks `Findit-AI/whisper.cpp` branch `rust`. Run\n \
+ git submodule update --init --recursive\n\
+ from the repo root. If you intentionally pointed at a different source, add equivalent \
+ patches and the matching marker comments before retrying.",
+ target.display(),
+ if missing.len() == 1 { "" } else { "s" },
+ missing,
+ );
+ }
+}
+
+/// Compile `whispercpp_shim.cpp` into a `libwhispercpp_shim.a`
+/// staticlib in `OUT_DIR`, and emit the link directive for it.
+///
+/// The shim catches C++ exceptions inside whisper.cpp so they
+/// can't unwind across `extern "C"` into Rust. It must be
+/// linked BEFORE the whisper static libs in the GNU ld
+/// dependency chain so the shim's references to `whisper_*`
+/// resolve.
+fn build_shim(include_paths: &[PathBuf]) {
+ let crate_dir = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
+ let mut build = cc::Build::new();
+ build
+ .cpp(true)
+ .file(crate_dir.join("whispercpp_shim.cpp"))
+ .flag_if_supported("-std=c++17")
+ .flag_if_supported("/std:c++17");
+ for inc in include_paths {
+ build.include(inc);
+ }
+ // `cc::Build::compile` emits `cargo:rustc-link-lib=static=...`
+ // and `cargo:rustc-link-search=native=...` automatically.
+ build.compile("whispercpp_shim");
+ // Tell cargo to rerun the shim build when the source files
+ // change. (cc doesn't do this for us.)
+ println!("cargo:rerun-if-changed=whispercpp_shim.cpp");
+ println!("cargo:rerun-if-changed=whispercpp_shim.h");
+}
+
+/// Drive the cmake build. Returns the install root cmake-rs
+/// produced (typically `OUT_DIR/`).
+fn build_whisper_cpp(whisper_src: &Path) -> PathBuf {
+ let mut cfg = cmake::Config::new(whisper_src);
+ cfg
+ .define("BUILD_SHARED_LIBS", "OFF")
+ .define("WHISPER_BUILD_EXAMPLES", "OFF")
+ .define("WHISPER_BUILD_TESTS", "OFF")
+ .define("WHISPER_BUILD_SERVER", "OFF")
+ // Force OpenMP off. ggml's CMake auto-
+ // detects OpenMP; if the host has it (Linux + libgomp,
+ // macOS + brew libomp, etc.) it links against the
+ // OpenMP runtime which our `cargo:rustc-link-lib=` set
+ // doesn't emit, producing platform-specific link
+ // surprises. The wrapper also caps `n_threads = 1`, so
+ // OpenMP can't help anyway. Explicit OFF makes the
+ // bundled build deterministic across runners.
+ .define("GGML_OPENMP", "OFF")
+ // ggml fast-math + Apple Accelerate / OpenBLAS are decided
+ // per-feature below.
+ .profile("Release");
+
+ if cfg!(feature = "metal") {
+ cfg.define("GGML_METAL", "ON");
+ cfg.define("GGML_METAL_NDEBUG", "ON");
+ // Embed the metal shader library bytes into libggml-metal.a
+ // so the runtime doesn't need a sibling `default.metallib`.
+ cfg.define("GGML_METAL_EMBED_LIBRARY", "ON");
+ } else {
+ cfg.define("GGML_METAL", "OFF");
+ }
+
+ if cfg!(feature = "coreml") {
+ cfg.define("WHISPER_COREML", "ON");
+ // Enable the post-init fallback: if the `.mlmodelc`
+ // companion is missing at runtime, fall back to the GGML
+ // encoder rather than aborting. This is what whisper-cli
+ // does by default.
+ cfg.define("WHISPER_COREML_ALLOW_FALLBACK", "ON");
+ }
+
+ if cfg!(feature = "openblas") {
+ cfg.define("GGML_BLAS", "ON");
+ cfg.define("GGML_BLAS_VENDOR", "OpenBLAS");
+ } else if cfg!(target_vendor = "apple") && !cfg!(feature = "metal") {
+ // Apple CPU build: prefer the system Accelerate framework.
+ cfg.define("GGML_BLAS", "ON");
+ cfg.define("GGML_BLAS_VENDOR", "Apple");
+ }
+
+ // ── Vendor-specific GPU backends ────────────────────────
+ // Each `-DGGML_*=ON` triggers cmake's matching find_package
+ // / FetchContent for the SDK (CUDA Toolkit, ROCm, oneAPI,
+ // etc.). The user is expected to have the SDK installed; we
+ // don't auto-fetch.
+ if cfg!(feature = "cuda") {
+ cfg.define("GGML_CUDA", "ON");
+ }
+ if cfg!(feature = "hipblas") {
+ // Renamed `GGML_HIPBLAS` → `GGML_HIP` upstream around
+ // ggml 0.10. We keep the Rust feature name `hipblas` to
+ // match the convention whisper-rs / llama-cpp-rs adopted
+ // before the upstream rename.
+ cfg.define("GGML_HIP", "ON");
+ }
+ if cfg!(feature = "sycl") {
+ cfg.define("GGML_SYCL", "ON");
+ }
+ if cfg!(feature = "musa") {
+ cfg.define("GGML_MUSA", "ON");
+ }
+
+ // ── Cross-platform GPU ─────────────────────────────────
+ if cfg!(feature = "vulkan") {
+ cfg.define("GGML_VULKAN", "ON");
+ }
+ if cfg!(feature = "opencl") {
+ cfg.define("GGML_OPENCL", "ON");
+ }
+
+ // ── Encoder accelerators ───────────────────────────────
+ if cfg!(feature = "openvino") {
+ cfg.define("WHISPER_OPENVINO", "ON");
+ }
+
+ cfg.build()
+}
+
+/// Tell cargo which static libraries to link, in the right
+/// order for the GNU/macOS/MSVC linkers. cmake-rs's `build`
+/// returns the install root (typically `OUT_DIR/`), with libs under `lib/`.
+fn emit_bundled_link_directives(install_root: &Path) {
+ let lib_dir = install_root.join("lib");
+ println!("cargo:rustc-link-search=native={}", lib_dir.display());
+
+ // Order matters for GNU ld: depending libs first, low-level
+ // last. whisper depends on ggml; ggml's metal/blas/coreml
+ // sub-libs are leaves.
+ println!("cargo:rustc-link-lib=static=whisper");
+ println!("cargo:rustc-link-lib=static=ggml");
+ println!("cargo:rustc-link-lib=static=ggml-base");
+ println!("cargo:rustc-link-lib=static=ggml-cpu");
+
+ // On Apple Silicon, whisper.cpp's CMake also builds the
+ // ggml-blas backend automatically (the BLAS-via-Accelerate
+ // path), even when Metal is the primary backend. We link it
+ // unconditionally on Apple targets so the resulting binary
+ // resolves `ggml_backend_blas_reg`.
+ if cfg!(target_vendor = "apple") {
+ println!("cargo:rustc-link-lib=static=ggml-blas");
+ println!("cargo:rustc-link-lib=framework=Accelerate");
+ }
+ if cfg!(feature = "metal") {
+ println!("cargo:rustc-link-lib=static=ggml-metal");
+ println!("cargo:rustc-link-lib=framework=Metal");
+ println!("cargo:rustc-link-lib=framework=MetalKit");
+ println!("cargo:rustc-link-lib=framework=Foundation");
+ }
+ if cfg!(feature = "coreml") {
+ println!("cargo:rustc-link-lib=static=whisper.coreml");
+ println!("cargo:rustc-link-lib=framework=CoreML");
+ }
+ if cfg!(feature = "openblas") {
+ println!("cargo:rustc-link-lib=dylib=openblas");
+ }
+
+ // ── CUDA ───────────────────────────────────────────────
+ // cmake produces `libggml-cuda.a`; the runtime resolves
+ // CUDA Toolkit symbols via `cudart`/`cublas` dylibs in
+ // `$CUDA_PATH/lib64` (Linux) or `\lib\x64` (Windows). The
+ // user must have the CUDA Toolkit installed; we don't ship
+ // it. `cargo:rustc-link-search` is left to the system
+ // default — `LD_LIBRARY_PATH` / Windows `PATH` covers it.
+ if cfg!(feature = "cuda") {
+ println!("cargo:rustc-link-lib=static=ggml-cuda");
+ println!("cargo:rustc-link-lib=dylib=cudart");
+ println!("cargo:rustc-link-lib=dylib=cublas");
+ println!("cargo:rustc-link-lib=dylib=cublasLt");
+ }
+
+ // ── ROCm / HIP (AMD) ───────────────────────────────────
+ if cfg!(feature = "hipblas") {
+ println!("cargo:rustc-link-lib=static=ggml-hip");
+ println!("cargo:rustc-link-lib=dylib=amdhip64");
+ println!("cargo:rustc-link-lib=dylib=hipblas");
+ println!("cargo:rustc-link-lib=dylib=rocblas");
+ }
+
+ // ── Intel SYCL / oneAPI ────────────────────────────────
+ if cfg!(feature = "sycl") {
+ println!("cargo:rustc-link-lib=static=ggml-sycl");
+ println!("cargo:rustc-link-lib=dylib=sycl");
+ println!("cargo:rustc-link-lib=dylib=OpenCL");
+ println!("cargo:rustc-link-lib=dylib=mkl_sycl");
+ println!("cargo:rustc-link-lib=dylib=mkl_intel_ilp64");
+ println!("cargo:rustc-link-lib=dylib=mkl_tbb_thread");
+ println!("cargo:rustc-link-lib=dylib=mkl_core");
+ }
+
+ // ── Moore Threads MUSA ─────────────────────────────────
+ if cfg!(feature = "musa") {
+ println!("cargo:rustc-link-lib=static=ggml-musa");
+ println!("cargo:rustc-link-lib=dylib=musa");
+ println!("cargo:rustc-link-lib=dylib=musart");
+ println!("cargo:rustc-link-lib=dylib=mublas");
+ }
+
+ // ── Vulkan (cross-platform GPU) ────────────────────────
+ if cfg!(feature = "vulkan") {
+ println!("cargo:rustc-link-lib=static=ggml-vulkan");
+ if cfg!(target_os = "macos") {
+ // MoltenVK ships a `vulkan` dylib that translates to Metal.
+ println!("cargo:rustc-link-lib=dylib=vulkan");
+ } else if cfg!(target_os = "windows") {
+ println!("cargo:rustc-link-lib=dylib=vulkan-1");
+ } else {
+ println!("cargo:rustc-link-lib=dylib=vulkan");
+ }
+ }
+
+ // ── OpenCL (mobile GPUs / Adreno) ──────────────────────
+ if cfg!(feature = "opencl") {
+ println!("cargo:rustc-link-lib=static=ggml-opencl");
+ if cfg!(target_os = "macos") {
+ println!("cargo:rustc-link-lib=framework=OpenCL");
+ } else {
+ println!("cargo:rustc-link-lib=dylib=OpenCL");
+ }
+ }
+
+ // ── OpenVINO (Intel encoder accelerator) ───────────────
+ if cfg!(feature = "openvino") {
+ println!("cargo:rustc-link-lib=static=whisper.openvino");
+ println!("cargo:rustc-link-lib=dylib=openvino");
+ println!("cargo:rustc-link-lib=dylib=openvino_c");
+ }
+
+ // C++ stdlib — whisper.cpp / ggml are C++.
+ if cfg!(target_os = "macos") {
+ println!("cargo:rustc-link-lib=dylib=c++");
+ } else if cfg!(target_os = "linux") {
+ println!("cargo:rustc-link-lib=dylib=stdc++");
+ }
+}
+
+// ─── Bindgen ─────────────────────────────────────────────────
+
+/// Run bindgen against a curated `wrapper.h` and write the
+/// result to `$OUT_DIR/generated.rs`.
+///
+/// **Why OUT_DIR, not in-tree.** The previous in-tree path
+/// (`src/generated.rs`) was flagged as breaking
+/// read-only builds — cargo's standard `vendor` workflow,
+/// Nix-style fixed-output derivations, Bazel sandboxes, and
+/// verified-source registry checkouts all forbid build.rs
+/// from mutating the source tree. Per cargo's contract, every
+/// build.rs side-effect goes under `OUT_DIR`. The
+/// `include!` glue lives in `src/lib.rs`.
+///
+/// Trade-off: the FFI surface is no longer grep-able from a
+/// fresh checkout. Inspect via `cargo expand
+/// -p whispercpp-sys` or look at
+/// `target///build/whispercpp-sys-/out/generated.rs`
+/// after a build.
+fn generate_bindings_with_args(clang_args: &[String]) {
+ let crate_dir = PathBuf::from(env!("CARGO_MANIFEST_DIR"));
+ let header = crate_dir.join("wrapper.h");
+
+ let mut builder = bindgen::Builder::default().header(header.to_string_lossy().to_string());
+ for arg in clang_args {
+ builder = builder.clang_arg(arg);
+ }
+ let bindings = builder
+ // Only the symbols the safe wrapper actually consumes.
+ // The allowlist was narrowed from `whisper_.*` because
+ // the broad allowlist exposed unshimmed throwing C++
+ // entry points (e.g. `whisper_vad_init_*` whose file
+ // loaders throw `std::runtime_error` on truncated
+ // models, and `whisper_full_with_state` whose
+ // exceptions cross `extern "C"` into Rust as UB). New
+ // raw symbols need an explicit allowlist add and a
+ // matching audit: confirm the upstream function cannot
+ // throw, OR add a `whispercpp_*` shim wrapping it in
+ // try/catch.
+ //
+ // No-throw raw entry points (verified):
+ // - `*_default_params` — value-returning
+ // - `*_free`, `*_free_state` — destructors
+ // - `*_n_*`, `*_token_*`,
+ // `*_is_multilingual`,
+ // `*_lang_str`,
+ // `*_model_type_readable`,
+ // `*_full_get_*_from_state` — pure read accessors
+ // - `*_token_to_str` — would throw via
+ // `id_to_token.at` but the safe wrapper validates
+ // the bound first.
+ //
+ // Throwing entry points routed through `whispercpp_*`
+ // shims:
+ // - `whisper_init_from_file_with_params_no_state` →
+ // `whispercpp_init_from_file_no_state`
+ // - `whisper_init_state` →
+ // `whispercpp_init_state`
+ // - `whisper_full_with_state` →
+ // `whispercpp_full_with_state`
+ // - `whisper_print_system_info` →
+ // `whispercpp_print_system_info`
+ //
+ // VAD entry points (`whisper_vad_*`) are NOT exposed —
+ // the safe wrapper doesn't surface VAD, and their file
+ // loaders throw on truncated models.
+ .allowlist_function("whisper_context_default_params")
+ .allowlist_function("whisper_full_default_params")
+ .allowlist_function("whisper_free")
+ .allowlist_function("whisper_free_state")
+ .allowlist_function("whisper_is_multilingual")
+ .allowlist_function("whisper_n_vocab")
+ .allowlist_function("whisper_n_audio_ctx")
+ .allowlist_function("whisper_n_text_ctx")
+ .allowlist_function("whisper_token_eot")
+ .allowlist_function("whisper_token_sot")
+ .allowlist_function("whisper_token_beg")
+ .allowlist_function("whisper_token_to_str")
+ .allowlist_function("whisper_lang_str")
+ .allowlist_function("whisper_model_type_readable")
+ .allowlist_function("whisper_full_n_segments_from_state")
+ .allowlist_function("whisper_full_lang_id_from_state")
+ .allowlist_function("whisper_full_get_segment_t0_from_state")
+ .allowlist_function("whisper_full_get_segment_t1_from_state")
+ .allowlist_function("whisper_full_get_segment_text_from_state")
+ .allowlist_function("whisper_full_get_segment_no_speech_prob_from_state")
+ .allowlist_function("whisper_full_get_segment_speaker_turn_next_from_state")
+ .allowlist_function("whisper_full_n_tokens_from_state")
+ .allowlist_function("whisper_full_get_token_data_from_state")
+ // ggml's logger setter is referenced from our context
+ // init lock comment but not directly called. We expose
+ // the whole ggml_log_* family for diagnostic use.
+ .allowlist_function("ggml_log_.*")
+ // Shim entry points — no-throw at the boundary.
+ .allowlist_function("whispercpp_.*")
+ // Type allowlist: every struct / enum the function
+ // signatures above transitively require.
+ .allowlist_type("whisper_context")
+ .allowlist_type("whisper_state")
+ .allowlist_type("whisper_context_params")
+ .allowlist_type("whisper_full_params")
+ .allowlist_type("whisper_token")
+ .allowlist_type("whisper_token_data")
+ .allowlist_type("whisper_pos")
+ .allowlist_type("whisper_seq_id")
+ .allowlist_type("whisper_sampling_strategy")
+ .allowlist_type("whisper_grammar_element")
+ .allowlist_type("whisper_segment")
+ .allowlist_type("whisper_progress_callback")
+ .allowlist_type("whisper_new_segment_callback")
+ .allowlist_type("whisper_encoder_begin_callback")
+ .allowlist_type("whisper_logits_filter_callback")
+ .allowlist_type("ggml_log_.*")
+ .allowlist_var("WHISPER_.*")
+ // Shim exception sentinels (WHISPERCPP_ERR_*). state.rs
+ // needs them to discriminate "shim caught a C++ exception
+ // → state may be corrupt → poison" from "whisper.cpp
+ // returned a documented error code".
+ .allowlist_var("WHISPERCPP_.*")
+ // CargoCallbacks calls
+ // `println!("cargo:rerun-if-changed=...")` for every
+ // header bindgen pulled. Those land under whisper.cpp/...
+ // (or the system include path) so we DO want them — a
+ // header change should re-bindgen.
+ .parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
+ .layout_tests(false)
+ .derive_default(true)
+ .derive_debug(true)
+ .generate()
+ .expect("bindgen failed");
+
+ let out_dir = PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set"));
+ let dest = out_dir.join("generated.rs");
+ let body = bindings.to_string();
+ let header_comment = format!(
+ "// @generated\n\
+ //\n\
+ // whisper.cpp FFI surface — produced by bindgen against\n\
+ // the bundled submodule (`whispercpp-sys/whisper.cpp/`),\n\
+ // patched in OUT_DIR. Do not edit by hand.\n\
+ //\n\
+ // Source crate: {pkg} {ver}\n\
+ // Source header: wrapper.h -> whisper.h + whispercpp_shim.h\n\
+ //\n\n",
+ pkg = env!("CARGO_PKG_NAME"),
+ ver = env!("CARGO_PKG_VERSION"),
+ );
+
+ let new_contents = format!("{header_comment}{body}");
+ std::fs::write(&dest, new_contents).expect("failed to write OUT_DIR/generated.rs");
+}
diff --git a/whispercpp-sys/src/lib.rs b/whispercpp-sys/src/lib.rs
new file mode 100644
index 0000000..1342b68
--- /dev/null
+++ b/whispercpp-sys/src/lib.rs
@@ -0,0 +1,33 @@
+//! `whispercpp-sys` — raw FFI bindings to whisper.cpp.
+//!
+//! Everything below is `unsafe`-callable C ABI surface. Higher
+//! layers (the `whispercpp` crate) wrap these in safe types;
+//! end users should depend on `whispercpp` rather than this
+//! crate directly.
+//!
+//! `build.rs` cmake-builds the vendored `whisper.cpp/` submodule
+//! (pinned to a patched fork branch) and statically links the
+//! resulting libraries. There is no pkg-config / system-install
+//! path: the safe surface in the upper crate depends on patches
+//! that only the bundled build supplies, and a stock libwhisper
+//! would silently lose those guarantees. Bindgen writes the FFI
+//! surface to `OUT_DIR/generated.rs`.
+
+#![allow(unsafe_code)]
+#![allow(non_camel_case_types)]
+#![allow(non_snake_case)]
+#![allow(non_upper_case_globals)]
+#![allow(dead_code)]
+#![allow(missing_docs)]
+
+// Bindgen output is written to `OUT_DIR` by build.rs and
+// `include!`'d here. An in-tree path (`src/generated.rs`)
+// would break read-only builds (cargo vendor, Nix, Bazel,
+// verified-source registry checkouts) and could race across
+// builds with different feature sets.
+//
+// Trade-off: the FFI surface is no longer grep-able from a
+// fresh checkout. Inspect via `cargo expand -p whispercpp-sys`
+// or look at `target/.../build/whispercpp-sys-*/out/generated.rs`
+// after a build.
+include!(concat!(env!("OUT_DIR"), "/generated.rs"));
diff --git a/whispercpp-sys/whisper.cpp b/whispercpp-sys/whisper.cpp
new file mode 160000
index 0000000..9c4881d
--- /dev/null
+++ b/whispercpp-sys/whisper.cpp
@@ -0,0 +1 @@
+Subproject commit 9c4881d8f5cd2224224e46ec8d012cce348be39d
diff --git a/whispercpp-sys/whispercpp_shim.cpp b/whispercpp-sys/whispercpp_shim.cpp
new file mode 100644
index 0000000..213eefc
--- /dev/null
+++ b/whispercpp-sys/whispercpp_shim.cpp
@@ -0,0 +1,125 @@
+// C++ exception-catching shim around whisper.cpp's public API.
+//
+// See whispercpp_shim.h for the rationale. Every wrapper in
+// this file isolates its whisper_* call inside `try/catch (…)`
+// so a `std::bad_alloc` / `std::system_error` / any other
+// throw inside whisper.cpp becomes a sentinel return value
+// instead of unwinding through the `extern "C"` boundary into
+// Rust — which is undefined behaviour.
+
+#include "whispercpp_shim.h"
+
+#include <exception>
+#include <new>
+#include <system_error>
+
+// Per-thread "most recent caught constructor exception" slot.
+//
+// The constructor shims previously collapsed
+// every failure (including caught exceptions) onto `nullptr`,
+// indistinguishable from an upstream "init failed cleanly"
+// nullptr return. Callers therefore couldn't tell a retryable
+// failure (bad path, missing file) from a partial-init exception
+// that leaked the `new whisper_context` / `new whisper_state`
+// allocations.
+//
+// We expose a thread-local sentinel. Each constructor entry
+// resets it to 0 and writes a `WHISPERCPP_ERR_*` value on catch.
+// Callers pair every `nullptr` observation with
+// `whispercpp_take_last_constructor_exception` to discriminate
+// — and surface the exception case as a non-retryable fatal
+// error so workers don't compound the leak.
+//
+// Why thread-local: concurrent context/state inits on different
+// threads must not interleave their sentinels. Cross-thread
+// reads are forbidden by the API contract (read on the same
+// thread that made the call).
+//
+// Why a single slot for both `init_from_file` and `init_state`:
+// the safe Rust API reads the sentinel synchronously after each
+// constructor call, before any other shim entry on the same
+// thread. There's no observation window where one constructor's
+// exception could be misread as another's.
+static thread_local int g_last_constructor_exception = 0;
+
+extern "C" {
+
+struct whisper_context * whispercpp_init_from_file_no_state(
+ const char * path_model,
+ struct whisper_context_params params)
+{
+ g_last_constructor_exception = 0;
+ try {
+ return whisper_init_from_file_with_params_no_state(path_model, params);
+ } catch (const std::bad_alloc &) {
+ g_last_constructor_exception = WHISPERCPP_ERR_BAD_ALLOC;
+ return nullptr;
+ } catch (const std::system_error &) {
+ g_last_constructor_exception = WHISPERCPP_ERR_SYSTEM_ERROR;
+ return nullptr;
+ } catch (const std::exception &) {
+ g_last_constructor_exception = WHISPERCPP_ERR_STD_EXCEPTION;
+ return nullptr;
+ } catch (...) {
+ g_last_constructor_exception = WHISPERCPP_ERR_UNKNOWN_EXCEPTION;
+ return nullptr;
+ }
+}
+
+struct whisper_state * whispercpp_init_state(struct whisper_context * ctx)
+{
+ g_last_constructor_exception = 0;
+ try {
+ return whisper_init_state(ctx);
+ } catch (const std::bad_alloc &) {
+ g_last_constructor_exception = WHISPERCPP_ERR_BAD_ALLOC;
+ return nullptr;
+ } catch (const std::system_error &) {
+ g_last_constructor_exception = WHISPERCPP_ERR_SYSTEM_ERROR;
+ return nullptr;
+ } catch (const std::exception &) {
+ g_last_constructor_exception = WHISPERCPP_ERR_STD_EXCEPTION;
+ return nullptr;
+ } catch (...) {
+ g_last_constructor_exception = WHISPERCPP_ERR_UNKNOWN_EXCEPTION;
+ return nullptr;
+ }
+}
+
+int whispercpp_take_last_constructor_exception(void)
+{
+ int v = g_last_constructor_exception;
+ g_last_constructor_exception = 0;
+ return v;
+}
+
+int whispercpp_full_with_state(
+ struct whisper_context * ctx,
+ struct whisper_state * state,
+ struct whisper_full_params params,
+ const float * samples,
+ int n_samples)
+{
+ try {
+ return whisper_full_with_state(ctx, state, params, samples, n_samples);
+ } catch (const std::bad_alloc &) {
+ return WHISPERCPP_ERR_BAD_ALLOC;
+ } catch (const std::system_error &) {
+ return WHISPERCPP_ERR_SYSTEM_ERROR;
+ } catch (const std::exception &) {
+ return WHISPERCPP_ERR_STD_EXCEPTION;
+ } catch (...) {
+ return WHISPERCPP_ERR_UNKNOWN_EXCEPTION;
+ }
+}
+
+const char * whispercpp_print_system_info(void)
+{
+ try {
+ return whisper_print_system_info();
+ } catch (...) {
+ return nullptr;
+ }
+}
+
+} // extern "C"
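The read-and-clear sentinel protocol in `whispercpp_shim.cpp` can be modelled from the caller's side in plain Rust. The sketch below is a self-contained illustration under stated assumptions: `MockInit`, `mock_init`, `InitError`, and `init` are hypothetical stand-ins invented for this example (the real safe wrapper is not shown in this part of the diff), and a `thread_local!` `Cell` plays the role of `g_last_constructor_exception`.

```rust
use std::cell::Cell;

thread_local! {
    // Models `g_last_constructor_exception`: one slot per thread.
    static LAST_CONSTRUCTOR_EXCEPTION: Cell<i32> = Cell::new(0);
}

/// Hypothetical outcomes of a native constructor call.
pub enum MockInit {
    /// Upstream returned a valid pointer.
    Success,
    /// Upstream returned nullptr without throwing (bad path, missing file).
    CleanFailure,
    /// The shim caught an exception and recorded this sentinel.
    Throws(i32),
}

/// Stand-in for a constructor shim: resets the slot on entry and
/// records a sentinel only when an exception was "caught".
pub fn mock_init(outcome: MockInit) -> Option<u32> {
    LAST_CONSTRUCTOR_EXCEPTION.with(|slot| slot.set(0));
    match outcome {
        MockInit::Success => Some(1), // pretend handle
        MockInit::CleanFailure => None,
        MockInit::Throws(code) => {
            LAST_CONSTRUCTOR_EXCEPTION.with(|slot| slot.set(code));
            None
        }
    }
}

/// Read-and-clear, mirroring `whispercpp_take_last_constructor_exception`.
pub fn take_last_constructor_exception() -> i32 {
    LAST_CONSTRUCTOR_EXCEPTION.with(|slot| slot.replace(0))
}

/// Hypothetical error split: retryable vs. fatal (leaked partial init).
#[derive(Debug, PartialEq, Eq)]
pub enum InitError {
    Retryable,
    Fatal(i32),
}

/// Pairs every `None` with an immediate sentinel read on the same thread.
pub fn init(outcome: MockInit) -> Result<u32, InitError> {
    match mock_init(outcome) {
        Some(handle) => Ok(handle),
        None => match take_last_constructor_exception() {
            0 => Err(InitError::Retryable),
            code => Err(InitError::Fatal(code)),
        },
    }
}
```

The point of the design is that the sentinel is read immediately after the null observation and on the same thread, so no other shim call can overwrite it in between.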
diff --git a/whispercpp-sys/whispercpp_shim.h b/whispercpp-sys/whispercpp_shim.h
new file mode 100644
index 0000000..8965fa8
--- /dev/null
+++ b/whispercpp-sys/whispercpp_shim.h
@@ -0,0 +1,122 @@
+/// C-ABI shims around the whisper.cpp public API.
+///
+/// Every function declared here wraps its whisper.cpp
+/// counterpart in a `try { ... } catch (...) { ... }` block.
+/// whisper.cpp's `extern "C"` entry points internally
+/// allocate `std::vector` and construct `std::thread`,
+/// both of which can throw (`std::bad_alloc`,
+/// `std::system_error`) under realistic resource
+/// pressure. A C++ exception propagating across an
+/// `extern "C"` boundary into Rust code that was not
+/// compiled for `panic=unwind` ABI compatibility is
+/// undefined behaviour.
+///
+/// Convention:
+///
+/// * Constructors that return `T*` on success return
+/// `nullptr` on caught exception (matches the C API's
+/// existing failure mode).
+/// * `int`-returning `whisper_full_with_state` returns a
+/// negative sentinel for caught exceptions:
+/// * `-100` for `std::bad_alloc` (OOM)
+/// * `-101` for `std::system_error` (thread/system call)
+/// * `-102` for any other `std::exception`
+/// * `-103` for unknown / non-`std::exception` throws
+/// These share the negative range with whisper.cpp's own
+/// return codes (which reach `-7` in v1.8.4) without
+/// colliding; the safe-Rust wrapper translates them into
+/// typed `WhisperError` variants.
+
+#ifndef WHISPERCPP_SHIM_H
+#define WHISPERCPP_SHIM_H
+
+#include "whisper.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/// Exception sentinels returned by `whispercpp_full_with_state` and by
+/// `whispercpp_take_last_constructor_exception`. Defined as macros (not
+/// enums) so bindgen emits plain integer constants the wrapper can match on.
+#define WHISPERCPP_ERR_BAD_ALLOC -100
+#define WHISPERCPP_ERR_SYSTEM_ERROR -101
+#define WHISPERCPP_ERR_STD_EXCEPTION -102
+#define WHISPERCPP_ERR_UNKNOWN_EXCEPTION -103
+
+/// `whisper_init_from_file_with_params_no_state` wrapped in
+/// try/catch.
+///
+/// Returns `nullptr` on either:
+/// * the upstream C API's documented failure (file not found,
+/// model corrupt, backend init refused, etc. — these return
+/// nullptr without throwing), OR
+/// * a caught C++ exception inside the upstream init path
+/// (`std::bad_alloc`, `std::system_error`,
+/// `std::exception`, or anything else).
+///
+/// Use [`whispercpp_take_last_constructor_exception`] AFTER
+/// observing `nullptr` to discriminate the two cases — the
+/// caller MUST treat the exception case as fatal (the
+/// upstream code has no RAII around `new whisper_context;`,
+/// so any throw mid-init leaks the partial allocation).
+///
+struct whisper_context * whispercpp_init_from_file_no_state(
+ const char * path_model,
+ struct whisper_context_params params);
+
+/// `whisper_init_state` wrapped in try/catch.
+///
+/// Same `nullptr` discrimination contract as
+/// [`whispercpp_init_from_file_no_state`]: pair every
+/// `nullptr` observation with
+/// [`whispercpp_take_last_constructor_exception`] to
+/// distinguish "upstream returned nullptr cleanly" (retryable)
+/// from "exception caught, partial native allocation leaked"
+/// (fatal).
+struct whisper_state * whispercpp_init_state(struct whisper_context * ctx);
+
+/// Read-and-clear the most recent **constructor** exception
+/// sentinel.
+///
+/// Set by [`whispercpp_init_from_file_no_state`] and
+/// [`whispercpp_init_state`] inside their `catch` blocks; reset
+/// to `0` on entry to those functions and again by this
+/// accessor.
+///
+/// Returns one of:
+/// * `0` — no exception was caught on the most recent
+/// constructor call on this thread (a `nullptr` return means
+/// the upstream C API returned `nullptr` cleanly, no leak).
+/// * `WHISPERCPP_ERR_BAD_ALLOC` — `std::bad_alloc` during init.
+/// * `WHISPERCPP_ERR_SYSTEM_ERROR` — `std::system_error`.
+/// * `WHISPERCPP_ERR_STD_EXCEPTION` — other `std::exception`.
+/// * `WHISPERCPP_ERR_UNKNOWN_EXCEPTION` — non-`std::exception`
+/// throw.
+///
+/// Thread-local: each thread observes its own most-recent
+/// sentinel. Callers must invoke this on the SAME thread that
+/// made the constructor call, immediately after observing the
+/// `nullptr` return. Inserting other shim calls between the
+/// constructor and this read clobbers the sentinel.
+int whispercpp_take_last_constructor_exception(void);
+
+/// `whisper_full_with_state` wrapped in try/catch.
+int whispercpp_full_with_state(
+ struct whisper_context * ctx,
+ struct whisper_state * state,
+ struct whisper_full_params params,
+ const float * samples,
+ int n_samples);
+
+/// `whisper_print_system_info` wrapped in try/catch. Upstream
+/// rebuilds a static `std::string` via `s = ""; s += "..."; s
+/// += std::to_string(...);` which can throw `std::bad_alloc`
+/// across the C ABI. Returns NULL on any caught exception.
+const char * whispercpp_print_system_info(void);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // WHISPERCPP_SHIM_H
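The return-code convention documented in `whispercpp_shim.h` lends itself to a small classifier on the Rust side. The following is a minimal sketch, assuming the four sentinel values from the header; `FullOutcome`, `ShimException`, and `classify_full_return` are hypothetical names for illustration and are not the crate's actual `WhisperError` mapping.

```rust
// Values copied from the `WHISPERCPP_ERR_*` macros in the header.
pub const WHISPERCPP_ERR_BAD_ALLOC: i32 = -100;
pub const WHISPERCPP_ERR_SYSTEM_ERROR: i32 = -101;
pub const WHISPERCPP_ERR_STD_EXCEPTION: i32 = -102;
pub const WHISPERCPP_ERR_UNKNOWN_EXCEPTION: i32 = -103;

/// Which exception class the shim caught (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShimException {
    BadAlloc,
    SystemError,
    StdException,
    Unknown,
}

/// Hypothetical three-way split of a `whispercpp_full_with_state` result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FullOutcome {
    /// Return code 0.
    Ok,
    /// A non-zero code produced by whisper.cpp itself.
    Upstream(i32),
    /// The shim caught a C++ exception; the state should be treated as poisoned.
    CaughtException(ShimException),
}

pub fn classify_full_return(rc: i32) -> FullOutcome {
    match rc {
        0 => FullOutcome::Ok,
        WHISPERCPP_ERR_BAD_ALLOC => FullOutcome::CaughtException(ShimException::BadAlloc),
        WHISPERCPP_ERR_SYSTEM_ERROR => FullOutcome::CaughtException(ShimException::SystemError),
        WHISPERCPP_ERR_STD_EXCEPTION => FullOutcome::CaughtException(ShimException::StdException),
        WHISPERCPP_ERR_UNKNOWN_EXCEPTION => FullOutcome::CaughtException(ShimException::Unknown),
        other => FullOutcome::Upstream(other),
    }
}
```

Keeping the sentinels in a band far from whisper.cpp's own codes is what makes the final catch-all arm safe: any value outside `-100..=-103` can only have come from upstream.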
diff --git a/whispercpp-sys/wrapper.h b/whispercpp-sys/wrapper.h
new file mode 100644
index 0000000..02b35c6
--- /dev/null
+++ b/whispercpp-sys/wrapper.h
@@ -0,0 +1,12 @@
+// bindgen entry point. Pulls only what we use. Adding new
+// whisper.cpp surface to the safe wrapper means adding the
+// matching `#include` here AND extending the `allowlist_*`
+// directives in `build.rs` — there is no implicit re-export.
+//
+// `whispercpp_shim.h` exposes the exception-catching C ABI
+// shim layer. Every safe-Rust entry point
+// that can run user-controlled allocations / thread spawns
+// goes through these shims rather than calling whisper.cpp
+// directly.
+#include "whisper.h"
+#include "whispercpp_shim.h"
diff --git a/whispercpp/.gitignore b/whispercpp/.gitignore
new file mode 100644
index 0000000..5929c59
--- /dev/null
+++ b/whispercpp/.gitignore
@@ -0,0 +1,4 @@
+# Per-crate cargo target dir (whisper-cpp uses its own
+# workspace declaration so cargo writes here rather than
+# alongside whispery's main target/).
+target/
diff --git a/whispercpp/Cargo.toml b/whispercpp/Cargo.toml
new file mode 100644
index 0000000..04ed87b
--- /dev/null
+++ b/whispercpp/Cargo.toml
@@ -0,0 +1,96 @@
+[package]
+name = "whispercpp"
+version = "0.1.0"
+edition.workspace = true
+rust-version.workspace = true
+license.workspace = true
+repository.workspace = true
+readme = "README.md"
+description = "Safe Rust bindings for whisper.cpp speech recognition. Bundled patched build with memory-safety hardening, exception-catching FFI shim, and Send + Sync types."
+keywords = ["whisper", "asr", "speech-to-text", "transcription", "audio"]
+categories = ["multimedia::audio", "api-bindings", "science"]
+
+[lib]
+name = "whispercpp"
+path = "src/lib.rs"
+
+[features]
+# Default: CPU build (no GPU dep). `cargo add whispercpp` Just
+# Works on every platform (Apple, Linux, Windows) without a
+# preinstalled whisper.cpp — but ships ZERO GPU acceleration.
+# Opt into GPU backends explicitly per-target via the feature
+# flags below; multi-platform consumers can document which
+# accelerator each target needs in their own `Cargo.toml`
+# rather than silently inheriting an Apple-Silicon-only set.
+#
+# whisper.cpp is **always** built from the vendored submodule
+# in `whispercpp-sys/whisper.cpp/` and patched in `OUT_DIR`.
+# There is no system / pkg-config path: routing safe-Rust
+# code through a stock unpatched libwhisper would silently
+# drop the memory-safety guarantees the bundled patches
+# provide.
+default = []
+
+# Enables `serde::{Serialize, Deserialize}` for `Lang`. The wire
+# format is the lowercase language code: ISO-639-1 ("en") or, for a
+# few languages, three letters ("yue"). See `Lang`'s impls.
+serde = ["dep:serde", "smol_str/serde"]
+
+# ── Backends ──────────────────────────────────────────────
+#
+# Each chains 1:1 to `whispercpp-sys`'s matching feature,
+# which is what actually toggles whisper.cpp / ggml's CMake
+# `-DGGML_*=ON` flag.
+
+# Apple-only:
+metal = ["whispercpp-sys/metal"] # GGML_METAL: Metal GPU
+coreml = ["whispercpp-sys/coreml"] # WHISPER_COREML: ANE encoder
+
+# Cross-platform GPU:
+vulkan = ["whispercpp-sys/vulkan"] # GGML_VULKAN: Vulkan compute
+opencl = ["whispercpp-sys/opencl"] # GGML_OPENCL: mobile / Adreno
+
+# Vendor-specific GPU:
+cuda = ["whispercpp-sys/cuda"] # NVIDIA
+hipblas = ["whispercpp-sys/hipblas"] # AMD ROCm/HIP
+sycl = ["whispercpp-sys/sycl"] # Intel oneAPI / Arc
+musa = ["whispercpp-sys/musa"] # Moore Threads
+
+# Encoder accelerators (similar role to CoreML on other vendors):
+openvino = ["whispercpp-sys/openvino"] # Intel OpenVINO
+
+# CPU BLAS:
+openblas = ["whispercpp-sys/openblas"] # OpenBLAS
+
+[dependencies]
+# Low-level FFI to whisper.cpp. Path dep — sibling crate at
+# `../whispercpp-sys/`. All `unsafe extern "C"` declarations
+# live there; this crate only ever calls them behind safe
+# wrappers.
+whispercpp-sys = { version = "0.1", path = "../whispercpp-sys", default-features = false }
+# Public error type. `thiserror` keeps things light.
+thiserror = { version = "2", default-features = false }
+# Inline small strings (≤23 bytes) for error payloads — paths,
+# language hints, single-char interior-NUL diagnostics. Avoids
+# a heap allocation on every `WhisperError::ContextLoad` /
+# `InvalidCString`.
+smol_str = { version = "0.3", default-features = false }
+# Optional serde, gated by the `serde` feature. When enabled,
+# `Lang` round-trips through its canonical lowercase language
+# code (`"en"`, `"yue"`, …).
+serde = { version = "1", optional = true, default-features = false, features = ["alloc"] }
+
+[dev-dependencies]
+# Hound is for the `examples/smoke.rs` WAV reader only — never
+# pulled into a production build of `whispercpp` itself.
+hound = "3"
+# `lang.rs`'s serde tests round-trip through JSON. Dev-only —
+# no runtime cost in production builds.
+serde_json = "1"
+
+[[example]]
+name = "smoke"
+path = "examples/smoke.rs"
+
+[lints]
+workspace = true
diff --git a/whispercpp/README.md b/whispercpp/README.md
new file mode 120000
index 0000000..32d46ee
--- /dev/null
+++ b/whispercpp/README.md
@@ -0,0 +1 @@
+../README.md
\ No newline at end of file
diff --git a/whispercpp/TODO.md b/whispercpp/TODO.md
new file mode 100644
index 0000000..63a03f3
--- /dev/null
+++ b/whispercpp/TODO.md
@@ -0,0 +1,219 @@
+# whispercpp — unsupported surface
+
+This crate intentionally exposes a narrow slice of whisper.cpp.
+Everything below exists in whisper.cpp's C API but is NOT wrapped
+in safe Rust today, and may also be absent from the allowlisted
+bindings that `whispercpp-sys` generates into `OUT_DIR/generated.rs`.
+
+Three categories:
+
+1. **Deliberately omitted** — whispery doesn't need it; wrapping would
+ add maintenance + surface area without a caller.
+2. **Could add on demand** — small wrapper, not justified yet.
+3. **Larger work** — would need design choices about safety / lifetimes
+ before exposing.
+
+When adding something below, also extend the `allowlist_function`
+/ `allowlist_type` directives in
+`whispercpp-sys/build.rs::generate_bindings_with_args()` if the
+symbol isn't already in the generated `OUT_DIR/generated.rs`.
+
+---
+
+## 1. Deliberately omitted
+
+### Built-in VAD
+
+whisper.cpp ships its own Silero-based VAD. Whispery already runs
+the `silero` crate for VAD upstream of `whispercpp`, so whispery's
+own path is the canonical one. Wrapping whisper.cpp's VAD as well
+would duplicate state and complicate the call chain.
+
+Symbols: `whisper_vad_*`, `whisper_full_params::vad`,
+`vad_model_path`, `vad_params`, `set_min_speech_duration_ms`,
+`set_max_speech_duration_s`, `set_min_silence_duration_ms`,
+`set_speech_pad_ms`, `set_threshold`.
+
+### Grammar
+
+Whispery doesn't constrain decoding via grammar. The grammar
+machinery in whisper.cpp pulls a sizeable struct hierarchy
+(`whisper_grammar_element`, rules, stacks) and a non-trivial
+ownership model. No caller has asked for it.
+
+Symbols: `whisper_full_params::grammar_rules`, `grammar_n_rules`,
+`grammar_i_start_rule`, `grammar_penalty`, `set_grammar`,
+`set_grammar_penalty`, `set_start_rule`, `whisper_grammar_*`.
+
+### Translate task
+
+Whispery is transcribe-only. `set_translate(true)` (translate audio
+→ English) is wrapped (one-line passthrough), but the full
+translate-task flow (token id remapping, prompt seeding) is not
+exercised by any caller and we don't ship test coverage for it.
+
+### Tinydiarize controls
+
+`Segment::speaker_turn_next()` IS wrapped (it's a 1-byte read).
+Configuring `--tdrz` on the input side (`set_tdrz_enable`) is not
+— it requires a TDRZ-enabled checkpoint which whispery doesn't ship,
+and whispery's diarization runs upstream via pyannote-style
+clustering on word ranges.
+
+Symbols: `whisper_full_params::tdrz_enable`, `set_tdrz_enable`.
+
+### Lower-level entry points
+
+We expose `state.full()` only. The lower-level encode/decode flow
+(running the encoder, then `decode` token-by-token with custom
+sampling) is meaningful for research / custom samplers but doesn't
+fit whispery's pump architecture.
+
+Symbols: `whisper_encode`, `whisper_encode_with_state`,
+`whisper_decode`, `whisper_decode_with_state`, `whisper_get_logits`,
+`whisper_get_logits_from_state`, `whisper_set_mel`,
+`whisper_set_mel_with_state`, `whisper_pcm_to_mel`,
+`whisper_pcm_to_mel_with_state`.
+
+### Mid-decode callbacks
+
+Whisper.cpp can fire callbacks on every new segment, every logits
+emission, and at encoder start. Each requires the same trampoline
+discipline as the abort callback and adds another `Box`
+field to `Params`. None is wired into whispery's pump (which works
+chunk-at-a-time, not token-at-a-time).
+
+Symbols: `set_progress_callback`, `set_progress_callback_safe`,
+`set_progress_callback_user_data`, `set_new_segment_callback`,
+`set_segment_callback_safe`, `set_segment_callback_safe_lossy`,
+`set_new_segment_callback_user_data`, `set_filter_logits_callback`,
+`set_filter_logits_callback_user_data`, `set_start_encoder_callback`,
+`set_start_encoder_callback_user_data`.
+
+### Global logging hooks
+
+Whispery routes diagnostics through its own `eprintln!` / `tracing`
+layer. whisper.cpp's `set_log_callback` is a global hook that fires
+across all instances; mixing it with Rust logging frameworks
+requires more design than a 1:1 port.
+
+Symbols: `whisper_set_log_callback`, `set_debug_mode`,
+`whisper_log_callback`.
+
+### DTW token timestamps
+
+Whispery uses wav2vec2 forced alignment for word-level timing.
+whisper.cpp's DTW path is a parallel mechanism with its own
+configuration (`dtw_aheads`, `dtw_n_top`, `dtw_mem_size`). Wrapping
+it would invite confusion about which timestamping path is
+authoritative.
+
+Symbols: `whisper_full_params::dtw_token_timestamps` (true at
+construction, but `Params::set_dtw_*` and `dtw_aheads` array are
+not exposed), `whisper_aheads`, `whisper_full_get_token_dtw_t0_*`.
+
+### Buffer-load constructors
+
+We support `Context::new(path, params)` only. Loading from an
+in-memory buffer (`whisper_init_from_buffer_with_params`) or via a
+custom `whisper_model_loader` is rare and adds lifetime/ownership
+complexity.
+
+Symbols: `whisper_init_from_buffer_with_params`,
+`whisper_init_with_params` (custom loader), the `whisper_model_loader`
+struct.
+
+### Beam-search + greedy sampler details (advanced)
+
+Symbols: `set_beam_size`, `set_patience` are reachable through
+`SamplingStrategy::BeamSearch { beam_size, patience }` already.
+Direct `whisper_full_params::beam_search.beam_size` / `patience`
+accessors aren't exposed (use `Params::new(strategy)` with the
+right variant).
+
+---
+
+## 2. Could add on demand
+
+These are 5–15-line wrappers around an existing FFI symbol. None is
+required for whispery's current flow; each is justifiable when a
+concrete caller appears.
+
+| Whisper.cpp symbol | Suggested Rust API | Why might we want it |
+|---|---|---|
+| `whisper_token_text(ctx, token)` (alias of `token_to_str`) | already covered | — |
+| `whisper_token_to_bytes` | `Context::token_to_bytes(token) -> Option<&[u8]>` | non-UTF-8 byte sequences from BPE merges |
+| `whisper_lang_id(name)` | `Context::lang_id_for(name: &str) -> Option` | reverse of `detected_lang` |
+| `whisper_lang_max_id()` | `pub const LANG_MAX_ID: i32 = …` | iterate languages |
+| `whisper_lang_str_full(id)` | `Lang::full_name() -> &'static str` | "english" vs "en" |
+| `whisper_token_translate / transcribe / prev / nosp / not / solm` | `Context::token_translate() -> i32`, etc. | force-prefix decoding seeds |
+| `whisper_token_lang(ctx, lang_id)` | `Context::token_for_lang(Lang) -> i32` | language-specific seeds |
+| `whisper_token_id(ctx, token: &str)` | `Context::tokenize_one(text) -> Option` | turn a string back into a token id |
+| `whisper_tokenize(ctx, text, tokens, max)` | `Context::tokenize(text) -> Vec` | batch tokenization for `set_tokens` |
+| `whisper_n_len_from_state(state)` | `State::n_mel_frames() -> i32` | mel buffer length |
+| `whisper_print_timings(ctx)` | `Context::print_timings()` | end-of-run cost breakdown |
+| `whisper_reset_timings(ctx)` | `Context::reset_timings()` | per-chunk timing |
+| `whisper_get_whisper_version()` | `pub fn version() -> &'static str` | diagnostic |
+| Model layer counts (`whisper_model_n_audio_state` / `n_audio_head` / `n_audio_layer` / `n_text_state` / `n_text_head` / `n_text_layer` / `n_mels` / `model_ftype`) | `Context::model_dims() -> ModelDims` (struct of ints) | architecture-aware diagnostics |
+| `whisper_full_get_token_p_from_state` | `Token::posterior() -> f32` | already covered indirectly via `Token::p()` reading `whisper_token_data.p` — verify the two agree under wildcard / temperature |
+| `whisper_full_n_tokens_from_state` | already covered (`Segment::n_tokens()`) | — |
+
+Adding any of these means: extend the safe wrapper module, run the
+existing test suite (`cargo test -p whispercpp --features serde`),
+and confirm the regenerated `OUT_DIR/generated.rs` does not trigger
+a rebuild loop.
+
+---
+
+## 3. Larger work
+
+### Token-stream `Iterator`
+
+`State::segments_iter()` and `Segment::tokens_iter()` would be nice
+ergonomics. The lifetime story is non-trivial — each `Segment` /
+`Token` borrows from `State` via raw pointer. A correct iterator
+needs to project through that lifetime without aliasing.
+
+### Async-friendly `full`
+
+`State::full` blocks for the duration of the decode (seconds to
+minutes). A `tokio`-friendly variant that runs the FFI on a
+blocking task pool and yields completion would help server use
+cases. Currently callers spawn their own threads.
+
+### Streaming / partial-result API
+
+whisper.cpp's `whisper_full` is a one-shot call. Streaming
+transcription requires either (a) the new-segment callback path
+(see "Mid-decode callbacks" above), or (b) external chunking + one
+`Context::create_state()` per chunk. Whispery does (b) at the
+runner layer.
+
+### CoreML companion model build
+
+Whispery ships `coreml` as an opt-in feature, but generating the
+`.mlmodelc` companion file (whisper.cpp's `models/generate-coreml-
+model.sh`) is out-of-band. A `whispercpp-tools` crate or a build.rs
+helper that converts a checkpoint at install time would close the
+loop, but it requires `coremltools` (a Python dep) at build time —
+not great.
+
+---
+
+## Audit policy
+
+Before adding new public functions to `whispercpp`:
+
+1. Confirm the FFI symbol is in the generated bindings
+ (`OUT_DIR/generated.rs`). If not, extend the allowlists in
+ `whispercpp-sys/build.rs::generate_bindings_with_args()`.
+2. Replicate the safety rules used by the closest existing wrapper:
+ pointer is non-null, lifetime tied to the parent struct, no
+ aliasing across threads. Document the SAFETY block.
+3. Keep the public surface minimal — accessors private until a
+ caller materialises. The crate's value is "small, audited, no
+ leaks"; that holds only if every `unsafe` block has an obvious
+ justification.
+
+For deliberately-omitted items, prefer documenting the omission here
+rather than wrapping speculatively.
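Option (b) under "Streaming / partial-result API" in the TODO above (external chunking, one state per chunk) reduces to splitting the sample buffer into index ranges. The sketch below is an assumption-laden illustration: `chunk_ranges` is a hypothetical helper, and it cuts at fixed sample counts, whereas a real runner would more plausibly cut at VAD boundaries so that words are not split.

```rust
use std::ops::Range;

/// Splits `n_samples` into consecutive ranges of at most `chunk_len`
/// samples. The final range is shorter when the lengths do not divide.
pub fn chunk_ranges(n_samples: usize, chunk_len: usize) -> Vec<Range<usize>> {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    (0..n_samples)
        .step_by(chunk_len)
        .map(|start| start..(start + chunk_len).min(n_samples))
        .collect()
}
```

Each range would index into the 16 kHz sample slice and be passed to one inference call; the helper itself touches no FFI.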
diff --git a/build.rs b/whispercpp/build.rs
similarity index 100%
rename from build.rs
rename to whispercpp/build.rs
diff --git a/whispercpp/examples/smoke.rs b/whispercpp/examples/smoke.rs
new file mode 100644
index 0000000..b772b25
--- /dev/null
+++ b/whispercpp/examples/smoke.rs
@@ -0,0 +1,92 @@
+//! Smoke test: load a model, transcribe a 16 kHz mono WAV, print
+//! the segment list. Times each phase so we can compare against
+//! whisper-cli end-to-end.
+//!
+//! ```text
+//! whisper-cpp-smoke <model> <wav> [language]
+//! ```
+
+use std::time::Instant;
+
+use whispercpp::{Context, ContextParams, Params, SamplingStrategy};
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+ let mut args = std::env::args().skip(1);
+ let model = args.next().ok_or("usage: <model> <wav> [lang]")?;
+ let wav = args.next().ok_or("usage: <model> <wav> [lang]")?;
+ let lang = args.next().unwrap_or_else(|| "en".to_string());
+
+ // Load 16 kHz mono f32. We rely on the `hound` crate at the
+ // workspace level normally; for the smoke test, do it inline
+ // to keep dependencies on this crate to literally just whisper.
+ let samples = read_wav_16k_mono(&wav)?;
+ let dur_s = samples.len() as f64 / 16_000.0;
+ eprintln!(
+ "[smoke] wav={wav} samples={} dur={dur_s:.2}s",
+ samples.len()
+ );
+
+ let t_load = Instant::now();
+ let ctx = std::sync::Arc::new(Context::new(
+ &model,
+ ContextParams::new().with_use_gpu(true),
+ )?);
+ eprintln!(
+ "[smoke] context loaded in {:.3}s",
+ t_load.elapsed().as_secs_f64()
+ );
+
+ let mut state = ctx.create_state()?;
+
+ let mut params = Params::new(SamplingStrategy::Greedy { best_of: 1 });
+ // `set_language` is fallible (interior NUL); the rest are
+ // infallible chained `&mut Self` setters.
+ params.set_language(&lang)?;
+ params
+ .set_n_threads(1)
+ .set_no_context(true)
+ .set_suppress_blank(true)
+ .set_suppress_nst(true)
+ .set_temperature(0.0)
+ .set_temperature_inc(0.0)
+ .set_no_speech_thold(0.6)
+ .silence_print_toggles();
+
+ let t_full = Instant::now();
+ state.full(&params, &samples)?;
+ let full_s = t_full.elapsed().as_secs_f64();
+ eprintln!("[smoke] full() in {full_s:.3}s | rtf={:.3}", full_s / dur_s);
+
+ let n = state.n_segments();
+ eprintln!("[smoke] {n} segments");
+ for i in 0..n {
+ let seg = state.segment(i).expect("idx in range");
+ let t0 = seg.t0() as f64 * 0.01;
+ let t1 = seg.t1() as f64 * 0.01;
+ eprintln!(" [{t0:7.2}s -> {t1:7.2}s] {}", seg.text()?);
+ }
+ Ok(())
+}
+
+fn read_wav_16k_mono(path: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
+ // Inline hound usage to keep the crate's runtime deps to zero
+ // beyond `thiserror`. The smoke binary only — production callers
+ // bring their own audio loader (whispery uses ffmpeg-next).
+ let mut reader = hound::WavReader::open(path)?;
+ let spec = reader.spec();
+ if spec.sample_rate != 16_000 {
+ return Err(format!("expected 16 kHz, got {} Hz", spec.sample_rate).into());
+ }
+ if spec.channels != 1 {
+ return Err(format!("expected mono, got {} channels", spec.channels).into());
+ }
+ match spec.sample_format {
+ hound::SampleFormat::Float => Ok(reader.samples::<f32>().collect::<Result<Vec<_>, _>>()?),
+ hound::SampleFormat::Int => Ok(
+ reader
+ .samples::<i16>()
+ .map(|s| s.map(|x| x as f32 / 32768.0))
+ .collect::<Result<Vec<_>, _>>()?,
+ ),
+ }
+}
diff --git a/whispercpp/src/context.rs b/whispercpp/src/context.rs
new file mode 100644
index 0000000..9601892
--- /dev/null
+++ b/whispercpp/src/context.rs
@@ -0,0 +1,727 @@
+//! `Context` — the loaded whisper model.
+//!
+//! Owns the `whisper_context*` returned by
+//! `whisper_init_from_file_with_params`. Drop calls
+//! `whisper_free`. Cloning is intentionally NOT supported — the
+//! underlying whisper.cpp object is a unique owned resource. To
+//! run multiple inference threads against the same model, share
+//! `Arc<Context>` and call [`Context::create_state`] per thread
+//! (each `State` carries its own KV cache).
+
+#![allow(unsafe_code)]
+
+use core::{
+ ptr::NonNull,
+ sync::atomic::{AtomicBool, Ordering},
+};
+use std::{
+ ffi::CString,
+ path::Path,
+ sync::{Arc, Mutex, MutexGuard},
+};
+
+use crate::{
+ error::{WhisperError, WhisperResult},
+ state::State,
+ sys,
+};
+
+/// Acquire the process-wide mutex guarding every FFI call
+/// that mutates ggml's global logger state.
+///
+/// `whisper_init_state` calls
+/// `whisper_backend_init_gpu`, which unconditionally invokes
+/// `ggml_log_set(g_state.log_callback, …)` — writing to
+/// ggml's file-static logger globals without any
+/// synchronisation. `whisper_init_from_file_with_params_no_state`
+/// is in the same family (touches `g_state` indirectly through
+/// backend probing). With `unsafe impl Sync for Context`, two
+/// safe-Rust threads holding `Arc<Context>` could call
+/// `create_state` (or `Context::new`) concurrently and race on
+/// those globals — a C/C++ data race reachable from safe Rust.
+///
+/// The mutex serialises both init paths. Cost: one mutex
+/// acquire per `Context::new` and per `create_state`. Both are
+/// init-time, not hot-path; whispery's worker pool
+/// pre-creates one `State` per worker at startup, so this is
+/// microseconds-per-startup-once.
+pub(crate) fn init_lock() -> MutexGuard<'static, ()> {
+ static LOCK: Mutex<()> = Mutex::new(());
+ // Recover a poisoned lock: the mutex guards no data (the
+ // inner value is `()`), so re-acquiring after an unrelated
+ // panic in a sibling thread is fine.
+ LOCK.lock().unwrap_or_else(|e| e.into_inner())
+}
+
+/// Knobs forwarded to `whisper_context_default_params` before
+/// loading. Mirrors the subset of `whisper_context_params` whispery
+/// uses today.
+///
+/// All fields are private; access goes through `const fn`
+/// accessors and `with_*` builder methods so the type's invariants
+/// stay encapsulated and the public surface evolves
+/// independently of the underlying C struct.
+#[derive(Debug, Clone, Copy)]
+pub struct ContextParams {
+ use_gpu: bool,
+ gpu_device: i32,
+ flash_attn: bool,
+}
+
+impl ContextParams {
+ /// Defaults: GPU on (Metal/CUDA where compiled in), device 0,
+ /// flash-attn off.
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn new() -> Self {
+ Self {
+ use_gpu: true,
+ gpu_device: 0,
+ flash_attn: false,
+ }
+ }
+
+ /// Whether the encoder dispatches to a GPU backend (Metal /
+ /// CUDA). On Apple Silicon: `true` is required to avoid the
+ /// BLAS-only encode path that hits whisper.cpp's `failed to
+ /// encode` error on `large-v3-turbo`.
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn use_gpu(&self) -> bool {
+ self.use_gpu
+ }
+
+ /// Chained setter for [`Self::use_gpu`]. `const fn` so callers
+ /// can build a `ContextParams` in `const` context (e.g. in
+ /// per-runner config statics).
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn with_use_gpu(mut self, on: bool) -> Self {
+ self.use_gpu = on;
+ self
+ }
+
+ /// GPU device index (default `0` = primary).
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn gpu_device(&self) -> i32 {
+ self.gpu_device
+ }
+
+ /// Chained setter for [`Self::gpu_device`].
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn with_gpu_device(mut self, idx: i32) -> Self {
+ self.gpu_device = idx;
+ self
+ }
+
+ /// Whether flash-attention is enabled. Default `false`.
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn flash_attn(&self) -> bool {
+ self.flash_attn
+ }
+
+ /// Chained setter for [`Self::flash_attn`].
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub const fn with_flash_attn(mut self, on: bool) -> Self {
+ self.flash_attn = on;
+ self
+ }
+}
+
+impl Default for ContextParams {
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ fn default() -> Self {
+ Self::new()
+ }
+}
+
+/// Loaded whisper.cpp model. Cheap to share via `Arc`.
+pub struct Context {
+ // `NonNull` (vs. `*mut`) makes the Drop impl total — there is
+ // no "uninitialised" representation to guard against.
+ ptr: NonNull<sys::whisper_context>,
+ // Bounds the per-Context leak budget under
+ // `WhisperError::StateLost`. A `State::full` exception
+ // poisons the State (we MUST NOT free a possibly-corrupt
+ // `whisper_state`) and leaks that state's native
+ // allocations (~360 MB on `large-v3-turbo`). Without this
+ // flag, callers retrying `create_state` on the same Context
+ // accumulate one leak per attempt until the host runs out
+ // of memory. With it, `create_state` short-circuits to
+ // `ContextPoisoned` after the FIRST `StateLost`, capping
+ // the total leak at one State per Context. Recovery
+ // requires dropping this Context and constructing a fresh
+ // one (model reload — slow but bounded).
+ lost: AtomicBool,
+ // Serialise `State::full` calls through this Context.
+ // Without this lock, multiple
+ // workers each holding their own `State` (the documented
+ // pattern) can ALL be inside `whispercpp_full_with_state`
+ // simultaneously when an OOM / system_error fires. Each
+ // would poison its own state and leak ~360 MB before any
+ // of them got to mark the Context lost — the per-Context
+ // cap claim becomes a per-concurrent-worker cap, defeating
+ // the point. Holding this mutex across the FFI call makes
+ // the cap structural: at most one in-flight call per
+ // Context, so at most one leaked state per Context.
+ //
+ // Throughput cost: serialised inference per Context. On
+ // GPU backends (Metal, CUDA, Vulkan) the underlying
+ // command queue is already serialised, so the cost is
+ // small. On CPU-only inference, throughput drops to one
+ // inference at a time per Context — callers who need
+ // parallel CPU inference should run multiple Contexts
+ // (each loads its own copy of the model).
+ full_lock: Mutex<()>,
+}
+
+// SAFETY: whisper.cpp's context is read-only after init —
+// `whisper_init_from_file_with_params` is the only mutator and
+// runs entirely before we hand out the pointer. Per-thread state
+// (KV cache, scratch buffers) lives in `State`, not in `Context`.
+// Verified against whisper.cpp v1.8.4 (the submodule pin).
+unsafe impl Send for Context {}
+unsafe impl Sync for Context {}
+
+impl Context {
+ /// Load a `.bin` (GGML / GGUF) model from disk.
+ ///
+ /// Returns [`WhisperError::ContextLoad`] when whisper.cpp could
+ /// not parse the file or initialise the requested backend, or
+ /// [`WhisperError::InvalidCString`] if `path` contains an
+ /// interior NUL. **Panic-free.**
+ pub fn new(path: impl AsRef<Path>, params: ContextParams) -> WhisperResult<Self> {
+ let path_ref = path.as_ref();
+ let path_str = path_ref.to_string_lossy();
+ let cpath = CString::new(path_str.as_ref())
+ .map_err(|_| WhisperError::InvalidCString(smol_str::SmolStr::new(path_str.as_ref())))?;
+
+ // SAFETY: pure C call returning a value-typed defaults struct.
+ let mut cparams = unsafe { sys::whisper_context_default_params() };
+ cparams.use_gpu = params.use_gpu();
+ cparams.gpu_device = params.gpu_device();
+ cparams.flash_attn = params.flash_attn();
+
+ // Serialise init: backend probing inside whisper.cpp
+ // touches ggml's global logger state.
+ let _lock = init_lock();
+
+ // SAFETY: cpath outlives the call (held on the stack);
+ // cparams is value-typed.
+ //
+ // We use the C++ exception-catching shim
+ // `whispercpp_init_from_file_no_state`:
+ // upstream allocates `std::vector` / `std::ifstream`
+ // buffers that can throw `std::bad_alloc` on OOM, and
+ // unwinding C++ exceptions across `extern "C"` into Rust
+ // is undefined behaviour. The shim catches everything
+ // and collapses to a NULL return.
+ //
+ // The shim itself wraps the `_no_state` form, and that
+ // is intentional: the default
+ // `whisper_init_from_file_with_params` allocates an
+ // extra ~360 MB `whisper_state` into `ctx->state` that
+ // we never use (every inference path creates its own via
+ // [`Context::create_state`]); see
+ // `src/whisper.cpp:3735`.
+ //
+ // # Leak-on-OOM discrimination
+ //
+ // Upstream's
+ // `whisper_init_from_file_with_params_no_state` does
+ // `whisper_context * ctx = new whisper_context;` and
+ // then performs throwing model-load work (vector
+ // allocations for tensors, GPU buffer allocations on
+ // Apple Silicon / CUDA, file-stream reads). If a
+ // `std::bad_alloc` or `std::system_error` fires AFTER
+ // the raw `new` succeeded but BEFORE the function's own
+ // explicit-cleanup branches run, the partial
+ // `whisper_context` and any tensor/backend buffers
+ // already allocated leak — the shim catches the
+ // exception but has no pointer to clean up.
+ //
+ // The shim keeps a thread-local sentinel that
+ // distinguishes the two flavours of NULL return:
+ //
+ // * `take_last_constructor_exception == 0` →
+ // upstream returned NULL CLEANLY (file-not-found,
+ // wrong magic, backend refused — no `new` happened
+ // yet, or upstream's own bool-failure paths cleaned
+ // up). Surface as `ContextLoad`, retryable.
+ // * `take_last_constructor_exception != 0` → the
+ // shim caught a C++ throw with the `new
+ // whisper_context` already allocated. Surface as
+ // `ConstructorLost`, NOT retryable — see that
+ // variant's docs for the recovery contract.
+ let raw = unsafe { sys::whispercpp_init_from_file_no_state(cpath.as_ptr(), cparams) };
+
+ if let Some(ptr) = NonNull::new(raw) {
+ return Ok(Self {
+ ptr,
+ lost: AtomicBool::new(false),
+ full_lock: Mutex::new(()),
+ });
+ }
+ // SAFETY: pure C call; thread-local read on the same
+ // thread that made the constructor call, with no other
+ // shim entry between them.
+ let exc = unsafe { sys::whispercpp_take_last_constructor_exception() };
+ if exc != 0 {
+ return Err(WhisperError::ConstructorLost {
+ origin: "context",
+ code: exc,
+ });
+ }
+ Err(WhisperError::ContextLoad {
+ path: smol_str::SmolStr::new(path_str.as_ref()),
+ reason: smol_str::SmolStr::new(
+ "whispercpp_init_from_file_no_state returned NULL (upstream load failure, no native exception caught)",
+ ),
+ })
+ }
+
+ /// Create a fresh inference [`State`] tied to this model.
+ ///
+ /// Takes `Arc<Self>` because the returned `State` owns a clone
+ /// of the Arc — that's what keeps the Context alive across the
+ /// state's lifetime without forcing callers to thread a `'ctx`
+ /// borrow through every storage location. Construct
+ /// `Arc::new(Context::new(...)?)` once per model, then call
+ /// `create_state` per worker.
+ pub fn create_state(self: &Arc<Self>) -> WhisperResult<State> {
+ // Refuse if a prior `State::full` on this
+ // Context returned `WhisperError::StateLost`. Each
+ // `StateLost` leaks the State's native allocations
+ // (~360 MB on `large-v3-turbo`); allowing `create_state`
+ // to allocate a fresh one would compound the leak per
+ // retry attempt. Callers must drop this Context and
+ // construct a fresh one (re-loading the model) to
+ // recover.
+ if self.lost.load(Ordering::Acquire) {
+ return Err(WhisperError::ContextPoisoned);
+ }
+ // Serialise init: `whisper_backend_init_gpu` calls
+ // `ggml_log_set(...)` on ggml's file-static logger
+ // globals without any synchronisation. Two threads
+ // creating states concurrently from a shared
+ // `Arc<Context>` would race on those globals — a C/C++
+ // data race reachable from safe Rust through
+ // `unsafe impl Sync for Context`.
+ let _lock = init_lock();
+
+ // SAFETY: self.ptr is non-null (NonNull invariant) and
+ // the Arc clone we hand to State keeps the Context (and
+ // therefore the underlying whisper_context*) alive for
+ // the State's lifetime.
+ //
+ // We route through the exception-catching shim
+ // `whispercpp_init_state`: upstream allocates KV-cache
+ // and scratch buffers via `std::vector` (each potentially
+ // throws `std::bad_alloc`), and on Apple Silicon also
+ // initialises the Metal backend (which can throw on
+ // device-init failure).
+ //
+ // # NULL-discrimination contract
+ //
+ // Same flavour split as `Context::new`: upstream's
+ // `whisper_init_state` either returns NULL via its
+ // bool-failure paths (every `if (!whisper_kv_cache_init…)`
+ // branch runs `whisper_free_state(state); return nullptr;`
+ // before returning — leak-free) OR throws a C++
+ // exception that our shim catches AFTER `new
+ // whisper_state` already happened (partial leak).
+ //
+ // Read the thread-local sentinel to distinguish:
+ // * `0` → `StateInit` (retryable, no leak)
+ // * `≠ 0` → `ConstructorLost { origin: "state", … }`
+ // (fatal, partial allocation leaked, do not auto-retry)
+ let raw = unsafe { sys::whispercpp_init_state(self.ptr.as_ptr()) };
+ // TOCTOU close. Between the entry-time
+ // `lost.load` above and `whispercpp_init_state` returning,
+ // another thread may have transitioned an existing State
+ // through `StateLost` and called `mark_lost`. If we
+ // published this fresh State to the caller, they'd add
+ // another leak-prone State to a Context whose poison flag
+ // is now true. Re-check after the alloc; if the flag
+ // flipped, free the just-created state (it's intact —
+ // came straight out of `whisper_init_state`) and return
+ // `ContextPoisoned`. The race window is therefore the
+ // duration of the FFI call rather than zero, but the
+ // freshly-allocated state is always freed cleanly, so no
+ // permanent leak accumulates.
+ if self.lost.load(Ordering::Acquire) {
+ if let Some(state_ptr) = NonNull::new(raw) {
+ // SAFETY: `raw` is the just-returned, never-published
+ // result of `whispercpp_init_state`; nothing else
+ // holds it. `whisper_free_state` is the matching
+ // deallocator.
+ unsafe { sys::whisper_free_state(state_ptr.as_ptr()) };
+ }
+ // Even if the alloc threw (raw is null), drain the
+ // thread-local sentinel so it doesn't leak across into
+ // the next constructor call's catch-block.
+ let _ = unsafe { sys::whispercpp_take_last_constructor_exception() };
+ return Err(WhisperError::ContextPoisoned);
+ }
+ if let Some(state_ptr) = NonNull::new(raw) {
+ return Ok(State::from_raw(state_ptr, Arc::clone(self)));
+ }
+ // SAFETY: pure C call; thread-local read on the same
+ // thread, no other shim call between.
+ let exc = unsafe { sys::whispercpp_take_last_constructor_exception() };
+ if exc != 0 {
+ // A caught constructor exception means upstream
+ // `whisper_init_state` left partial native allocations
+ // that we cannot reliably free (the throw could have
+ // happened mid-init at any sub-call). Poison the
+ // Context so subsequent
+ // `create_state` calls fail with `ContextPoisoned`
+ // instead of repeating the same OOM / system_error
+ // path and compounding leaks.
+ self.lost.store(true, Ordering::Release);
+ return Err(WhisperError::ConstructorLost {
+ origin: "state",
+ code: exc,
+ });
+ }
+ Err(WhisperError::StateInit)
+ }
+
+ /// Internal: hand the raw pointer to siblings in this crate
+ /// that need to call FFI functions taking `whisper_context*`.
+ pub(crate) fn as_raw(&self) -> *mut sys::whisper_context {
+ self.ptr.as_ptr()
+ }
+
+ /// Internal: mark this Context as poisoned because a
+ /// `State::full` on one of its States returned a
+ /// `WhisperError::StateLost`. Subsequent
+ /// [`Context::create_state`] calls return
+ /// [`WhisperError::ContextPoisoned`].
+ ///
+ /// Idempotent: subsequent calls are cheap atomic stores.
+ /// `Ordering::Release` pairs with the
+ /// `Ordering::Acquire` load in `create_state` so threads
+ /// observing the flag also observe everything that led up
+ /// to the poisoning (per the C++ memory model: writes
+ /// before a Release become visible after the matching
+ /// Acquire).
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub(crate) fn mark_lost(&self) {
+ self.lost.store(true, Ordering::Release);
+ }
+
+ /// Whether [`Context::create_state`] will refuse to
+ /// allocate a new [`State`]. `true` after any `State::full`
+ /// on this Context has returned
+ /// [`WhisperError::StateLost`]. Recovery requires dropping
+ /// this Context and constructing a fresh one.
+ pub fn is_poisoned(&self) -> bool {
+ self.lost.load(Ordering::Acquire)
+ }
+
+ /// Acquire the per-Context inference lock for the duration
+ /// of one [`State::full`] FFI call. The lock is held
+ /// across the leak-prone shim entry so concurrent workers
+ /// can't each leak under the same OOM event before
+ /// poisoning fires. Recovers from a poisoned mutex (a
+ /// previous holder panicked) by adopting the inner unit:
+ /// the inner state is `()`, so there's no value to be
+ /// inconsistent.
+ #[cfg_attr(not(tarpaulin), inline(always))]
+ pub(crate) fn full_lock(&self) -> MutexGuard<'_, ()> {
+ self
+ .full_lock
+ .lock()
+ .unwrap_or_else(|poison| poison.into_inner())
+ }
+
+ // ── Model introspection ────────────────────────────────────
+
+ /// `true` if the loaded checkpoint carries the multilingual
+ /// decoder (e.g. `large-v3-turbo`). `false` for English-only
+ /// checkpoints (`tiny.en`, `base.en`, …).
+ pub fn is_multilingual(&self) -> bool {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_is_multilingual(self.ptr.as_ptr()) != 0 }
+ }
+
+ /// Vocabulary size (number of tokens the decoder can emit).
+ pub fn n_vocab(&self) -> i32 {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_n_vocab(self.ptr.as_ptr()) }
+ }
+
+ /// Audio context window (encoder mel-frame budget). 1500 for
+ /// the vanilla 30 s checkpoints.
+ pub fn n_audio_ctx(&self) -> i32 {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_n_audio_ctx(self.ptr.as_ptr()) }
+ }
+
+ /// Text context window (decoder past-token budget). 448 for
+ /// the standard checkpoints.
+ pub fn n_text_ctx(&self) -> i32 {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_n_text_ctx(self.ptr.as_ptr()) }
+ }
+
+ /// Human-readable model size string baked into the checkpoint
+ /// (`"tiny"`, `"base"`, `"large-v3-turbo"`, …). Returns
+ /// `None` if whisper.cpp returned a NULL pointer or non-UTF-8
+ /// (model corruption).
+ pub fn model_type(&self) -> Option<&'static str> {
+ // SAFETY: pure C accessor; pointer into a static
+ // const-table baked into libwhisper.
+ let raw = unsafe { sys::whisper_model_type_readable(self.ptr.as_ptr()) };
+ if raw.is_null() {
+ return None;
+ }
+ // SAFETY: NUL-terminated; static lifetime per whisper.cpp.
+ let bytes = unsafe { core::ffi::CStr::from_ptr(raw).to_bytes() };
+ core::str::from_utf8(bytes).ok()
+ }
+
+ // ── Special token ids ──────────────────────────────────────
+
+ /// `<|endoftext|>` — emitted at the end of every successful
+ /// decode. Useful for sentinel checks against `Token::id`.
+ pub fn token_eot(&self) -> i32 {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_token_eot(self.ptr.as_ptr()) }
+ }
+
+ /// `<|startoftranscript|>`.
+ pub fn token_sot(&self) -> i32 {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_token_sot(self.ptr.as_ptr()) }
+ }
+
+ /// First timestamp token (`<|0.00|>`). Token ids `>= token_beg`
+ /// encode timestamps; `< token_beg` encode text.
+ pub fn token_beg(&self) -> i32 {
+ // SAFETY: ctx pointer invariant.
+ unsafe { sys::whisper_token_beg(self.ptr.as_ptr()) }
+ }
+
+ /// Decode a single token id back to its surface form. Useful
+ /// for token-level diagnostics. Returns `None` when:
+ ///
+ /// * `token` is outside `[0, n_vocab)` — would otherwise
+ /// throw `std::out_of_range` from
+ /// `id_to_token.at(token)` across the C ABI (UB) per
+ /// `whisper.cpp:4201`. Pre-checking the bound here keeps
+ /// the unwound exception from crossing `extern "C"`.
+ /// * the underlying `c_str` is NULL or non-UTF-8 (model
+ /// corruption).
+ ///
+ /// The returned slice borrows from a `std::string` owned by
+ /// the context's vocab table; it stays valid for as long as
+ /// `self` is alive. (Unlike [`system_info`], this does NOT
+ /// alias mutable C++ state — `id_to_token` is built once at
+ /// load time and never modified.)
+ pub fn token_to_str(&self, token: i32) -> Option<&str> {
+ // Validate before the FFI call — the upstream `at` throw
+ // would cross `extern "C"` and is UB.
+ let n = self.n_vocab();
+ if token < 0 || token >= n {
+ return None;
+ }
+ // SAFETY: token bound checked above; ctx pointer invariant.
+ let raw = unsafe { sys::whisper_token_to_str(self.ptr.as_ptr(), token) };
+ if raw.is_null() {
+ return None;
+ }
+ // SAFETY: NUL-terminated; lives as long as Context.
+ let bytes = unsafe { core::ffi::CStr::from_ptr(raw).to_bytes() };
+ core::str::from_utf8(bytes).ok()
+ }
+}
+
+/// System-info string assembled by libwhisper — backend caps
+/// (BLAS / Metal / CUDA / OpenMP), CPU SIMD flags whisper.cpp
+/// detected, and the build id. Useful for startup-time logging
+/// to confirm which backend the runtime linked against.
+///
+/// Returns `None` if the C++ accessor handed back a NULL pointer
+/// or non-UTF-8 bytes (corrupt build).
+///
+/// # Soundness notes
+///
+/// `whisper_print_system_info` re-builds a file-scope
+/// `static std::string s` on every invocation
+/// (`s = ""; s += "..."; return s.c_str();`). Two soundness
+/// hazards follow, both of which we work around here:
+///
+/// 1. The `c_str()` pointer returned to a previous caller
+/// dangles after the next call, so we can't return
+/// `&'static str`. We copy into an owned [`SmolStr`](smol_str::SmolStr).
+/// 2. Two concurrent callers race on the static buffer (no
+/// upstream lock). We serialise behind a Rust-side
+/// [`OnceLock`](std::sync::OnceLock) AND a mutex so the
+/// underlying C call runs AT MOST ONCE per process,
+/// eliminating both the race AND the redundant work (the
+/// system info doesn't change after libwhisper loads).
+///
+/// Both hazards are documented against whisper.cpp v1.8.4 at
+/// `src/whisper.cpp:4315`.
+pub fn system_info() -> Option<smol_str::SmolStr> {
+ use std::sync::{Mutex, OnceLock};
+ // OnceLock holds the cached result; the inner Mutex
+ // serialises the FIRST call so two threads can't race the
+ // upstream static buffer. After init, OnceLock returns
+ // without locking on every call.
+ static CACHE: OnceLock