This umbrella issue catalogues whisper.cpp surface that is intentionally not wrapped by the safe whispercpp API. The omission is a deliberate scope choice — whispery (the in-house consumer) doesn't need these, and adding them speculatively expands the audited unsafe surface.
If you have a concrete use case for any item below, comment with your scenario. The decision isn't permanent — it's just deferred until a caller materialises.
See whispercpp/TODO.md § 1 for the full rationale.
Built-in VAD
Whisper.cpp ships its own Silero ONNX VAD. Whispery uses the silero crate upstream, which is the canonical path; re-wrapping whisper.cpp's VAD duplicates state.
Symbols: whisper_vad_*, whisper_full_params::vad, vad_model_path, vad_params, set_min_speech_duration_ms, set_max_speech_duration_s, set_min_silence_duration_ms, set_speech_pad_ms, set_threshold.
Grammar-constrained decoding
Pulls a sizeable struct hierarchy (whisper_grammar_element, rules, stacks) and a non-trivial ownership model. No caller has asked for it.
Symbols: whisper_full_params::grammar_rules, grammar_n_rules, grammar_i_start_rule, grammar_penalty, set_grammar, set_grammar_penalty, set_start_rule, whisper_grammar_*.
Translate task (audio → English)
Params::set_translate(true) is wrapped (1-line passthrough), but the full translate-task flow (token-id remapping, prompt seeding) is not exercised by any caller and we don't ship test coverage.
Tinydiarize input controls
Segment::speaker_turn_next() IS wrapped (1-byte read). Configuring --tdrz on the input side (set_tdrz_enable) is not — it requires a TDRZ-enabled checkpoint we don't ship, and whispery's diarization runs upstream via clustering on word ranges.
Symbols: whisper_full_params::tdrz_enable, set_tdrz_enable.
Lower-level encode/decode entry points
We expose state.full() only. The lower-level encode/decode flow (running the encoder, then decode token-by-token with custom sampling) is meaningful for research / custom samplers but doesn't fit whispery's pump architecture.
Symbols: whisper_encode, whisper_encode_with_state, whisper_decode, whisper_decode_with_state, whisper_get_logits, whisper_get_logits_from_state, whisper_set_mel, whisper_set_mel_with_state, whisper_pcm_to_mel, whisper_pcm_to_mel_with_state.
Mid-decode callbacks (progress / new segment / logits filter / encoder start)
Each requires the same trampoline discipline as the abort callback and adds another Box<dyn FnMut> field to Params. None is wired into whispery's chunk-at-a-time pump.
Symbols: set_progress_callback*, set_new_segment_callback*, set_filter_logits_callback*, set_start_encoder_callback*.
Related: see #3 for an iterator-based output API and #4 for an async variant — but for true streaming partial-result emission, the new-segment callback is the underlying mechanism and would need wrapping.
Global logging hooks
Whispery routes diagnostics through its own eprintln! / tracing layer. whisper_set_log_callback is a global hook that fires across all instances; mixing it with Rust logging frameworks needs more design than a 1:1 port.
Symbols: whisper_set_log_callback, set_debug_mode, whisper_log_callback.
Buffer-load and custom-loader constructors
We support Context::new(path, params) only. Loading from an in-memory buffer or via a custom whisper_model_loader is rare and adds lifetime/ownership complexity.
Symbols: whisper_init_from_buffer_with_params, whisper_init_with_params (custom loader), whisper_model_loader.
Direct beam_search / greedy field accessors
Use Params::new(SamplingStrategy::BeamSearch { beam_size, patience }) / Greedy { best_of }. Direct whisper_full_params::beam_search.beam_size / patience field access isn't exposed.
Removed from this list
- DTW token timestamps — landed on
feat/dtw. Enable via ContextParams::with_dtw_token_timestamps(true) + with_dtw_aheads_preset(...); read per-token timing as Token::t_dtw() -> Option<i64>. See the README's DTW section for the full API.
This umbrella issue catalogues whisper.cpp surface that is intentionally not wrapped by the safe
whispercppAPI. The omission is a deliberate scope choice — whispery (the in-house consumer) doesn't need these, and adding them speculatively expands the auditedunsafesurface.If you have a concrete use case for any item below, comment with your scenario. The decision isn't permanent — it's just deferred until a caller materialises.
See
whispercpp/TODO.md§ 1 for the full rationale.Built-in VAD
Whisper.cpp ships its own Silero ONNX VAD. Whispery uses the
silerocrate upstream, which is the canonical path; re-wrapping whisper.cpp's VAD duplicates state.Symbols:
whisper_vad_*,whisper_full_params::vad,vad_model_path,vad_params,set_min_speech_duration_ms,set_max_speech_duration_s,set_min_silence_duration_ms,set_speech_pad_ms,set_threshold.Grammar-constrained decoding
Pulls a sizeable struct hierarchy (
whisper_grammar_element, rules, stacks) and a non-trivial ownership model. No caller has asked for it.Symbols:
whisper_full_params::grammar_rules,grammar_n_rules,grammar_i_start_rule,grammar_penalty,set_grammar,set_grammar_penalty,set_start_rule,whisper_grammar_*.Translate task (audio → English)
Params::set_translate(true)is wrapped (1-line passthrough), but the full translate-task flow (token-id remapping, prompt seeding) is not exercised by any caller and we don't ship test coverage.Tinydiarize input controls
Segment::speaker_turn_next()IS wrapped (1-byte read). Configuring--tdrzon the input side (set_tdrz_enable) is not — it requires a TDRZ-enabled checkpoint we don't ship, and whispery's diarization runs upstream via clustering on word ranges.Symbols:
whisper_full_params::tdrz_enable,set_tdrz_enable.Lower-level encode/decode entry points
We expose
state.full()only. The lower-level encode/decode flow (running the encoder, thendecodetoken-by-token with custom sampling) is meaningful for research / custom samplers but doesn't fit whispery's pump architecture.Symbols:
whisper_encode,whisper_encode_with_state,whisper_decode,whisper_decode_with_state,whisper_get_logits,whisper_get_logits_from_state,whisper_set_mel,whisper_set_mel_with_state,whisper_pcm_to_mel,whisper_pcm_to_mel_with_state.Mid-decode callbacks (progress / new segment / logits filter / encoder start)
Each requires the same trampoline discipline as the abort callback and adds another
Box<dyn FnMut>field toParams. None is wired into whispery's chunk-at-a-time pump.Symbols:
set_progress_callback*,set_new_segment_callback*,set_filter_logits_callback*,set_start_encoder_callback*.Related: see #3 for an iterator-based output API and #4 for an async variant — but for true streaming partial-result emission, the new-segment callback is the underlying mechanism and would need wrapping.
Global logging hooks
Whispery routes diagnostics through its own
eprintln!/tracinglayer.whisper_set_log_callbackis a global hook that fires across all instances; mixing it with Rust logging frameworks needs more design than a 1:1 port.Symbols:
whisper_set_log_callback,set_debug_mode,whisper_log_callback.Buffer-load and custom-loader constructors
We support
Context::new(path, params)only. Loading from an in-memory buffer or via a customwhisper_model_loaderis rare and adds lifetime/ownership complexity.Symbols:
whisper_init_from_buffer_with_params,whisper_init_with_params(custom loader),whisper_model_loader.Direct
beam_search/greedyfield accessorsUse
Params::new(SamplingStrategy::BeamSearch { beam_size, patience })/Greedy { best_of }. Directwhisper_full_params::beam_search.beam_size/patiencefield access isn't exposed.Removed from this list
feat/dtw. Enable viaContextParams::with_dtw_token_timestamps(true)+with_dtw_aheads_preset(...); read per-token timing asToken::t_dtw() -> Option<i64>. See the README's DTW section for the full API.