Skip to content

Documented omissions: VAD, grammar, callbacks, DTW, low-level encode/decode, etc. #6

@uqio

Description

@uqio

This umbrella issue catalogues whisper.cpp surface that is intentionally not wrapped by the safe whispercpp API. The omission is a deliberate scope choice — whispery (the in-house consumer) doesn't need these, and adding them speculatively expands the audited unsafe surface.

If you have a concrete use case for any item below, comment with your scenario. The decision isn't permanent — it's just deferred until a caller materialises.

See whispercpp/TODO.md § 1 for the full rationale.

Built-in VAD

Whisper.cpp ships its own Silero ONNX VAD. Whispery uses the silero crate upstream, which is the canonical path; re-wrapping whisper.cpp's VAD duplicates state.

Symbols: whisper_vad_*, whisper_full_params::vad, vad_model_path, vad_params, set_min_speech_duration_ms, set_max_speech_duration_s, set_min_silence_duration_ms, set_speech_pad_ms, set_threshold.

Grammar-constrained decoding

Pulls a sizeable struct hierarchy (whisper_grammar_element, rules, stacks) and a non-trivial ownership model. No caller has asked for it.

Symbols: whisper_full_params::grammar_rules, grammar_n_rules, grammar_i_start_rule, grammar_penalty, set_grammar, set_grammar_penalty, set_start_rule, whisper_grammar_*.

Translate task (audio → English)

Params::set_translate(true) is wrapped (1-line passthrough), but the full translate-task flow (token-id remapping, prompt seeding) is not exercised by any caller and we don't ship test coverage.

Tinydiarize input controls

Segment::speaker_turn_next() IS wrapped (1-byte read). Configuring --tdrz on the input side (set_tdrz_enable) is not — it requires a TDRZ-enabled checkpoint we don't ship, and whispery's diarization runs upstream via clustering on word ranges.

Symbols: whisper_full_params::tdrz_enable, set_tdrz_enable.

Lower-level encode/decode entry points

We expose state.full() only. The lower-level encode/decode flow (running the encoder, then decode token-by-token with custom sampling) is meaningful for research / custom samplers but doesn't fit whispery's pump architecture.

Symbols: whisper_encode, whisper_encode_with_state, whisper_decode, whisper_decode_with_state, whisper_get_logits, whisper_get_logits_from_state, whisper_set_mel, whisper_set_mel_with_state, whisper_pcm_to_mel, whisper_pcm_to_mel_with_state.

Mid-decode callbacks (progress / new segment / logits filter / encoder start)

Each requires the same trampoline discipline as the abort callback and adds another Box<dyn FnMut> field to Params. None is wired into whispery's chunk-at-a-time pump.

Symbols: set_progress_callback*, set_new_segment_callback*, set_filter_logits_callback*, set_start_encoder_callback*.

Related: see #3 for an iterator-based output API and #4 for an async variant — but for true streaming partial-result emission, the new-segment callback is the underlying mechanism and would need wrapping.

Global logging hooks

Whispery routes diagnostics through its own eprintln! / tracing layer. whisper_set_log_callback is a global hook that fires across all instances; mixing it with Rust logging frameworks needs more design than a 1:1 port.

Symbols: whisper_set_log_callback, set_debug_mode, whisper_log_callback.

Buffer-load and custom-loader constructors

We support Context::new(path, params) only. Loading from an in-memory buffer or via a custom whisper_model_loader is rare and adds lifetime/ownership complexity.

Symbols: whisper_init_from_buffer_with_params, whisper_init_with_params (custom loader), whisper_model_loader.

Direct beam_search / greedy field accessors

Use Params::new(SamplingStrategy::BeamSearch { beam_size, patience }) / Greedy { best_of }. Direct whisper_full_params::beam_search.beam_size / patience field access isn't exposed.


Removed from this list

  • DTW token timestamps — landed on feat/dtw. Enable via ContextParams::with_dtw_token_timestamps(true) + with_dtw_aheads_preset(...); read per-token timing as Token::t_dtw() -> Option<i64>. See the README's DTW section for the full API.

Metadata

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentationwontfixThis will not be worked on

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions