Add small accessors to the safe wrapper (tracking) #2

@uqio

Tracking issue for the "Could add on demand" section of whispercpp/TODO.md. Each item is a 5–15 line wrapper around an existing FFI symbol; none is required for current callers but each is justifiable when a concrete need appears.

When picking one off:

  • Confirm the FFI symbol is in whispercpp-sys/src/generated.rs. If not, extend the allowlists in whispercpp-sys/build.rs::generate_bindings_with_args().
  • Replicate the safety pattern of the closest existing wrapper (non-null pointer, lifetime tied to parent struct, no aliasing across threads). Document the SAFETY block.
  • cargo test -p whispercpp --features serde --lib should still pass.
  • Add a doc test or unit test for the new accessor.
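
The checklist above could look roughly like this for one accessor. This is a minimal sketch, not the crate's actual code: `RawContext`, `mock_token_to_str` and `Context::mock` are hypothetical stand-ins so the example compiles without `whispercpp-sys`, and the real wrapper would call the generated FFI symbol instead.

```rust
use std::ffi::{c_char, CStr};
use std::ptr::NonNull;

/// Hypothetical stand-in for the opaque C context type.
#[repr(C)]
pub struct RawContext {
    _private: [u8; 0],
}

/// Hypothetical stand-in for a generated FFI symbol.
/// Returns a NUL-terminated byte string for known tokens, null otherwise.
unsafe extern "C" fn mock_token_to_str(_ctx: *mut RawContext, token: i32) -> *const c_char {
    match token {
        0 => b"hello\0".as_ptr().cast::<c_char>(),
        // A truncated multi-byte sequence, i.e. not valid UTF-8 on its own.
        1 => b"\xE4\xBD\0".as_ptr().cast::<c_char>(),
        _ => std::ptr::null(),
    }
}

/// Safe wrapper that owns a non-null pointer to the C context.
pub struct Context {
    raw: NonNull<RawContext>,
}

impl Context {
    /// Hypothetical constructor for this sketch only; a real wrapper would
    /// obtain the pointer from the C initialisation function.
    pub fn mock() -> Self {
        Context { raw: NonNull::dangling() }
    }

    /// Returns the raw bytes of `token`, or `None` if the token is unknown.
    /// The returned slice borrows from `self`, so it cannot outlive the context.
    pub fn token_to_bytes(&self, token: i32) -> Option<&[u8]> {
        // SAFETY: `self.raw` is non-null by construction and the callee only
        // reads through it (the mock ignores it entirely).
        let ptr = unsafe { mock_token_to_str(self.raw.as_ptr(), token) };
        if ptr.is_null() {
            return None;
        }
        // SAFETY: `ptr` is non-null and points at a NUL-terminated string that
        // lives at least as long as the context (here: a static literal).
        Some(unsafe { CStr::from_ptr(ptr) }.to_bytes())
    }
}
```

Returning `&[u8]` rather than `&str` keeps the accessor usable for tokens that are not valid UTF-8 on their own; callers who want text can apply `std::str::from_utf8` themselves.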

Token / vocab

  • Context::token_to_bytes(token) -> Option<&[u8]> — non-UTF-8 byte sequences from BPE merges
  • Context::tokenize_one(text) -> Option<i32> — wraps whisper_token_id
  • Context::tokenize(text) -> Vec<i32> — wraps whisper_tokenize, useful as input to Params::set_tokens
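
For `tokenize`, one plausible shape is a grow-and-retry loop around a C function that reports the required buffer size as a negative return value. The sketch below assumes that convention and uses a hypothetical `mock_tokenize` (one token per input byte, no context argument) in place of the real symbol. It also returns `Option<Vec<i32>>` instead of `Vec<i32>` so that input containing an interior NUL can be rejected; that deviation is an assumption of the sketch, not a decision recorded in this issue.

```rust
use std::ffi::{c_char, c_int, CStr, CString};

/// Hypothetical stand-in for the tokenizer FFI symbol: emits one token per
/// input byte. Returns the token count on success, or the negated required
/// count when `n_max_tokens` is too small.
unsafe extern "C" fn mock_tokenize(
    text: *const c_char,
    tokens: *mut i32,
    n_max_tokens: c_int,
) -> c_int {
    let bytes = unsafe { CStr::from_ptr(text) }.to_bytes();
    let needed = bytes.len() as c_int;
    if needed > n_max_tokens {
        return -needed;
    }
    for (i, b) in bytes.iter().enumerate() {
        unsafe { tokens.add(i).write(*b as i32) };
    }
    needed
}

/// Tokenizes `text`, growing the output buffer once the callee reports the
/// size it needs. Returns `None` for text with an interior NUL byte.
pub fn tokenize(text: &str) -> Option<Vec<i32>> {
    let c_text = CString::new(text).ok()?;
    let mut buf: Vec<i32> = vec![0; 8]; // small initial guess
    loop {
        let cap = c_int::try_from(buf.len()).ok()?;
        // SAFETY: `c_text` is a valid NUL-terminated string and `buf` has
        // room for exactly `cap` elements.
        let n = unsafe { mock_tokenize(c_text.as_ptr(), buf.as_mut_ptr(), cap) };
        if n >= 0 {
            buf.truncate(n as usize);
            return Some(buf);
        }
        let needed = n.unsigned_abs() as usize;
        if needed <= buf.len() {
            return None; // callee made no progress; give up rather than spin
        }
        buf.resize(needed, 0);
    }
}
```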

Language helpers

  • Context::lang_id_for(name: &str) -> Option<i32> — reverse of detected_lang, wraps whisper_lang_id
  • pub const LANG_MAX_ID: i32 = … — wraps whisper_lang_max_id
  • Lang::full_name() -> &'static str — wraps whisper_lang_str_full ("english" vs "en")
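
A reverse lookup such as `lang_id_for` mostly comes down to converting `&str` to a C string and mapping the C-side "not found" sentinel to `None`. The sketch below assumes the sentinel is `-1` and uses a hypothetical `mock_lang_id` with an illustrative two-language table. It is written as a free function because the mock takes no context; whether the real accessor lives on `Context` is left to the implementer.

```rust
use std::ffi::{c_char, c_int, CStr, CString};

/// Hypothetical stand-in for the language-lookup FFI symbol.
/// The table is illustrative only; unknown names yield -1.
unsafe extern "C" fn mock_lang_id(lang: *const c_char) -> c_int {
    match unsafe { CStr::from_ptr(lang) }.to_bytes() {
        b"en" | b"english" => 0,
        b"de" | b"german" => 2,
        _ => -1,
    }
}

/// Looks up the numeric id for a language name or code.
/// Returns `None` for unknown names and for input with an interior NUL.
pub fn lang_id_for(name: &str) -> Option<i32> {
    let c_name = CString::new(name).ok()?;
    // SAFETY: `c_name` is a valid NUL-terminated string that outlives the call.
    let id = unsafe { mock_lang_id(c_name.as_ptr()) };
    if id < 0 {
        None
    } else {
        Some(id)
    }
}
```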

Special-token accessors

  • Context::token_translate() -> i32, token_transcribe, token_prev, token_nosp, token_not, token_solm — force-prefix decoding seeds
  • Context::token_for_lang(Lang) -> i32 — wraps whisper_token_lang

Model + state introspection

  • Context::model_dims() -> ModelDims — small struct exposing whisper_model_n_audio_state, n_audio_head, n_audio_layer, n_text_state, n_text_head, n_text_layer, n_mels, model_ftype
  • pub fn version() -> &'static str — wraps whisper_get_whisper_version
  • State::n_mel_frames() -> i32 — wraps whisper_n_len_from_state

Timing helpers

  • Context::print_timings() — wraps whisper_print_timings
  • Context::reset_timings() — wraps whisper_reset_timings

Verify-against-existing

  • Token::posterior() -> f32 is currently exposed as Token::p(), which reads whisper_token_data.p. Verify that this value agrees with whisper_full_get_token_p_from_state under temperature / wildcard sampling.

See whispercpp/TODO.md for the full table.
