Add ergonomic iterators to the safe API:
- State::segments_iter() -> impl Iterator<Item = Segment<'_>>
- Segment::tokens_iter() -> impl Iterator<Item = Token> (or Token<'_> if a borrow into the state is needed)
Why
Today callers iterate via index:
    for i in 0..state.n_segments() {
        let seg = state.segment(i).unwrap();
        for j in 0..seg.n_tokens() {
            let tok = seg.token(j).unwrap();
            ...
        }
    }
An iterator API would let callers write for seg in state.segments_iter() directly, and the result would compose with adapters such as .filter(), .map(), and .collect().
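To make the composition point concrete, here is a minimal sketch of what calling code could look like. Everything in it is a hypothetical stand-in: State, Segment, and Token are plain owned or slice-backed mocks rather than the FFI-backed types, the Token fields id and p are assumed for illustration, and confident_ids is an invented helper, not part of the crate.

```rust
// Hypothetical, simplified stand-ins: plain data instead of the FFI-backed types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Token {
    pub id: i32, // assumed field: token id
    pub p: f32,  // assumed field: token probability
}

pub struct Segment<'a> {
    tokens: &'a [Token],
}

pub struct State {
    pub segments: Vec<Vec<Token>>,
}

impl State {
    /// Proposed shape: yields one borrowed Segment per stored segment.
    pub fn segments_iter(&self) -> impl Iterator<Item = Segment<'_>> + '_ {
        self.segments.iter().map(|toks| Segment { tokens: toks.as_slice() })
    }
}

impl<'a> Segment<'a> {
    /// Proposed shape: yields Token snapshots by value.
    /// The concrete return type borrows the underlying data for 'a,
    /// not the short-lived `&self` handle.
    pub fn tokens_iter(&self) -> std::iter::Copied<std::slice::Iter<'a, Token>> {
        self.tokens.iter().copied()
    }
}

/// Invented helper: ids of all tokens with probability at least `min_p`,
/// written as a single adapter chain instead of two nested index loops.
pub fn confident_ids(state: &State, min_p: f32) -> Vec<i32> {
    state
        .segments_iter()
        .flat_map(|seg| seg.tokens_iter())
        .filter(|tok| tok.p >= min_p)
        .map(|tok| tok.id)
        .collect()
}
```

With this shape the nested index loops and their unwrap() calls disappear from calling code; the bounds checks live inside the iterators instead.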
Why this isn't trivial
Segment and Token borrow from State via a raw pointer (NonNull<sys::whisper_state>). A correct iterator needs to project through that borrow without unsound aliasing, which is non-obvious even behind a safe Rust API. Specifically:
- State::segments_iter(&self) returns an iterator that hands out Segment<'_> values borrowing from &self. Each Segment carries a PhantomData<&'a ()>. This needs to be sound when multiple Segments are alive simultaneously (the whisper_state is shared, but no Segment mutates it).
- Segment::tokens_iter(&self) is similar: multiple Token snapshots can be taken from one Segment.
Look at the existing State::segment(i) / Segment::token(tok_idx) patterns in whispercpp/src/state.rs for the borrow shape. The iterator just needs to drive an index counter and call those methods, but the lifetime annotations on the iterator type need care.
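The index-counter approach described above could be sketched roughly as follows. This is not the code in whispercpp/src/state.rs: RawState is a hypothetical stand-in for sys::whisper_state, Token is reduced to an assumed id field, and SegmentsIter / TokensIter are invented names. The point is only the lifetime shape: the iterator stores &'a State and yields Segment<'a>, so items are tied to the State borrow rather than to the iterator itself.

```rust
use std::marker::PhantomData;
use std::ptr::NonNull;

// Hypothetical stand-in for sys::whisper_state (the real one is an opaque FFI type).
struct RawState {
    segments: Vec<Vec<u32>>, // token ids per segment
}

pub struct State {
    raw: Box<RawState>,
}

/// Pointer-plus-index handle; 'a ties it to the `&State` borrow it came from.
#[derive(Clone, Copy)]
pub struct Segment<'a> {
    ptr: NonNull<RawState>,
    index: usize,
    _borrow: PhantomData<&'a ()>,
}

/// Owned snapshot, so it carries no lifetime.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Token {
    pub id: u32,
}

impl State {
    pub fn new(segments: Vec<Vec<u32>>) -> Self {
        State { raw: Box::new(RawState { segments }) }
    }

    pub fn n_segments(&self) -> usize {
        self.raw.segments.len()
    }

    pub fn segment(&self, i: usize) -> Option<Segment<'_>> {
        if i >= self.n_segments() {
            return None;
        }
        Some(Segment { ptr: NonNull::from(&*self.raw), index: i, _borrow: PhantomData })
    }

    pub fn segments_iter(&self) -> SegmentsIter<'_> {
        SegmentsIter { state: self, next: 0 }
    }
}

impl<'a> Segment<'a> {
    fn token_ids(&self) -> &'a [u32] {
        // SAFETY: `ptr` was derived from a shared `&'a State` borrow, so the
        // pointee is alive and cannot be mutated or moved for all of 'a.
        let raw: &'a RawState = unsafe { self.ptr.as_ref() };
        &raw.segments[self.index]
    }

    pub fn n_tokens(&self) -> usize {
        self.token_ids().len()
    }

    pub fn token(&self, j: usize) -> Option<Token> {
        self.token_ids().get(j).map(|&id| Token { id })
    }

    pub fn tokens_iter(&self) -> TokensIter<'a> {
        TokensIter { segment: *self, next: 0 }
    }
}

/// Drives an index counter over `State::segment`.
pub struct SegmentsIter<'a> {
    state: &'a State,
    next: usize,
}

impl<'a> Iterator for SegmentsIter<'a> {
    type Item = Segment<'a>;

    fn next(&mut self) -> Option<Segment<'a>> {
        // `segment` returns None past the end, which ends iteration.
        let seg = self.state.segment(self.next)?;
        self.next += 1;
        Some(seg)
    }
}

/// Drives an index counter over `Segment::token`.
pub struct TokensIter<'a> {
    segment: Segment<'a>,
    next: usize,
}

impl Iterator for TokensIter<'_> {
    type Item = Token;

    fn next(&mut self) -> Option<Token> {
        let tok = self.segment.token(self.next)?;
        self.next += 1;
        Some(tok)
    }
}
```

Because the sketch only ever creates shared references from the pointer, several Segment handles and several iterators can coexist. Named iterator structs are used here instead of impl Iterator so the lifetime on each iterator type is spelled out explicitly; whether the real API should expose named types or impl Iterator is left open.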
Tests
- Empty state iterates zero times.
- Iterator length matches n_segments() / n_tokens().
- Multiple iterators alive concurrently don't fight (the underlying whisper_state isn't mutated by reads).
- Miri (the existing CI job) should pass over the new iterator types.
From whispercpp/TODO.md § 3 "Larger work".