Model
nvidia/parakeet-tdt-0.6b-v3 (NVIDIA Parakeet TDT 0.6B v3, CC-BY-4.0)
Why this model
The current audio catalog covers Whisper (large-v3 and large-v3-turbo), which is excellent for batch transcription but not streaming-native. Parakeet would complement it well for live use cases:
- Streaming-friendly architecture. Transducer decoding (TDT, a variant of RNN-T) emits tokens incrementally as audio frames arrive, giving low-latency output. That suits live captioning much better than re-transcribing a sliding window with Whisper.
- On-device friendly size. At 0.6B parameters it fits comfortably in memory on iPhone-class devices, leaving headroom for the host app.
- Multilingual. v3 supports 25 European languages, including smaller ones (Dutch, in my case) that are poorly served by on-device alternatives.
- Proven demand on Apple silicon. Community Core ML ports already exist (e.g. FluidAudio), but a first-party Core AI export recipe, with its ahead-of-time compilation and instant-load benefits, would be a significant step up in reliability.
Use case
I build a live-captioning app for deaf and hard-of-hearing users (real-time, fully on-device, privacy-sensitive). A streaming-capable multilingual ASR model in the Core AI catalog would directly improve accessibility apps in this category.
Thanks for considering it.