Model request: NVIDIA Parakeet TDT 0.6B v3 (multilingual streaming ASR) #7

@owisscha

Model

nvidia/parakeet-tdt-0.6b-v3 (NVIDIA Parakeet TDT 0.6B v3, CC-BY-4.0)

Why this model

The current audio catalog covers Whisper (large-v3 and large-v3-turbo), which is excellent for batch transcription but not streaming-native. Parakeet would complement it well for live use cases:

  • Streaming-friendly architecture. TDT/RNN-T decoding is designed for incremental, low-latency output, which suits live captioning much better than sliding-window re-transcription with Whisper.
  • On-device friendly size. At 0.6B parameters it fits comfortably in memory on iPhone-class devices, leaving headroom for the host app.
  • Multilingual. v3 supports 25 European languages, including smaller ones (Dutch, in my case) that are poorly served by on-device alternatives.
  • Proven demand on Apple silicon. Community Core ML ports already exist (e.g. FluidAudio), but a first-party Core AI export recipe, with its ahead-of-time compilation and instant-load benefits, would be a significant step up in reliability.
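
To put rough numbers on the first two points, here is a back-of-the-envelope sketch in Python. It is an estimate, not a measurement: the parameter count is taken from the model name, the precisions are assumed export options, the 10 s window and 1 s hop are a hypothetical captioning setup, and the memory figure covers weights only (no activations, decoder state, or runtime overhead).

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def sliding_window_overhead(window_s: float, hop_s: float) -> float:
    """How many times each second of audio is re-encoded when a window of
    window_s seconds is re-transcribed every hop_s seconds.
    An incremental decoder processes each second roughly once instead."""
    return window_s / hop_s


# Assumption: 0.6B parameters, as in the model name; the exact count may differ.
PARAMS = 0.6e9

# Assumed precisions; which ones an export recipe would offer is not known here.
for label, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gib(PARAMS, nbytes):.2f} GiB of weights")

# Hypothetical live-captioning setup: 10 s window, refreshed every 1 s.
factor = sliding_window_overhead(window_s=10.0, hop_s=1.0)
print(f"sliding window re-encodes each second of audio about {factor:.0f}x")
```

The redundancy factor grows as the hop shrinks, so the lower the caption latency I aim for with sliding-window re-transcription, the more repeated work the device has to do.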

Use case

I build a live-captioning app for deaf and hard-of-hearing users (real-time, fully on-device, privacy-sensitive). A streaming-capable multilingual ASR model in the Core AI catalog would directly improve accessibility apps in this category.

Thanks for considering it.
