Whisper doesn’t support streaming — is there a roadmap for it? #2306
Replies: 3 comments
-
OpenAI's Whisper API (and the open-source model) does not natively support real-time streaming transcription. The model was designed for offline batch processing rather than low-latency streaming, and the API endpoint works on complete uploaded audio files.
-
If you need speech-to-text with the Whisper API in streaming mode, you can't get true "word-by-word" streaming directly from OpenAI's current Whisper endpoint; it only supports batch transcription. To achieve streaming behavior, the common and effective approach is:
-
Whisper does not support streaming transcription

The Whisper API ( ...

Alternatives for streaming/real-time STT

1. OpenAI Realtime API (recommended)

The Realtime API supports real-time audio streaming with low-latency transcription:

```python
import asyncio
from openai import AsyncOpenAI

# Use the Realtime API via WebSocket
# See: https://platform.openai.com/docs/api-reference/realtime
```

The Realtime API is designed for live conversations and streams audio in real time.

2. Use chunked processing with Whisper

If you must use Whisper, you can simulate streaming by sending audio chunks:

```python
import io
import wave

from openai import OpenAI

client = OpenAI()
BYTES_PER_MS = 32  # assumes raw 16 kHz, 16-bit mono PCM input

def pcm_to_wav(pcm):
    # Wrap raw PCM in a WAV container so the upload is a valid audio file.
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(pcm)
    return out.getvalue()

def transcribe_chunks(audio_stream, chunk_duration_ms=5000):
    buffer = b""
    for chunk in audio_stream:
        buffer += chunk
        if len(buffer) >= chunk_duration_ms * BYTES_PER_MS:  # rough byte estimate
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=("audio.wav", pcm_to_wav(buffer)),
                response_format="text",
            )
            yield transcript
            buffer = b""
```

Note: This loses context between chunks, so words at boundaries may be cut.

3. Google Speech-to-Text or Deepgram

For production streaming STT, consider services that natively support it.

Summary

OpenAI has not announced plans to add streaming to the Whisper API. The Realtime API is their solution for real-time audio use cases.
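
On the chunked option (2): two things tend to bite in practice. The last partial chunk is easy to drop, and each request starts with no memory of the previous one. Below is a rough sketch of one way to handle both, not an official recipe. The helper names (`rechunk`, `tail_prompt`, `transcribe_with_context`) and the 200-character prompt limit are my own assumptions, and the actual API call is injected as a `transcribe` callable so the logic can be tried without network access. Passing earlier text as a prompt may help continuity across chunks, but it will not recover a word that was cut in half at a boundary.

```python
from typing import Callable, Iterable, Iterator


def rechunk(stream: Iterable[bytes], chunk_bytes: int) -> Iterator[bytes]:
    """Regroup arbitrary-sized byte pieces into chunks of chunk_bytes.

    The final, shorter remainder is yielded too, so trailing audio is not lost.
    """
    buffer = b""
    for piece in stream:
        buffer += piece
        while len(buffer) >= chunk_bytes:
            yield buffer[:chunk_bytes]
            buffer = buffer[chunk_bytes:]
    if buffer:
        yield buffer


def tail_prompt(text: str, max_chars: int = 200) -> str:
    """Return roughly the last max_chars of text, cut at a word boundary."""
    if len(text) <= max_chars:
        return text
    tail = text[-max_chars:]
    # Drop the first word of the tail, since it may be a partial word.
    _, _, rest = tail.partition(" ")
    return rest or tail


def transcribe_with_context(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes, str], str],
) -> Iterator[str]:
    """Transcribe chunks in order, passing earlier text along as a prompt.

    `transcribe(audio_bytes, prompt)` is a hypothetical callable supplied by
    the caller; it is expected to return the transcript text for one chunk.
    """
    so_far = ""
    for chunk in chunks:
        text = transcribe(chunk, tail_prompt(so_far)).strip()
        if text:
            so_far = f"{so_far} {text}".strip()
            yield text
```

With the OpenAI client, `transcribe` could be a small wrapper around `client.audio.transcriptions.create(...)` that forwards the second argument as the endpoint's `prompt` parameter and uploads the chunk as a valid audio file (for example, WAV-wrapped PCM).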
-
My challenge is that I want to convert speech to text using the Whisper API.
Note: I need a streaming response only.