Description
Speech-to-text (STT) transcription works correctly on the first recording attempt, but consistently fails on all subsequent attempts within the same session. This affects the Linux Electron desktop app (v0.15.0) but does not reproduce on mobile browsers.
Steps to reproduce
- Open CodeNomad on Linux
- Click the microphone button and speak → transcription succeeds
- Click the microphone button a second time and speak → transcription fails with no visible error
- Subsequent attempts also fail
- Refresh the app (Ctrl+R) → first attempt works again → subsequent ones fail again
Observed behavior
Server debug logs (--log-level debug) show:
| Attempt | Audio payload size | Result |
| --- | --- | --- |
| 1st | 61,154 bytes | 200 OK (965 ms) |
| 2nd | 110 bytes | 400 "The audio file could not be decoded or its format is not supported" |
| 3rd | 110 bytes | 400 same error |
The 110-byte payload is just the WebM container header with no actual audio data.
Relevant server logs:
[INFO] speech.transcribe mimeType=audio/webm;codecs=opus bytes=61154 → status=200
[INFO] speech.transcribe mimeType=audio/webm;codecs=opus bytes=110
[WARN] speech.transcribe verbose_json failed; retrying default format
err=BadRequestError: 400 The audio file could not be decoded or its format is not supported.
→ status=502
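To confirm on the client side that the recorder is emitting header-only payloads like the 110-byte ones above, the chunk sizes could be summed before upload. This is a minimal sketch, not CodeNomad code: `totalBytes`, `looksHeaderOnly`, and the 1024-byte threshold are hypothetical, with the threshold chosen arbitrarily between the failing (110 bytes) and successful (61,154 bytes) payloads in this report.

```typescript
// Anything with a byte size, e.g. the Blob chunks emitted by MediaRecorder.
interface SizedChunk {
  size: number;
}

// Hypothetical cut-off: well above the 110-byte failures reported here,
// well below the 61,154-byte successful recording.
const MIN_PLAUSIBLE_AUDIO_BYTES = 1024;

// Sum the sizes of all chunks collected during one recording session.
function totalBytes(chunks: SizedChunk[]): number {
  return chunks.reduce((sum, chunk) => sum + chunk.size, 0);
}

// True when the recording is too small to contain real audio data.
function looksHeaderOnly(
  chunks: SizedChunk[],
  minBytes: number = MIN_PLAUSIBLE_AUDIO_BYTES,
): boolean {
  return totalBytes(chunks) < minBytes;
}
```

Logging the result of `looksHeaderOnly` for each attempt would show whether the empty payload originates in the recorder rather than in the upload path.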
Root cause
The MediaRecorder (or its underlying MediaStream) is reused between recordings without being re-created. On Chromium/Linux, stop() → start() on the same MediaRecorder instance does not properly reset the audio capture pipeline, resulting in near-empty audio chunks (~110 bytes) on all attempts after the first.
The same code path works correctly on mobile (Android/iOS) because their WebRTC/MediaRecorder implementations handle the lifecycle differently.
Expected behavior
Each microphone session should produce a valid audio recording, regardless of how many times the user starts/stops recording.
Suggested fix
Re-create a fresh MediaStream and MediaRecorder instance for each recording session rather than reusing the previous instance. Ensure proper cleanup (stream.getTracks().forEach(t => t.stop())) of the previous stream before creating a new one.
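The suggested fix could look roughly like the following. This is a minimal TypeScript sketch, not the CodeNomad implementation: `startRecording`, `RecordingDeps`, and the `*Like` interfaces are hypothetical names, and the stream and recorder factories are injected so the lifecycle can be exercised outside a browser. In the app, `getStream` would presumably wrap `navigator.mediaDevices.getUserMedia({ audio: true })` and `createRecorder` would wrap `new MediaRecorder(stream)`.

```typescript
// Minimal structural stand-ins for the browser types, so the sketch
// type-checks without DOM typings. All names here are hypothetical.
interface TrackLike { stop(): void; }
interface StreamLike { getTracks(): TrackLike[]; }
interface ChunkLike { size: number; }
interface RecorderLike {
  ondataavailable: ((event: { data: ChunkLike }) => void) | null;
  onstop: (() => void) | null;
  start(): void;
  stop(): void;
}

// Factories are injected; in the app they would wrap getUserMedia and the
// MediaRecorder constructor (an assumption, not the actual CodeNomad code).
interface RecordingDeps {
  getStream(): Promise<StreamLike>;
  createRecorder(stream: StreamLike): RecorderLike;
}

interface ActiveRecording {
  // Stops the recorder, releases the stream, and resolves with the chunks.
  stop(): Promise<ChunkLike[]>;
}

async function startRecording(deps: RecordingDeps): Promise<ActiveRecording> {
  // Fresh stream and fresh recorder for every session; nothing is cached.
  const stream = await deps.getStream();
  const recorder = deps.createRecorder(stream);
  const chunks: ChunkLike[] = [];

  // Collect only non-empty chunks.
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  recorder.start();

  return {
    stop: () =>
      new Promise<ChunkLike[]>((resolve) => {
        recorder.onstop = () => {
          // Release the capture device so the next session starts clean.
          stream.getTracks().forEach((track) => track.stop());
          resolve(chunks);
        };
        recorder.stop();
      }),
  };
}
```

Each call to `startRecording` acquires its own stream and recorder, and `stop()` releases the tracks only after the recorder's `stop` event has fired, so no capture state carries over to the next attempt.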
Environment
- CodeNomad version: 0.15.0 (Electron)
- OS: Arch Linux, PipeWire 1.6.6
- Browser/Engine: Electron/Chromium
- Microphone: Roland UA-22 (USB audio interface), also reproduced with USB webcam mic
- STT provider: OpenAI whisper-1 via openai-compatible adapter