Serverless GPU worker for the Qwen3-TTS-VoiceDesign model.
Build and push the Docker image:
cd runpod_worker
docker build -t gleitfreude/script-read-voicedesign:latest .
docker push gleitfreude/script-read-voicedesign:latest
Go to runpod.io/console/serverless:
- Click "New Endpoint"
- Docker image: gleitfreude/script-read-voicedesign:latest
- GPU: A40 (cheapest that fits the 1.7B model) or RTX 4090
- Min workers: 0 (scale to zero when idle)
- Max workers: 1 (or more for concurrency)
- Idle timeout: 60s (keeps warm for 1 min after last request)
After creating the endpoint, copy its ID and add it to .env:
RUNPOD_ENDPOINT_ID=your_endpoint_id_here
In the app's Settings panel, change TTS Mode to "RunPod GPU".
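To check the endpoint independently of the app, a request can be sent straight to RunPod's synchronous `/runsync` route. The following is a minimal sketch, not the app's actual client: the `RUNPOD_API_KEY` variable name and the payload fields (`text`, `voice_description`) are assumptions made for illustration, and the real input schema depends on the worker's handler.

```python
import json
import os
import urllib.request


def build_runsync_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to the endpoint's /runsync route."""
    url = f"https://api.runpod.ai/v2/{endpoint_id}/runsync"
    # RunPod serverless endpoints expect the job payload under an "input" key.
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_endpoint(payload: dict, timeout: float = 120.0) -> dict:
    """Send the request and return the parsed JSON response.

    Reads RUNPOD_ENDPOINT_ID from the environment (as set in .env) and
    RUNPOD_API_KEY, which is an assumed variable name. The default timeout
    leaves room for a cold start before generation begins.
    """
    request = build_runsync_request(
        os.environ["RUNPOD_ENDPOINT_ID"],
        os.environ["RUNPOD_API_KEY"],
        payload,
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `call_endpoint({"text": "Hello there.", "voice_description": "warm, calm narrator"})` would submit one job using those hypothetical field names. The first call after an idle period will take noticeably longer than later ones, since the worker has to start and load the model.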
Expected performance and cost:
- Cold start: ~30s (model loads into GPU memory)
- Warm request: ~5-10s per voice design
- A40: $0.39/hr → ~$0.003 per call (vs $0.20 on DashScope)
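The per-call figure can be sanity-checked with simple arithmetic. This sketch assumes per-second billing at the quoted hourly rate; actual billing granularity and any charge for idle time may differ.

```python
def cost_per_call(rate_per_hour: float, billed_seconds: float) -> float:
    """Cost in dollars of one call, given an hourly rate and billed seconds."""
    return rate_per_hour * billed_seconds / 3600.0


def billed_seconds_for_cost(rate_per_hour: float, cost: float) -> float:
    """Inverse: how many billed seconds a given per-call cost corresponds to."""
    return cost * 3600.0 / rate_per_hour


A40_RATE = 0.39  # $/hr, from the list above

warm_call = cost_per_call(A40_RATE, 10)       # warm request at the 10s upper bound
cold_call = cost_per_call(A40_RATE, 30 + 10)  # 30s cold start plus 10s of generation
implied = billed_seconds_for_cost(A40_RATE, 0.003)

print(f"warm: ${warm_call:.4f}, cold: ${cold_call:.4f}, implied seconds: {implied:.1f}")
```

Under these assumptions a warm call costs about $0.0011 and a cold call about $0.0043, so the ~$0.003 estimate corresponds to roughly 28 billed seconds per call, which sits between the two cases.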