feat(voice): pluggable voice backend with Gemini Live & Qwen Realtime #401
AIMO3D-ob wants to merge 1 commit into tiann:main
Add a strategy-based voice backend architecture that allows switching between ElevenLabs ConvAI, Gemini Live API, and Qwen Realtime via the VOICE_BACKEND environment variable.

New backends:
- Gemini 2.5 Live (gemini-live): WebSocket + AudioWorklet audio pipeline, full function calling support for messageCodingAgent/processPermissionRequest
- Qwen Realtime (qwen-realtime): DashScope API via Hub WebSocket proxy, voice conversation support (function calling not yet supported by model)

Architecture:
- VoiceBackendSession dynamically selects backend via GET /voice/backend
- React.lazy() code splitting: alternative backends not bundled when unused
- Hub routes: GET /voice/backend, POST /voice/gemini-token, POST /voice/qwen-token
- Hub WebSocket proxy at /api/voice/qwen-ws for Qwen (browser can't set Auth header)
- Inline Blob URL AudioWorklet for Vite compatibility
- Auto mic mute during model speech to prevent barge-in from ambient noise
- Tool-call-optimized system prompt (Chinese, no greeting turn)
- PWA skipWaiting + clientsClaim for immediate deployment activation

Switch via environment:
  VOICE_BACKEND=gemini-live GEMINI_API_KEY=xxx
  VOICE_BACKEND=qwen-realtime DASHSCOPE_API_KEY=xxx
  VOICE_BACKEND=elevenlabs ELEVENLABS_API_KEY=xxx (default, unchanged)
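The "auto mic mute during model speech" item can be pictured as a small gate in front of the upstream audio path. The sketch below is only an illustration of that idea: `MicGate`, `setModelSpeaking`, and `filter` are hypothetical names, not identifiers from this PR, and the real implementation may mute at the AudioWorklet or MediaStream level instead.

```typescript
// Hypothetical helper, not taken from the PR's source.
type AudioChunk = Int16Array

class MicGate {
    private modelSpeaking = false

    // Call when the backend signals that model audio playback starts or stops.
    setModelSpeaking(speaking: boolean): void {
        this.modelSpeaking = speaking
    }

    // Returns the chunk to send upstream, or null while the mic is muted.
    filter(chunk: AudioChunk): AudioChunk | null {
        return this.modelSpeaking ? null : chunk
    }
}
```

Dropping chunks while the model is speaking keeps ambient noise from being read as a barge-in, at the cost of also ignoring deliberate interruptions during that window.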
Findings
- [Blocker] Qwen WebSocket proxy bypasses API auth and can be opened without a JWT, which lets any reachable client consume the hub's DashScope credentials through /api/voice/qwen-ws. Evidence: hub/src/web/server.ts:328.
- [Major] The fallback voice backend is now gemini-live, so existing installs that only configured ElevenLabs will be routed away from the existing token flow and fail voice startup. Evidence: shared/src/voice.ts:280, hub/src/web/routes/voice.ts:121.
- [Major] The Qwen frontend still requires the hub to return a raw DashScope key even though the browser never uses it after switching to the hub WebSocket proxy, so every authenticated web client now receives a long-lived provider secret unnecessarily. Evidence: web/src/realtime/QwenVoiceSession.tsx:84, hub/src/web/routes/voice.ts:162.
Summary
Review mode: initial
Three findings. Added coverage does not exercise the new /api/voice/qwen-ws auth boundary, and the new route tests would not catch the default-backend regression because they do not assert against DEFAULT_VOICE_BACKEND.
Testing
- Not run (automation): bun is not installed in this runner.
HAPI Bot
    return socketHandler.fetch(req, server as never)
}
// Qwen Realtime WebSocket proxy
if (url.pathname === '/api/voice/qwen-ws') {
[BLOCKER] This branch upgrades /api/voice/qwen-ws before the request ever reaches app.fetch(req), so it bypasses createAuthMiddleware() entirely. As written, any client that can reach the hub can open a proxied DashScope session with the server-side API key.
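To make the ordering concern concrete, here is a minimal sketch of an authorization check that runs before any upgrade is attempted. All names are hypothetical: `authorizeUpgrade` and `UpgradeDecision` do not exist in the PR, and `verify` stands in for whatever JWT check the hub already uses. A verifier that rejects is mapped to a 401 rather than left to surface as an unhandled error.

```typescript
// Hypothetical names throughout; `verify` is an injected stand-in for a JWT check.
type Verify = (token: string) => Promise<unknown>

type UpgradeDecision =
    | { ok: true }
    | { ok: false; status: 401; message: string }

async function authorizeUpgrade(requestUrl: string, verify: Verify): Promise<UpgradeDecision> {
    // Browsers cannot set headers on a WebSocket handshake, so the token travels in the query string.
    const token = new URL(requestUrl).searchParams.get('token')
    if (!token) {
        return { ok: false, status: 401, message: 'Missing authorization token' }
    }
    try {
        await verify(token)
    } catch {
        // A rejected or expired token must not fall through to the upgrade.
        return { ok: false, status: 401, message: 'Invalid authorization token' }
    }
    return { ok: true }
}
```

Only an `ok: true` decision would proceed to the upgrade call; every other path returns before the server-side API key is touched.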
Suggested fix:
const token = url.searchParams.get('token')
if (!token) {
    return new Response('Missing authorization token', { status: 401 })
}
await jwtVerify(token, options.jwtSecret, { algorithms: ['HS256'] })
const upgraded = server.upgrade(req, {
    data: { _qwenProxy: true, apiKey, model }
})

export const QWEN_REALTIME_MODEL = 'qwen3-omni-flash-realtime'
export const QWEN_REALTIME_VOICE = 'Mia'

export const DEFAULT_VOICE_BACKEND: VoiceBackendType = 'gemini-live'
[MAJOR] This flips the implicit backend from ElevenLabs to Gemini Live. On a hub that only has ELEVENLABS_API_KEY configured, /api/voice/backend now resolves to gemini-live, and VoiceBackendSession will send users into /api/voice/gemini-token instead of the existing ElevenLabs flow.
Suggested fix:
export const DEFAULT_VOICE_BACKEND: VoiceBackendType = 'elevenlabs'
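For context, a fallback of this kind usually matters at the point where the raw VOICE_BACKEND value is parsed. The following is a rough sketch under assumptions: the union members are inferred from the VOICE_BACKEND values in the PR description, and `resolveVoiceBackend` and `KNOWN_BACKENDS` are hypothetical names rather than the PR's actual code.

```typescript
// Assumed shape of the shared type; the real definition in shared/src/voice.ts may differ.
type VoiceBackendType = 'elevenlabs' | 'gemini-live' | 'qwen-realtime'

const KNOWN_BACKENDS: readonly VoiceBackendType[] = ['elevenlabs', 'gemini-live', 'qwen-realtime']
const DEFAULT_VOICE_BACKEND: VoiceBackendType = 'elevenlabs'

// Hypothetical helper: maps an unset or unrecognized value to the default.
function resolveVoiceBackend(raw: string | undefined): VoiceBackendType {
    const value = raw?.trim().toLowerCase()
    const match = KNOWN_BACKENDS.find((backend) => backend === value)
    return match ?? DEFAULT_VOICE_BACKEND
}
```

With the default kept at 'elevenlabs', a hub that never sets VOICE_BACKEND keeps resolving to the existing flow, which is the behavior a route test could assert against the shared constant.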
// Get API key from hub
const tokenResp = await fetchQwenToken(this.api)
if (!tokenResp.allowed || !tokenResp.apiKey) {
[MAJOR] The browser does not use this provider key after the Qwen proxy change. startSession() only checks that apiKey exists, then opens /api/voice/qwen-ws, where the hub injects its own Authorization header. Returning a long-lived DashScope key here needlessly leaks it to every authenticated web client.
Suggested fix:
const tokenResp = await fetchQwenToken(this.api)
if (!tokenResp.allowed) {
    const msg = tokenResp.error ?? 'DashScope API key not available'
    state.statusCallback?.('error', msg)
    throw new Error(msg)
}
state.wsBaseUrl = tokenResp.wsUrl || null
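The client-side change above pairs naturally with a hub response that never carries the key. The sketch below shows one possible shape, assuming the field names seen in the client snippet (allowed, error, wsUrl); `QwenTokenResponse` and `buildQwenTokenResponse` are hypothetical and are not taken from hub/src/web/routes/voice.ts.

```typescript
// Hypothetical response shape; field names mirror the client snippet.
interface QwenTokenResponse {
    allowed: boolean
    error?: string
    wsUrl?: string
}

// Hypothetical helper: reports availability without exposing the provider secret.
function buildQwenTokenResponse(dashscopeKey: string | undefined, wsUrl: string): QwenTokenResponse {
    if (!dashscopeKey) {
        return { allowed: false, error: 'DashScope API key not configured' }
    }
    // The key is only checked for presence; it is never copied into the response.
    return { allowed: true, wsUrl }
}
```

Because the proxy injects the Authorization header on the hub side, the browser only needs to know whether the backend is usable and where to connect.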
tiann left a comment
Thank you for your contribution. I believe this is a good feature. Please fix the comments first.
Summary
Add a pluggable voice backend architecture that extends the existing ElevenLabs ConvAI integration with two new voice providers:
- Gemini Live (gemini-live): Google's real-time audio streaming API via WebSocket, with full function calling support for messageCodingAgent and processPermissionRequest
- Qwen Realtime (qwen-realtime): Alibaba's DashScope real-time voice API via Hub WebSocket proxy, supporting voice conversation (function calling pending model support)

Users can switch backends via the VOICE_BACKEND environment variable. The existing ElevenLabs integration remains the default and is completely unchanged.

Key Design Decisions
- GET /voice/backend lets the frontend detect the active backend without a Vite rebuild
- React.lazy() ensures alternative backends are only loaded when active
- Inline Blob URL AudioWorklet instead of a ?url import, to avoid MIME type issues in production builds
- Hub WebSocket proxy at /api/voice/qwen-ws, because the browser WebSocket API cannot set Authorization headers
- skipWaiting + clientsClaim added to the service worker for instant deployment updates

Configuration
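The environment variables named in the commit message (ELEVENLABS_API_KEY, GEMINI_API_KEY, DASHSCOPE_API_KEY) pair one key with each VOICE_BACKEND value. A startup check along the following lines could surface a mismatch early. This is a sketch only: the lookup table and `missingVoiceConfig` are assumptions, not code from this PR.

```typescript
// Variable names come from the commit message; the lookup table itself is an assumption.
const REQUIRED_KEY = {
    'elevenlabs': 'ELEVENLABS_API_KEY',
    'gemini-live': 'GEMINI_API_KEY',
    'qwen-realtime': 'DASHSCOPE_API_KEY',
} as const

type Backend = keyof typeof REQUIRED_KEY

// Hypothetical helper: returns the name of the missing variable, or null when the key is set.
function missingVoiceConfig(backend: Backend, env: Record<string, string | undefined>): string | null {
    const name = REQUIRED_KEY[backend]
    return env[name] ? null : name
}
```

For example, `missingVoiceConfig('qwen-realtime', {})` reports DASHSCOPE_API_KEY, which the hub could log at startup instead of failing on the first voice request.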
Files Changed
- shared/src/voice.ts
- hub/src/web/routes/voice.ts
- hub/src/web/server.ts
- web/src/api/client.ts, voice.ts
- web/src/realtime/GeminiLiveVoiceSession.tsx
- web/src/realtime/QwenVoiceSession.tsx
- web/src/realtime/gemini/
- web/src/realtime/VoiceBackendSession.tsx
- web/src/components/SessionChat.tsx: VoiceBackendSession instead of RealtimeVoiceSession
- web/src/sw.ts: skipWaiting + clientsClaim
- hub/src/web/routes/voice.test.ts, pcmUtils.test.ts, toolAdapter.test.ts

Test Plan