feat(tts): add Microsoft Edge TTS provider #168

Open
hagope wants to merge 1 commit into grinev:main from hagope:feat/edge-tts-provider

@hagope hagope commented Jun 21, 2026

Contributor

Summary

Adds Microsoft Edge TTS as a fourth TTS provider (TTS_PROVIDER=edge). It uses Microsoft Edge's online Read Aloud service over WebSocket and needs no API key, account, or TTS_API_URL, only outbound HTTPS/WebSocket access to speech.platform.bing.com.

Implementation ports the protocol from the Python edge-tts reference.

How it works

  • DRM token (Sec-MS-GEC): SHA-256 of the Windows file-time ticks (rounded down to a 5-minute boundary) concatenated with the trusted client token. A single retry on HTTP 403 corrects clock skew using the server's Date header.
  • SSML framing: prosody rate/volume/pitch, with 4096-byte UTF-8-safe chunking that never splits multi-byte characters or XML entities.
  • WebSocket protocol: binary audio frames use a 2-byte big-endian header-length prefix (length value includes the trailing \r\n), so audio starts at offset 2 + headerLength; text frames signal turn.end per chunk.
  • Output format: audio-24khz-48kbitrate-mono-mp3.
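
The Sec-MS-GEC derivation from the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/app/services/edge-tts.ts: the name `buildSecMsGec`, its parameters, and the uppercase hex output (taken from the Python reference's behavior) are assumptions, and the trusted client token is passed in rather than hard-coded. BigInt is used because 100-nanosecond tick counts exceed Number's safe integer range.

```typescript
import { createHash } from "node:crypto";

// Seconds between the Windows epoch (1601-01-01) and the Unix epoch (1970-01-01).
const WINDOWS_EPOCH_OFFSET_SECONDS = BigInt(11644473600);
// The token is derived from time rounded down to a 5-minute window.
const WINDOW_SECONDS = BigInt(300);
// Windows file time counts 100-nanosecond ticks.
const TICKS_PER_SECOND = BigInt(10000000);

/**
 * Hypothetical helper: derives a Sec-MS-GEC value.
 * `clockSkewSeconds` is an assumed parameter for the 403 retry path.
 */
function buildSecMsGec(
  trustedClientToken: string,
  unixSeconds: number,
  clockSkewSeconds = 0,
): string {
  // Shift from the Unix epoch to the Windows epoch, applying any skew correction.
  let seconds =
    BigInt(Math.floor(unixSeconds + clockSkewSeconds)) + WINDOWS_EPOCH_OFFSET_SECONDS;
  // Round down to the start of the current 5-minute window.
  seconds -= seconds % WINDOW_SECONDS;
  // Convert seconds to 100-nanosecond ticks.
  const ticks = seconds * TICKS_PER_SECOND;
  // Hash the decimal tick count concatenated with the token.
  return createHash("sha256")
    .update(`${ticks.toString()}${trustedClientToken}`)
    .digest("hex")
    .toUpperCase();
}
```

For the 403 retry, one option is to compute the skew as the server's Date header minus the local clock (in seconds) and pass it as `clockSkewSeconds` on the second attempt.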

Changes

  • New: src/app/services/edge-tts.ts + tests/app/services/edge-tts.test.ts
  • Wire 'edge' provider into src/config.ts and src/app/services/tts-service.ts
  • isTtsConfigured() returns true for edge (no credentials needed)
  • Document provider in .env.example and PRODUCT.md
  • Adds ws as a runtime dependency (Node 20+ target; the global WebSocket is only stable from Node 22+)
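
The credential-free behaviour of isTtsConfigured() could look something like the sketch below. The config shape, the field names `apiUrl` and `apiKey`, and the rule applied to non-edge providers are all assumptions made for illustration; the actual function in src/app/services/tts-service.ts may differ.

```typescript
// Hypothetical config shape; the repository's real config type may differ.
interface TtsConfigSketch {
  provider: string;
  apiUrl?: string;
  apiKey?: string;
}

// Hypothetical stand-in for isTtsConfigured(): the edge provider needs no
// credentials, while other providers are assumed to need a URL and a key.
function isTtsConfiguredSketch(config: TtsConfigSketch): boolean {
  if (config.provider === "edge") {
    return true;
  }
  return Boolean(config.apiUrl && config.apiKey);
}
```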

Configuration

TTS_PROVIDER=edge
TTS_VOICE=en-US-EmmaMultilingualNeural

Verification

  • npm run build
  • npm run lint ✓ (zero warnings)
  • npm test — all new tests pass (16 in edge-tts.test.ts + 4 edge cases in tts-service.test.ts)
    • Note: one pre-existing failure in tests/config.test.ts (env-var leakage of TTS_MODEL) also occurs on main without this PR
  • Verified end-to-end against the real Edge TTS service: produced a valid 2.5s MP3 (48 kbps, 24 kHz mono)

Adds a fourth TTS provider that uses Microsoft Edge's online Read
Aloud service via WebSocket. No API key, account, or TTS_API_URL
required; only outbound HTTPS/WebSocket access to
speech.platform.bing.com.

Implementation ports the protocol from the Python edge-tts reference
(https://github.com/rany2/edge-tts):
- Sec-MS-GEC DRM token: SHA256 of Windows file-time ticks (rounded to
  5 min) + trusted client token, with single 403 retry that corrects
  clock skew from the server Date header
- SSML framing with prosody rate/volume/pitch and 4096-byte UTF-8-safe
  chunking that never splits multi-byte chars or XML entities
- WebSocket message handling: binary frames use a 2-byte big-endian
  header-length prefix (length includes the trailing CRLF), audio
  starts at offset 2 + headerLength; text frames signal turn.end per
  chunk
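
The binary frame layout described above can be illustrated with a small parser. This is a sketch, not the PR's implementation: `splitBinaryFrame` and its return shape are hypothetical names, and it assumes the header bytes are UTF-8 text.

```typescript
// Hypothetical helper: splits one binary WebSocket frame into its header
// text and its audio payload.
function splitBinaryFrame(frame: Uint8Array): { headers: string; audio: Uint8Array } {
  if (frame.byteLength < 2) {
    throw new Error("binary frame is too short to hold the length prefix");
  }
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  // 2-byte big-endian header length; the value includes the trailing CRLF.
  const headerLength = view.getUint16(0, false);
  // Audio starts immediately after the prefix and the header block.
  const audioStart = 2 + headerLength;
  if (audioStart > frame.byteLength) {
    throw new Error("declared header length exceeds the frame size");
  }
  const headers = new TextDecoder("utf-8").decode(frame.subarray(2, audioStart));
  return { headers, audio: frame.subarray(audioStart) };
}
```

Because `subarray` returns a view rather than a copy, a caller that buffers audio across frames may want to copy the payload before the underlying frame buffer is reused.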

Adds ws as a runtime dependency (Node 20+ target; global WebSocket is
only stable from Node 22+).

- New: src/app/services/edge-tts.ts + tests
- Wire 'edge' provider into config.ts and tts-service.ts
- isTtsConfigured() returns true for edge (no credentials needed)
- Document provider in .env.example and PRODUCT.md
