Skip to content

Latest commit

 

History

History
233 lines (157 loc) · 6.19 KB

File metadata and controls

233 lines (157 loc) · 6.19 KB

FlowTTS

Next-Generation Low-Latency Conversational Speech Synthesis

TRTC Python Node.js License PRs Welcome

English | 简体中文


FlowTTS: A next-generation low-latency speech synthesis system with voice cloning and human-like expression capabilities. It naturally presents filler words, emotions, and paralinguistic details, making AI in dialogue scenarios "sound like a real person".

Online Demo

Features

  • Ultra-Low Latency: Streaming SSE API with Keep-Alive connection
  • Voice Cloning: Create custom voice by submitting audio samples
  • Human-like Expression: Natural filler words, emotions, and paralinguistic details
  • Multi-language Support: Chinese/English/Japanese/Korean/Cantonese/Arabic/Indonesian/Thai

Models

Model Use Case Features
flow_02_turbo Conversational (Latest) Ultra-low latency, high quality, supports Chinese/English/Japanese/Korean/Cantonese/Arabic/Indonesian/Thai
flow_01_turbo Conversational Ultra-low latency, high quality, supports Chinese/English/Japanese/Korean/Cantonese/Arabic/Indonesian/Thai

Recommended: Pass an empty string "" for the Model field to automatically use the latest model without specifying a version.

Voice List

Quick Start

1. Enable Service

FlowTTS is built on TRTC AI Conversation solution. You need to enable one of the following:

  • AI Recognition Package (Lite/Premium)
  • TRTC Monthly Plus Plan

See TRTC Activation & Billing

2. Install Dependencies

Python

cd examples/python
pip install -r requirements.txt

Note: Please ensure you install the latest version of Tencent Cloud SDK (>=3.0.1200) for full TTS feature support.

Node.js

cd examples/nodejs
npm install

Requires Node.js >= 18.

3. Configure Environment Variables

cp .env.example .env

Edit .env with your Tencent Cloud credentials:

TENCENTCLOUD_SECRET_ID=your_secret_id_here
TENCENTCLOUD_SECRET_KEY=your_secret_key_here
TENCENTCLOUD_SDK_APP_ID=1400000000

Get credentials from Tencent Cloud Console

4. Run Examples

Python

# Streaming TTS
python examples/python/example_streaming.py

# Non-streaming TTS
python examples/python/example_non_streaming.py

# Voice cloning
python examples/python/example_voice_clone.py

# WebSocket bidirectional streaming
python examples/python/example_ws_bidirection.py

Node.js

cd examples/nodejs

# Streaming TTS
node example_streaming.js

# Non-streaming TTS
node example_non_streaming.js

# Voice cloning
node example_voice_clone.js

# WebSocket bidirectional streaming
node example_ws_bidirection.js

Voice Clone Example

# 1. Prepare audio sample (16kHz mono WAV, 10-180 seconds)
cp your_voice.wav test_data/clone_sample.wav

# 2. Clone voice and get voice_id
python examples/python/example_voice_clone.py

# 3. Use the returned voice_id in example_streaming.py for TTS
# Update VOICE_CONFIG["VoiceId"] with the cloned voice_id
python examples/python/example_streaming.py

Configuration

Voice Parameters

Parameter Range Description
Speed 0.5 ~ 2.0 Speech speed
Volume 0.01 ~ 10 Volume level (must be > 0)
Pitch -12 ~ 12 Pitch adjustment

Audio Format

API Type Formats Sample Rates
Streaming (SSE) pcm 16000, 24000
Non-streaming pcm, wav, mp3 16000, 24000

Default format: pcm, default sample rate: 24000

API Endpoint

Different APIs use different endpoints:

API Endpoint
Streaming SSE (TextToSpeechSSE) trtc.ai.tencentcloudapi.com
Non-streaming (TextToSpeech) trtc.tencentcloudapi.com
Voice Clone (VoiceClone) trtc.tencentcloudapi.com

Keep-Alive Connection

The SDK supports HTTP Keep-Alive to reuse TCP connections and reduce latency:

Python

http_profile = HttpProfile()
http_profile.keepAlive = True        # Enable Keep-Alive
http_profile.pre_conn_pool_size = 3  # Connection pool size
Parameter Description
keepAlive Reuses TCP connections, avoids repeated handshakes, reduces latency for subsequent requests
pre_conn_pool_size Pre-established connection pool size, connections are ready before first request

With Keep-Alive enabled, consecutive requests save approximately 50-100ms of connection establishment time

Node.js

Node.js HTTP agent supports connection reuse by default, no additional configuration needed.

API Documentation

TRTC AI Conversation Integration

Add TTS configuration in TRTC AI Conversation settings, TTSConfig:

{
  "TTSType": "flow",
  "VoiceId": "your_voice_id",
  "Model": "",
  "Speed": 1.0,
  "Volume": 1.0,
  "Pitch": 0,
  "Language": "zh"
}

Language Codes (ISO 639-1)

Language Code
Chinese zh
English en
Japanese ja
Korean ko
Cantonese yue
Arabic ar
Indonesian id
Thai th

License

MIT License - see LICENSE for details.