Non-autoregressive, high-speed TTS using Conditional Flow Matching: 5-20x faster than autoregressive models (RTF 0.02-0.08). Inspired by F5-TTS and Voicebox, and integrated with the MB-iSTFT vocoder.
- RTF 0.022 (5 steps) - 45x faster than real-time
- RTF 0.041 (10 steps) - 24x faster than real-time
- RTF 0.077 (20 steps) - 13x faster than real-time
- Sway Sampling - F5-TTS inference optimization
- Multiple ODE Solvers - Euler, Midpoint methods
- MB-iSTFT Vocoder - High-quality audio generation
Text → ConvNeXt Blocks → Flow Transformer → ODE Solver → Mel → MB-iSTFT → Audio
```bash
# Clone repository
git clone https://github.com/gateoneh92/Flow-Matching-TTS.git
cd Flow-Matching-TTS

# Install dependencies
pip install -r requirements.txt

# Verify installation
python3 test_flow_matching.py
```
```
# Expected output:
# ✅ All Flow Matching core tests passed!
# ✅ All FlowMatchingSynthesizer tests passed!
# Speed: RTF 0.022 (5 steps), 0.041 (10 steps)
```

```bash
# Prepare your dataset (LJSpeech, VCTK, etc.)
# Create filelists in the format: path/to/audio.wav|transcription

# Train
python3 train_flow_matching.py \
    -c configs/flow_matching.json \
    -m logs/flow_matching
```

```bash
# Basic (20 steps, Sway sampling)
python3 inference_flow_matching.py \
    --checkpoint logs/flow_matching/G_100000.pth \
    --config configs/flow_matching.json \
    --text "Hello world, this is flow matching TTS." \
    --output output.wav
```
```bash
# Fast (10 steps)
python3 inference_flow_matching.py \
    --checkpoint logs/flow_matching/G_100000.pth \
    --config configs/flow_matching.json \
    --text "Quick generation." \
    --output output_fast.wav \
    --steps 10
```
```bash
# High quality (30 steps + midpoint)
python3 inference_flow_matching.py \
    --checkpoint logs/flow_matching/G_100000.pth \
    --config configs/flow_matching.json \
    --text "Highest quality." \
    --output output_hq.wav \
    --steps 30 \
    --method midpoint
```

| Model | RTF | Speed vs Real-time |
|---|---|---|
| Flow Matching (5 steps) | 0.022 | 45x faster ⚡ |
| Flow Matching (10 steps) | 0.041 | 24x faster ⚡ |
| Flow Matching (20 steps) | 0.077 | 13x faster ⚡ |
| AR LLM (baseline) | 0.5-1.0 | 1-2x |
Comparison with other fast TTS systems:

| Model | RTF | Key Features |
|---|---|---|
| Flow Matching TTS ⭐ | 0.02-0.08 | MB-iSTFT + Sway |
| F5-TTS | 0.04 (TRT) | ConvNeXt + Sway |
| Voicebox | ~0.15 | Flow matching |
| GPT-SoVITS | 0.01-0.03 | AR, Few-shot |
Model size presets (pick one; the three pairs below are alternatives, since JSON allows neither comments nor duplicate keys, keep only the pair you want in the actual config):

```jsonc
{
  "model": {
    // Small (8GB GPU)
    "flow_d_model": 256,
    "flow_num_layers": 6,

    // Medium (12GB GPU)
    "flow_d_model": 512,
    "flow_num_layers": 12,

    // Large (24GB GPU)
    "flow_d_model": 768,
    "flow_num_layers": 18
  }
}
```

Inference presets:

```bash
# Ultra-fast (RTF 0.022)
--steps 5 --method euler

# Balanced (RTF 0.041, recommended)
--steps 10 --method euler --sway-coef -1.0

# High quality (RTF 0.077)
--steps 20 --method euler --sway-coef -1.0

# Best quality (RTF 0.120)
--steps 30 --method midpoint --sway-coef -1.0
```

Conditional Flow Matching learns the velocity field:
```
dx_t/dt = v_t(x_t, text, t)
```
- x_t: State at time t (t=0: noise, t=1: mel)
- v_t: Velocity field (predicted by model)
- t: Time ∈ [0, 1]
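Generation solves this ODE from t=0 (noise) to t=1 (mel). As a minimal NumPy sketch (not the repo's API), here is explicit Euler integration with a toy velocity field standing in for the trained Flow Transformer:

```python
import numpy as np

def euler_sample(velocity, x0, n_steps=10):
    """Integrate dx/dt = velocity(x, t) from t=0 (noise) to t=1 (mel)."""
    x = x0
    ts = np.linspace(0.0, 1.0, n_steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        x = x + (t1 - t0) * velocity(x, t0)  # one Euler step of size t1 - t0
    return x

# Toy velocity field: pulls the state toward a fixed "mel" target.
# In the real model, velocity(x, t) is predicted from x, the text, and t.
target = np.ones((80, 100))  # (mel_bins, frames), shapes are illustrative
noise = np.random.default_rng(0).standard_normal((80, 100))
mel = euler_sample(lambda x, t: target - x, noise, n_steps=10)
```

Each step costs one velocity (network) evaluation, which is why RTF drops from 0.077 at 20 steps to 0.022 at 5 steps.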
Training objective:

```
# Interpolation path
x_t = t * x_1 + (1-t) * x_0
# Target velocity
u_t = x_1 - x_0
# Loss
loss = MSE(v_t, u_t)
```

Sway Sampling (the F5-TTS inference optimization) warps the ODE timesteps:

```
# Standard: t_new = t
# Sway:     t_new = t + sway_coef * (1-t) * t
```
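As a sketch of that warp, using the formula above: with `sway_coef = -1.0` (the value in the presets) the uniform grid becomes t², concentrating steps near the noise end.

```python
import numpy as np

def sway_timesteps(n_steps, sway_coef=-1.0):
    """Warp a uniform [0, 1] step grid: t_new = t + sway_coef * (1 - t) * t."""
    t = np.linspace(0.0, 1.0, n_steps + 1)
    return t + sway_coef * (1.0 - t) * t

ts = sway_timesteps(10)
# Endpoints 0 and 1 are preserved; spacing is denser near t=0 (the noise
# side), so the solver spends more of its fixed step budget there.
```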
Effect: better quality without retraining.

Project structure:

```
Flow-Matching-TTS/
├── flow_matching.py             # Core implementation
│   ├── ConvNeXtBlock
│   ├── DurationPredictor
│   ├── FlowMatchingTransformer
│   └── ConditionalFlowMatching
├── models.py                    # MB-iSTFT integration
│   └── FlowMatchingSynthesizer
├── train_flow_matching.py       # Training script
├── inference_flow_matching.py   # Inference script
├── test_flow_matching.py        # Test suite
├── data_utils.py                # Data loaders
├── configs/
│   └── flow_matching.json       # Configuration
└── text/                        # Text processing
```
| Feature | Autoregressive | Flow Matching ⭐ |
|---|---|---|
| Generation | Sequential | Parallel |
| Speed | Slow (RTF 0.5-1.0) | Fast (RTF 0.02-0.08) |
| Context | Unidirectional | Bidirectional |
| Stability | Repetition risk | Stable |
| Quality Control | Temperature, top-k | ODE steps, solver |
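The "ODE steps, solver" knob in the last row trades compute for integration accuracy. A self-contained toy comparison (not repo code) on dx/dt = -x, whose exact solution at t=1 is e^(-1), shows why `--method midpoint` improves quality at the same step count:

```python
import math

def euler_solve(f, x, t0, t1, n):
    # First-order: one velocity evaluation per step
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        x += h * f(x, t)
        t += h
    return x

def midpoint_solve(f, x, t0, t1, n):
    # Second-order: re-evaluates the velocity at the step midpoint
    h = (t1 - t0) / n
    t = t0
    for _ in range(n):
        k = f(x, t)
        x += h * f(x + 0.5 * h * k, t + 0.5 * h)
        t += h
    return x

f = lambda x, t: -x  # dx/dt = -x
exact = math.exp(-1.0)
err_euler = abs(euler_solve(f, 1.0, 0.0, 1.0, 10) - exact)
err_midpoint = abs(midpoint_solve(f, 1.0, 0.0, 1.0, 10) - exact)
# Midpoint's error is far smaller than Euler's at the same 10 steps
```

Midpoint costs two velocity evaluations per step, which is why the "best quality" preset has a higher RTF.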
- Python 3.8+
- PyTorch 2.0+
- CUDA 11.8+ (for GPU)
- 8GB+ GPU memory (12GB+ recommended)
See requirements.txt for the complete list.
- Flow Matching for Generative Modeling (Lipman et al., 2023)
- F5-TTS (SWivid, 2024) - ConvNeXt + Sway sampling
- Voicebox (Meta AI, 2023) - Flow matching for audio
- MB-iSTFT-VITS - Multi-band iSTFT vocoder
Contributions welcome! Please feel free to submit a Pull Request.
MIT License
- F5-TTS for ConvNeXt and Sway sampling techniques
- Voicebox for flow matching inspiration
- MB-iSTFT-VITS for high-quality vocoder
- Claude Code (Sonnet 4.5) for implementation assistance
- GitHub: @gateoneh92
- Email: gateoneh@gmail.com
- Issues: GitHub Issues
Created: 2026-02-20 | Version: 1.0 | Status: ✅ Tested and ready to use