A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Oct 2, 2025 - Python
A ComfyUI custom node integration for multi-engine, multi-language Text-to-Speech and Voice Conversion. Supports RVC, Qwen3-TTS, Cozy Voice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual 23-lang), F5-TTS, Higgs Audio 2, and VibeVoice, with unlimited text length, SRT timing, character support, and many audio tools.
ComfyUI custom node for VibeVoice TTS: expressive, long-form, multi-speaker conversational audio.
VibeVoiceFusion is a full-stack, multi-speaker voice generation web system featuring LoRA fine-tuning, batch generation, and VRAM optimization. Based on Microsoft's VibeVoice (AR + diffusion architecture)
Audiobook creation tool with support for multiple TTS models (MiraTTS, GLM-TTS, IndexTTS2, VibeVoice, Higgs V2, Fish S1-mini, Chatterbox, Oute), focused on high-quality output. Plus player/reader web app.
Beautiful voice app: record or upload to train a voice, generate speech from text or files, save & download voices.
Archive of the official Microsoft VibeVoice repository (7B & 1.5B). Backup of the deleted source code for the open-source TTS models, including the removed 7B version. Try the VibeVoice online service
How to run Microsoft VibeVoice locally
Create multi-voice podcasts with AI text-to-speech
A Gradio-based demo for end-to-end vision-to-speech inference: Extract text or descriptions from images using Qwen2.5-VL-7B-Instruct, then convert to natural speech audio via Microsoft VibeVoice-Realtime-0.5B.
🐟 Enhance communication with Fish Speech, a powerful multilingual Text-to-Speech system featuring speaker management, auto-transcription, and emotion control.
🌊 Simplify configuration with VIBE, a readable, fast format that eliminates complexity while enhancing clarity and structure in your development workflow.
A ready-to-use Google Colab notebook for running the open-source VibeVoice TTS model from Microsoft, using the quantized Large Q8 variant (~12 GB VRAM) for multi-speaker long-form audio generation
🎙️ Enhance voice synthesis with ComfyUI-Qwen3-TTS, featuring advanced voice cloning, emotion-aware ASR, and unlimited multi-role dubbing.