A demonstration application for TCSinger2 - Customizable Multilingual Zero-shot Singing Voice Synthesis, optimized for macOS with Apple Metal GPU support.
- ✨ Zero-shot Singing Voice Synthesis: Synthesize singing voices without training on specific singers
- 🎤 Timbre Control: Use audio prompts to define voice style and timbre
- 🎵 Musical Control: Specify lyrics and musical notes for synthesis
- 🖥️ macOS Optimized: Adapted for Apple Metal Performance Shaders (MPS)
- 🎨 User-Friendly UI: Built with Gradio for easy interaction
TCSinger 2 is a state-of-the-art multilingual zero-shot singing voice synthesis model that supports:
- Style transfer from audio prompts
- Multi-level style control
- Cross-lingual synthesis
- Speech-to-singing conversion
Paper: TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis (ACL 2025)
- macOS with Apple Silicon (M1/M2/M3) or Intel Mac
- Python 3.10 or higher
- At least 8GB RAM (16GB recommended)
Clone the repository and set up a virtual environment:

```bash
git clone https://github.com/audiohacking/TCsinger-App.git
cd TCsinger-App
python3 -m venv venv
source venv/bin/activate
```

For Apple Silicon (M1/M2/M3):

```bash
# Install PyTorch with MPS support
pip install torch torchvision torchaudio

# Install other dependencies
pip install -r requirements.txt
```

For Intel Macs:

```bash
pip install -r requirements.txt
```

Verify the installation:

```bash
python -c "import torch; print('PyTorch:', torch.__version__); print('MPS Available:', torch.backends.mps.is_available())"
```
Launch the demo application:

```bash
python app/demo.py
```

The application will start and open in your default browser at http://127.0.0.1:7860.

To list all command-line options:

```bash
python app/demo.py --help
```

Available options:

- `--model-path`: Path to a pretrained model checkpoint (optional)
- `--share`: Create a public share link for remote access
- `--server-name`: Server host (default: 127.0.0.1)
- `--server-port`: Server port (default: 7860)

Example:

```bash
python app/demo.py --share --server-port 8080
```
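These flags presumably map straight onto Gradio's standard `launch()` parameters. The stripped-down sketch below shows that wiring; the `synthesize` stub and the component labels are illustrative placeholders, not the app's actual callback:

```python
# Minimal Gradio sketch wired like the flags above (stub callback, not the real app).
import gradio as gr

def synthesize(prompt_audio, lyrics, notes):
    # Placeholder: the real demo would run TCSinger2 inference here.
    return prompt_audio

demo = gr.Interface(
    fn=synthesize,
    inputs=[
        gr.Audio(type="filepath", label="Audio Prompt"),
        gr.Textbox(label="Lyrics"),
        gr.Textbox(label="Notes"),
    ],
    outputs=gr.Audio(label="Synthesized Singing"),
)
demo.launch(server_name="127.0.0.1", server_port=8080, share=True)
```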
To synthesize a singing voice:

- Upload Audio Prompt:
  - Upload a short audio clip (3-10 seconds) or record using your microphone
  - This defines the timbre/voice style you want to replicate
- Enter Lyrics:
  - Type the text you want to be sung
  - Supports multiple languages
- Specify Musical Notes (a parsing sketch follows this list):
  - Use note names: `C4 D4 E4 F4 G4`
  - Or MIDI numbers: `60 62 64 65 67`
  - Or mixed with rests: `C4 rest E4 G4`
- Adjust Settings (Optional):
  - CFG Scale: Controls adherence to prompts (1.0-10.0); higher values = more faithful to the input prompts
- Synthesize:
  - Click "Synthesize Singing Voice"
  - Wait for processing
  - Download or play the result
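The note field is just whitespace-separated tokens. As an illustration of how the three formats relate, here is a minimal, hypothetical parser (`parse_notes` and `NOTE_OFFSETS` are not part of the app; rests map to `None`):

```python
# Hypothetical sketch: convert a note string into MIDI numbers (None = rest).
from typing import List, Optional

NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def parse_notes(note_string: str) -> List[Optional[int]]:
    parsed = []
    for token in note_string.split():
        if token.lower() == "rest":
            parsed.append(None)                    # rest / silence
        elif token.isdigit():
            parsed.append(int(token))              # already a MIDI number, e.g. 60
        else:
            name, octave = token[:-1], int(token[-1])    # single-digit octaves only
            semitone = NOTE_OFFSETS[name[0].upper()]
            semitone += name.count("#") - name.count("b")
            parsed.append(12 * (octave + 1) + semitone)  # C4 -> 60
    return parsed

print(parse_notes("C4 rest E4 G4"))   # [60, None, 64, 67]
print(parse_notes("60 62 64 65 67"))  # [60, 62, 64, 65, 67]
```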
```
TCsinger-App/
├── app/
│   ├── __init__.py
│   ├── config.py          # Configuration settings
│   └── demo.py            # Main Gradio application
├── utils/
│   ├── __init__.py
│   ├── audio_utils.py     # Audio processing utilities
│   └── model_loader.py    # Model loading and inference
├── models/                # Model checkpoints directory
├── requirements.txt       # Python dependencies
└── README.md              # This file
```
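As a rough illustration of the settings `app/config.py` might centralize, here is a hypothetical sketch; every field name and default below is an assumption, not the actual file:

```python
# Hypothetical sketch of app/config.py -- names and defaults are assumptions.
from dataclasses import dataclass
import torch

@dataclass
class DemoConfig:
    model_path: str | None = None      # overridden by --model-path
    server_name: str = "127.0.0.1"     # overridden by --server-name
    server_port: int = 7860            # overridden by --server-port
    sample_rate: int = 44100           # assumed output sample rate
    cfg_scale: float = 3.0             # assumed default within the 1.0-10.0 UI range
    device: str = "mps" if torch.backends.mps.is_available() else "cpu"

config = DemoConfig()
```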
This demo is designed to work with TCSinger2 pretrained models. To use a pretrained model:
- Download model checkpoints from the TCSinger2 repository
- Place checkpoints in the `models/` directory
- Launch with:

```bash
python app/demo.py --model-path models/your_checkpoint.pt
```
Note: The demo currently runs with a placeholder model. For full functionality, you'll need to:
- Clone and set up the TCSinger2 repository
- Train or download pretrained models
- Integrate the actual model code
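Once real checkpoints are available, the loading step in `utils/model_loader.py` could be as simple as the hypothetical sketch below (the nested `state_dict` key is an assumed layout, not the documented TCSinger2 format):

```python
# Hypothetical sketch -- not the actual model_loader.py API.
import torch

def load_checkpoint(path: str) -> dict:
    # Load on CPU first; weights can be moved to MPS afterwards.
    checkpoint = torch.load(path, map_location="cpu")
    # Many training frameworks nest weights under "state_dict" (an assumption here).
    return checkpoint.get("state_dict", checkpoint)

weights = load_checkpoint("models/your_checkpoint.pt")
print(f"Loaded {len(weights)} entries")
```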
This application is optimized for Apple Metal GPUs:
- Automatically detects and uses MPS (Metal Performance Shaders) when available
- Falls back to CPU if MPS is not available
- Significantly faster inference on Apple Silicon Macs
To verify Metal support:

```python
import torch
print(torch.backends.mps.is_available())  # Should print True on Apple Silicon
```

Run the test suite:

```bash
python -m pytest tests/
```

Format and check code style:

```bash
# Format code
black app/ utils/

# Check style
flake8 app/ utils/
```

Planned enhancements:

- Custom model training interface
- Batch processing support
- Advanced audio preprocessing options
- More language support
- Export in multiple formats
- Integration with DAWs
Troubleshooting:

- MPS is not available:
  - Ensure you're on macOS 12.3 or later with Apple Silicon
  - Update to the latest PyTorch version
- Import errors:
  - Make sure all dependencies are installed: `pip install -r requirements.txt`
  - Activate the virtual environment: `source venv/bin/activate`
- Poor audio quality:
  - Try adjusting the CFG scale
  - Use a higher-quality audio prompt
  - Ensure lyrics and notes are properly aligned
If you use TCSinger2 in your research, please cite:
```bibtex
@article{zhang2025tcsinger,
  title={TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis},
  author={Zhang, Yu and Guo, Wenxiang and Pan, Changhao and Yao, Dongyu and Zhu, Zhiyuan and Jiang, Ziyue and Wang, Yuhan and Jin, Tao and Zhao, Zhou},
  journal={arXiv preprint arXiv:2505.14910},
  year={2025}
}
```

This project is for demonstration and research purposes. Please refer to the TCSinger2 repository for licensing information.
Ethical use guidelines:

- Do not generate singing voices of public figures without permission
- Respect copyright and intellectual property rights
- Obtain consent before using someone's voice
- Follow all applicable laws and regulations
Acknowledgments:

- TCSinger2 by Zhejiang University
- Gradio for the UI framework
- PyTorch for deep learning
- Apple Metal Performance Shaders for GPU acceleration
For issues related to:
- This demo app: Open an issue in this repository
- TCSinger2 model: Visit the official repository
Made with ❤️ for the audio synthesis community