
Tsukasane/SLMSVS


Adapting Speech Language Model to Singing Voice Synthesis

SLMSVS Recipe

  1. Tokenization of music score conditions and singing waveforms.
  2. Multi-stream language model token prediction.
  3. Conditional flow matching-based mel-spectrogram generation.
  4. A mel-to-wave vocoder.
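At inference time the four stages chain together roughly as sketched below. This is a minimal structural sketch only. Every function name, size, and frame ratio is a hypothetical placeholder, not code or values from this repo, and each stage returns dummy data so that only the shapes passing between stages are visible. Stage 1's tokenization of singing waveforms (used for training targets) is omitted.

```python
import numpy as np

# All sizes below are hypothetical placeholders, not values from this repo.
N_STREAMS = 4         # assumed number of parallel token streams
CODEBOOK = 1024       # assumed vocabulary size per stream
N_MELS = 80           # assumed number of mel channels
FRAMES_PER_TOKEN = 2  # assumed mel frames per codec token step
HOP = 256             # assumed vocoder hop size (samples per mel frame)

rng = np.random.default_rng(0)


def stage1_tokenize_score(score_len: int) -> np.ndarray:
    """Placeholder: discrete tokens for the music score conditions."""
    return rng.integers(0, CODEBOOK, size=score_len)


def stage2_predict_tokens(score_tokens: np.ndarray, n_steps: int) -> np.ndarray:
    """Placeholder: the SLM predicts one token per stream at every step."""
    return rng.integers(0, CODEBOOK, size=(N_STREAMS, n_steps))


def stage3_tokens_to_mel(codec_tokens: np.ndarray) -> np.ndarray:
    """Placeholder for flow matching (the stage this repo covers)."""
    n_frames = codec_tokens.shape[1] * FRAMES_PER_TOKEN
    return rng.standard_normal((N_MELS, n_frames))


def stage4_vocode(mel: np.ndarray) -> np.ndarray:
    """Placeholder: a vocoder emits HOP samples per mel frame."""
    return np.zeros(mel.shape[1] * HOP, dtype=np.float32)


def synthesize(score_len: int, n_steps: int) -> np.ndarray:
    """Run the four placeholder stages in order and return a waveform."""
    score_tokens = stage1_tokenize_score(score_len)
    codec_tokens = stage2_predict_tokens(score_tokens, n_steps)
    mel = stage3_tokens_to_mel(codec_tokens)
    return stage4_vocode(mel)
```

For example, `synthesize(16, 50)` returns a waveform of 50 × 2 × 256 samples under these assumed sizes.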

This repo contains the scripts for stage 3.

Usage

We use a conditional flow matching model that transforms source Gaussian noise into the target mel-spectrogram, conditioned on the codec tokens predicted by the SLM.
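The idea can be illustrated with the optimal-transport style conditional flow matching objective popularized by Matcha-TTS. This is a minimal NumPy sketch under stated assumptions, not this repo's implementation: the function names, the `sigma_min` value, and the `velocity_fn(x, t, cond)` interface are hypothetical. In the real model, `velocity_fn` would be a neural network taking the noisy mel, the time step, and the codec-token condition.

```python
import numpy as np

SIGMA_MIN = 1e-4  # assumed small constant; not necessarily this repo's value


def cfm_path(x0, x1, t, sigma_min=SIGMA_MIN):
    """Point on the straight conditional path at time t and its target velocity."""
    xt = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    ut = x1 - (1.0 - sigma_min) * x0  # constant along the path
    return xt, ut


def cfm_loss(velocity_fn, x1, cond, rng, sigma_min=SIGMA_MIN):
    """One-sample Monte Carlo estimate of the CFM regression loss."""
    x0 = rng.standard_normal(x1.shape)  # source Gaussian noise
    t = rng.uniform()                   # t ~ U(0, 1)
    xt, ut = cfm_path(x0, x1, t, sigma_min)
    pred = velocity_fn(xt, t, cond)     # hypothetical network call
    return float(np.mean((pred - ut) ** 2))


def euler_sample(velocity_fn, x0, cond, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (mel)."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt, cond)
    return x
```

Training minimizes `cfm_loss` over (mel, codec-token) pairs. At inference, `euler_sample` starts from fresh noise shaped like the target mel-spectrogram (for example `(n_mels, n_frames)`), and more steps trade speed for a more accurate ODE solution.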

Edit the paths and configuration directly in the main entry point, flow.py, then run:

python flow.py

TODOs

  • Update stage 1 and stage 2 processing scripts to a local ESPnet fork.
  • Update stage 4 processing scripts to a local ParallelWaveGAN fork.

Acknowledgements

We thank INSPIREMUSIC and Matcha-TTS for releasing their code. Our work is also based on OpusLM and ESPnet-Codec.

About

Adapting Speech Language Model to Singing Voice Synthesis. (NeurIPSW 2025)
