Adapting Speech Language Model to Singing Voice Synthesis

SLMSVS Recipe

1. Tokenization of music score conditions and singing waveforms.
2. Multi-stream language model token prediction.
3. Conditional flow matching-based mel-spectrogram generation.
4. A mel-to-wave vocoder.

This repo contains the scripts for stage3.

For stage1, please follow the ACE-Opencpop recipe. For stage2, please follow the instructions in the ESPnet-Speechlm branch. You may also refer to the local fork for these two stages.
For stage4, please follow the HiFi-GAN training recipe in ParallelWaveGAN, or refer to the local fork.

Usage

We use a conditional flow matching model that transforms source Gaussian noise into the target mel-spectrogram, conditioned on the codec tokens predicted by the SLM.
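The idea of conditional flow matching can be illustrated with a minimal NumPy sketch. It is not the code in flow.py: the function names, the `SIGMA_MIN` value, and the straight-line (optimal-transport) path are assumptions chosen for illustration, following the formulation popularized by Matcha-TTS. `toy_velocity` is a hypothetical stand-in for the trained network, and in this toy the "condition" is simply the target itself rather than codec tokens.

```python
import numpy as np

SIGMA_MIN = 1e-4  # assumed value for illustration, not taken from the repo


def cfm_training_pair(x1, rng, sigma_min=SIGMA_MIN):
    """Build one (x_t, t, target velocity) triple for a target mel-spectrogram x1."""
    x0 = rng.standard_normal(x1.shape)                  # source Gaussian noise
    t = rng.uniform()                                   # random time in [0, 1)
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1   # point on the straight path
    u_t = x1 - (1.0 - sigma_min) * x0                   # velocity the network regresses
    return x_t, t, u_t


def euler_sample(velocity_fn, cond, shape, rng, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (mel) with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x


def toy_velocity(x, t, cond):
    """Hypothetical stand-in for the trained model: exact straight-line field toward cond."""
    return (cond - x) / (1.0 - t)


rng = np.random.default_rng(0)
target_mel = rng.standard_normal((80, 12))  # 80 mel bins x 12 frames, arbitrary toy sizes
x_t, t, u_t = cfm_training_pair(target_mel, rng)
generated = euler_sample(toy_velocity, target_mel, target_mel.shape, rng)
```

During training, a network would be fitted to predict `u_t` from `x_t`, `t`, and the condition with a mean-squared-error loss. At inference, the same network replaces `toy_velocity` and the condition is derived from the tokens predicted by the SLM.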

Please set the paths and config directly in the main entry point, flow.py, then run:

python flow.py

TODOs

Update stage1 and stage2 processing scripts to an ESPnet local fork.
Update stage4 processing scripts to a ParallelWaveGAN local fork.

Acknowledgements

We thank INSPIREMUSIC and Matcha-TTS for releasing their code. Our work is also based on OpusLM and ESPnet-Codec.
