Hi, we would like to share a GRPO RL training recipe based on the CosyVoice2 LLM.
Recipe: https://github.com/nvidia-china-sae/mair-hub/tree/main/rl-tutorial/cosyvoice_llm
Here are the initial training results:
| Model | Seed-TTS test_zh CER | CosyVoice3 zero_shot_zh | Comment |
| --- | --- | --- | --- |
| Official CosyVoice2 LLM | 1.45% | 4.08% | See the paper |
| + GRPO | 1.37% | 3.36% | |
| SFT (initialized from Qwen2-0.5B-Instruct) | 1.81% | 4.83% | See PR #1887 |
| + GRPO | 1.06% | 4.03% | |
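
For readers who want the relative gains rather than the absolute numbers, here is a minimal sketch that recomputes them from the results above. It assumes both metric columns are error rates in percent where lower is better (the post only labels the first column as CER), and the helper name `relative_reduction` and the `RESULTS` layout are purely illustrative, not part of the recipe.

```python
# Error rates (%) copied from the results above as (baseline, +GRPO) pairs.
# Assumption: both columns are error rates, so lower is better.
RESULTS = {
    "Official CosyVoice2 LLM": {
        "test_zh": (1.45, 1.37),
        "zero_shot_zh": (4.08, 3.36),
    },
    "SFT (initialized from Qwen2-0.5B-Instruct)": {
        "test_zh": (1.81, 1.06),
        "zero_shot_zh": (4.83, 4.03),
    },
}


def relative_reduction(before: float, after: float) -> float:
    """Return the relative error reduction in percent: (before - after) / before * 100."""
    if before <= 0:
        raise ValueError("baseline error rate must be positive")
    return (before - after) / before * 100.0


if __name__ == "__main__":
    for model, test_sets in RESULTS.items():
        for test_set, (baseline, grpo) in test_sets.items():
            gain = relative_reduction(baseline, grpo)
            print(f"{model} / {test_set}: {baseline:.2f}% -> {grpo:.2f}% "
                  f"({gain:.1f}% relative reduction)")
```

Under that assumption, the largest relative gain in these initial results is on Seed-TTS test_zh for the model initialized from Qwen2-0.5B-Instruct.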
We will add more experimental results as we continue refining the recipe.