Cosyvoice2 GRPO RL training recipe

Hi, we would like to share a GRPO RL training recipe based on the CosyVoice2 LLM.


Recipe: https://github.com/nvidia-china-sae/mair-hub/tree/main/rl-tutorial/cosyvoice_llm
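
For readers new to GRPO, here is a minimal sketch of the group-relative advantage at the core of the algorithm: several responses are sampled for the same prompt, each gets a scalar reward, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` parameter, and the example rewards are illustrative assumptions; this is not the recipe's code, and the reward the recipe actually uses is not described here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (hypothetical helper).

    rewards: scalar rewards for responses sampled from the same prompt.
    eps: small constant (an assumption) to avoid division by zero when
         all rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled responses to one prompt.
example_rewards = [0.9, 0.7, 0.8, 0.6]
advantages = group_relative_advantages(example_rewards)
```

Responses that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to form the baseline.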

Here are the initial training results:

| Model | Seed-TTS `test_zh` CER | Cosyvoice3 `zero_shot_zh` | Comment |
|---|---|---|---|
| Official CosyVoice2 LLM | 1.45% | 4.08% | See the [paper](https://arxiv.org/abs/2412.10117) |
| + GRPO | 1.37% | **3.36%** | |
| SFT (initialized from Qwen2-0.5B-Instruct) | 1.81% | 4.83% | See [PR #1887](https://github.com/k2-fsa/icefall/pull/1887) |
| + GRPO | **1.06%** | 4.03% | |

We will add more experimental results as we continue refining the recipe.
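
To put the table in perspective, the relative reduction from each baseline to its GRPO-trained counterpart can be computed as below. This is a small sketch that uses only the numbers reported above and assumes both columns are error rates where lower is better; the helper name `relative_reduction` is our own.

```python
def relative_reduction(baseline, improved):
    """Relative reduction in percent, going from baseline to improved."""
    return (baseline - improved) / baseline * 100.0


# (baseline, + GRPO) pairs copied from the results table, in percent.
results = {
    ("Official CosyVoice2 LLM", "Seed-TTS test_zh"): (1.45, 1.37),
    ("Official CosyVoice2 LLM", "Cosyvoice3 zero_shot_zh"): (4.08, 3.36),
    ("SFT (Qwen2-0.5B-Instruct init)", "Seed-TTS test_zh"): (1.81, 1.06),
    ("SFT (Qwen2-0.5B-Instruct init)", "Cosyvoice3 zero_shot_zh"): (4.83, 4.03),
}

reductions = {
    key: round(relative_reduction(base, grpo), 1)
    for key, (base, grpo) in results.items()
}

for (model, test_set), pct in reductions.items():
    print(f"{model} on {test_set}: {pct}% relative reduction")
# Reductions, in table order: 5.5, 17.6, 41.4, 16.6
```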


Cosyvoice2 GRPO RL training recipe #1463
