For the posted CosyVoice 3 metrics, what prompting setup was used? Does it follow the exact setup in example.py ('You are a helpful assistant.<|endofprompt|>' + target text)? Also, does the text normalization method (WeText vs. TTSFRD) matter?
I ask because the results I am getting differ slightly from the published ones for CosyVoice3 0.5B without RL. On seedtts-zh (published -> mine): SIM 0.780 -> 0.7786 and WER 1.16 -> 1.341.
If I am prompting CosyVoice correctly, I may need to check my evaluation pipeline. (For example, if Paraformer transcribes a different character with exactly the same pronunciation, should that count toward WER?)
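
To make the homophone question concrete, here is a minimal sketch of the two scoring variants I have in mind. It is not the benchmark's official scoring script, only an illustration under my own assumptions: the error rate is computed per character with plain Levenshtein distance, and `TOY_PINYIN` / `toy_key` are hypothetical stand-ins for a real pronunciation lookup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (sub/ins/del each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(ref, hyp, key=None):
    """Character-level error rate; `key` optionally maps each character
    to a comparison unit (e.g. its pronunciation) before scoring."""
    key = key or (lambda ch: ch)
    ref_units = [key(ch) for ch in ref]
    hyp_units = [key(ch) for ch in hyp]
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


# Hypothetical toy pronunciation table, for illustration only.
TOY_PINYIN = {"他": "ta", "她": "ta", "在": "zai", "再": "zai"}


def toy_key(ch):
    """Map a character to its toy pronunciation, or to itself if unknown."""
    return TOY_PINYIN.get(ch, ch)


ref = "他在家"  # target text
hyp = "她再家"  # ASR output with two same-sounding substitutions

strict = char_error_rate(ref, hyp)            # homophones count as errors
lenient = char_error_rate(ref, hyp, toy_key)  # homophones are forgiven
print(f"strict={strict:.3f} lenient={lenient:.3f}")
```

With strict matching, both substituted characters count as errors; with the pronunciation key, neither does. Knowing which variant the published numbers use would tell me whether a gap like mine could come from scoring alone.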