Support GRPO Training Recipe for Speech LLM by yuekaizhang · Pull Request #117 · wenet-e2e/west

yuekaizhang · 2026-01-30T08:02:47Z

Support Matrix

Models

Model	HuggingFace
Qwen2.5-Omni-3B	Qwen/Qwen2.5-Omni-3B
Qwen2.5-Omni-7B	Qwen/Qwen2.5-Omni-7B
Qwen2-Audio-7B-Instruct	Qwen/Qwen2-Audio-7B-Instruct

Results

Model	MMAU (v05.15.25)	MMSU
Qwen2.5-Omni-3B	69.8	59.1
+ GRPO	71.6	60.46
Qwen2.5-Omni-7B	72.1	58.56
+ GRPO	73.4	65.38
Qwen2-Audio-7B	56.9	30.38
+ GRPO	67.2	54.12
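
The absolute gains implied by the table can be checked with a few lines of Python. This is only a convenience sketch: the scores are copied from the rows above, while the dictionary layout and the `gains` helper are illustrative names that are not part of the PR.

```python
# Scores copied from the Results table: (MMAU, MMSU) before and after GRPO.
results = {
    "Qwen2.5-Omni-3B": {"base": (69.8, 59.1), "grpo": (71.6, 60.46)},
    "Qwen2.5-Omni-7B": {"base": (72.1, 58.56), "grpo": (73.4, 65.38)},
    "Qwen2-Audio-7B": {"base": (56.9, 30.38), "grpo": (67.2, 54.12)},
}


def gains(entry):
    """Return (MMAU gain, MMSU gain) in absolute points, rounded to 2 decimals."""
    return tuple(
        round(after - before, 2)
        for before, after in zip(entry["base"], entry["grpo"])
    )


for model, entry in results.items():
    mmau_gain, mmsu_gain = gains(entry)
    print(f"{model}: MMAU +{mmau_gain:.2f}, MMSU +{mmsu_gain:.2f}")
```

The largest improvement is on Qwen2-Audio-7B, the weakest baseline of the three.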

Copilot

Pull request overview

Adds a GRPO (Group Relative Policy Optimization) training/eval recipe for speech/audio-capable LLMs (Qwen2-Audio / Qwen2.5-Omni), including dataset adapters, reward functions, trainer implementation, and runnable example scripts.

Changes:

Introduces a custom GRPOTrainer implementing rollout, reward computation, KL penalty, and GRPO loss.
Adds HuggingFace audio dataset wrapper + prompt templates + reward functions for <answer> / <think> formatting.
Adds training and vLLM-based evaluation scripts plus an example recipe (DeepSpeed configs, README, helper scripts).
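
For readers unfamiliar with GRPO, the reward and KL steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `west/trainer/grpo_trainer.py`: the function names, the `eps` value, and the choice of KL estimator (a commonly used non-negative per-token estimator) are all assumptions, since the PR page does not show these details.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize the rewards of one prompt's group of sampled completions.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (eps avoids division by zero).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_token_kl(logp_policy, logp_ref):
    """Assumed KL estimator: exp(d) - d - 1 with d = logp_ref - logp_policy.

    It is zero when both models assign the same log-probability to the
    token and positive otherwise.
    """
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0


# Hypothetical rewards for four completions sampled from the same prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because advantages are computed relative to the other completions of the same prompt, a group in which every completion receives the same reward contributes no learning signal.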

Reviewed changes

Copilot reviewed 10 out of 14 changed files in this pull request and generated 12 comments.


File	Description
west/utils/rewards.py	Adds reward functions used for GRPO training.
west/utils/constants.py	Adds prompt templates and a template map.
west/trainer/grpo_trainer.py	Implements GRPO training loop on top of `transformers.Trainer`.
west/trainer/__init__.py	Package marker (currently empty).
west/dataset/hf_dataset.py	Adds HF dataset loader + collator for audio QA training/eval.
west/bin/train_grpo.py	Adds GRPO training entrypoint for Qwen audio/omni models.
west/bin/decode_mmsu.py	Adds vLLM-based MMSU decoding + accuracy reporting.
west/bin/decode_mmau.py	Adds vLLM-based MMAU decoding.
examples/grpo/scripts/download_mmau_test.sh	Adds helper to download MMAU test-mini audio set.
examples/grpo/run.sh	Adds end-to-end example runner (prepare/train/eval stages).
examples/grpo/requirements.txt	Adds example-specific Python dependencies.
examples/grpo/conf/ds_zero3.json	Adds DS ZeRO-3 config for the example.
examples/grpo/conf/ds_zero1.json	Adds DS ZeRO-1 config for the example.
examples/grpo/README.md	Documents the GRPO recipe and reported results.
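
As an illustration of what the reward functions for `<think>` / `<answer>` formatting in `west/utils/rewards.py` might look like, here is a hedged sketch. The regular expressions, function names, and the 0/1 reward values are assumptions made for this example; the actual patterns and weights used in the PR are not shown on this page.

```python
import re

# Assumed layout: a <think> block followed by an <answer> block.
FORMAT_RE = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the whole completion follows the assumed tag layout."""
    return 1.0 if FORMAT_RE.fullmatch(completion.strip()) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the text inside <answer> matches the reference.

    The comparison ignores case and surrounding whitespace.
    """
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    predicted = match.group(1).strip().lower()
    return 1.0 if predicted == reference.strip().lower() else 0.0


sample = "<think>The clip contains barking.</think>\n<answer>Dog</answer>"
print(format_reward(sample), accuracy_reward(sample, "dog"))
```

Keeping the format check and the correctness check as separate functions lets a trainer weight them independently, which is in line with the "add reward template, reward weight" commit in this PR.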


root and others added 19 commits January 26, 2026 17:30

init grpo

d4f5b84

clean code

0fbb93d

support qwen omni processor

32faa43

unified parameters

bcd111a

refactor decoding file

b8ee5eb

using new template for decoding

b7903b3

fix qwen omni training

9b83aa1

disable decode logging

67c7771

init opd

ad1cb3e

start to implement kd

b446c2c

add audios into meta_data

d23e723

add opd trainer

ce7aafd

add reward template, reward weight

9c1229c

add min max response len metrics

e14fa1a

fix kd metrics reward func name

fb2e452

add mmsu data, add template constant

00ad021

support max_audio for qwen2-audio encoder

2cf416c

remove kd trainer

6ba4982

update results

96ac59c

Copilot AI review requested due to automatic review settings January 30, 2026 08:02

Copilot started reviewing on behalf of yuekaizhang January 30, 2026 08:03 View session

Copilot AI reviewed Jan 30, 2026


yuekaizhang added 2 commits January 30, 2026 18:10

change default ds to zero1 for h20 gpu

023fdfc

fix lint

b9f478f

robin1001 merged commit 65edd9d into wenet-e2e:main Feb 3, 2026
1 check passed

robin1001 merged 21 commits into wenet-e2e:main from yuekaizhang:rl

3 participants
