support on policy distillation #118
Merged
Pull request overview
This PR implements on-policy knowledge distillation training for audio language models, where a student model learns to match a teacher model's distribution on its own generated samples to avoid distribution shift issues.
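As a rough illustration of the idea described above, the sketch below scores a student-generated sequence by comparing the student's and teacher's next-token distributions at each position. It assumes a per-token reverse KL divergence, KL(student || teacher), which is one common choice for on-policy distillation; the loss actually used in `west/trainer/kd_trainer.py` is not shown in this overview and may differ. The function names are hypothetical and the code uses plain Python lists in place of tensors.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def reverse_kl_per_token(student_logits, teacher_logits):
    """KL(student || teacher) for one token position (assumed divergence)."""
    s_logp = log_softmax(student_logits)
    t_logp = log_softmax(teacher_logits)
    return sum(math.exp(s) * (s - t) for s, t in zip(s_logp, t_logp))


def on_policy_distillation_loss(student_logits_seq, teacher_logits_seq):
    """Mean per-token reverse KL over a sequence sampled from the student.

    Both arguments hold one logit vector per generated token. The key point
    is that the positions come from the student's own sample, so the teacher
    is queried on text the student actually produces.
    """
    if len(student_logits_seq) != len(teacher_logits_seq):
        raise ValueError("student and teacher must score the same tokens")
    if not student_logits_seq:
        raise ValueError("sequence must contain at least one token")
    kls = [
        reverse_kl_per_token(s, t)
        for s, t in zip(student_logits_seq, teacher_logits_seq)
    ]
    return sum(kls) / len(kls)
```

The loss is zero when the student already matches the teacher at every position and grows as the two distributions diverge on the student's own samples.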
Changes:
- Adds two trainer classes for on-policy distillation: local teacher mode and remote API-based teacher mode
- Extends datasets to include question and choices metadata for reward computation
- Adds training script and complete example with evaluation pipeline
Reviewed changes
Copilot reviewed 11 out of 13 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| west/utils/constants.py | Adds audio caption template for captioning tasks |
| west/trainer/kd_trainer.py | Implements KnowledgeDistillationTrainer and RemoteKnowledgeDistillationTrainer classes |
| west/dataset/hf_dataset.py | Extends dataset to include question/choices metadata in collate function |
| west/bin/train_knowledge_distillation.py | Main training script supporting both local and remote teacher modes |
| west/bin/decode_mmau.py | Adds caption template to available choices |
| examples/on_policy_distillation/run.sh | Complete training/evaluation pipeline script |
| examples/on_policy_distillation/cascaded_audio_capiton_llm_eval.py | Cascaded evaluation using LLM to answer questions from captions |
| examples/on_policy_distillation/README.md | Documentation with results and usage instructions |
| examples/grpo/run.sh | Updates default deepspeed config reference |
| examples/grpo/conf/ds_zero3_omni.json | New DeepSpeed ZeRO-3 configuration file |
This PR adds support for on-policy distillation training. See on policy distillation.
Task1. Audio Caption
Note: Results use cascaded evaluation: the model first generates a detailed caption, which is then used to answer the downstream question-answering tasks.
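A minimal sketch of that cascaded flow, for illustration only: `caption_fn` and `answer_fn` are hypothetical callables standing in for the audio model and the downstream LLM, and the sample keys (`audio`, `question`, `choices`, `answer`) are assumptions rather than the actual interface of `cascaded_audio_capiton_llm_eval.py`.

```python
def cascaded_accuracy(samples, caption_fn, answer_fn):
    """Two-stage evaluation: caption the audio, then answer from the caption.

    samples    -- list of dicts with assumed keys audio/question/choices/answer
    caption_fn -- hypothetical callable: audio -> caption string
    answer_fn  -- hypothetical callable: (caption, question, choices) -> choice
    """
    if not samples:
        raise ValueError("no samples to evaluate")
    correct = 0
    for sample in samples:
        # Stage 1: the audio model sees the audio and produces a caption.
        caption = caption_fn(sample["audio"])
        # Stage 2: the answering model sees only text, never the audio.
        prediction = answer_fn(caption, sample["question"], sample["choices"])
        correct += int(prediction == sample["answer"])
    return correct / len(samples)
```

Because the second stage only sees text, the score reflects how much question-relevant detail the caption preserves.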
mmau_test.wav
Before OPD: Qwen2.5-Omni-3B (Baseline)
After OPD: Qwen2.5-Omni-3B + On-Policy Distillation (Omni-captioner teacher)
Task2. Audio QA