A curated collection of papers, technical reports, frameworks, and tools for on-policy distillation (OPD) of large language models.
awesome reinforcement-learning rl awesome-list knowledge-distillation post-training opd distillation self-distillation llm rlhf gkd llm-training speculative-decoding on-policy-distillation minillm llm-distillation
Updated May 28, 2026 - Shell