Add support for Python 3.14 #4225
|
Currently, there are some issues with Check upstream issue on |
|
Related PR: |
|
OK, so we need to wait until Python 3.14 is supported, then. |
|
The However, there is still another issue with
tests/experimental/test_bco_trainer.py:27: in <module>
from ..testing_utils import TrlTestCase, require_no_wandb, require_peft, require_sklearn
tests/testing_utils.py:83: in <module>
not is_bitsandbytes_multi_backend_available() and not torch_device == "cuda",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tests/testing_utils.py:75: in is_bitsandbytes_multi_backend_available
import bitsandbytes as bnb
.venv/lib/python3.14/site-packages/bitsandbytes/__init__.py:19: in <module>
from .backends.default import ops as default_ops
.venv/lib/python3.14/site-packages/bitsandbytes/backends/default/ops.py:324: in <module>
@torch.compile
^^^^^^^^^^^^^
.venv/lib/python3.14/site-packages/torch/__init__.py:2590: in compile
raise RuntimeError("torch.compile is not supported on Python 3.14+")
E RuntimeError: torch.compile is not supported on Python 3.14+ |
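For illustration only: the traceback shows that `import bitsandbytes` raises `RuntimeError` rather than `ImportError`, so an availability check that only expects `ImportError` would crash at collection time. Below is a minimal sketch of a more defensive check. The helper name `is_importable` is hypothetical (it is not part of TRL or bitsandbytes), and this is not necessarily the fix that was adopted here.

```python
import importlib


def is_importable(module_name: str) -> bool:
    """Return True only if `module_name` can actually be imported.

    Hypothetical helper, not part of TRL. It treats a RuntimeError raised
    at import time (such as the torch.compile error in the traceback above)
    the same way as a missing package.
    """
    try:
        importlib.import_module(module_name)
    except (ImportError, RuntimeError):
        # ImportError: the package is missing or broken.
        # RuntimeError: the package exists but refuses to load in this environment.
        return False
    return True
```

A guard like `is_importable("bitsandbytes")` could then be evaluated before any code path that runs `import bitsandbytes`, so the test module can still be collected when the import fails.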
|
Python 3.14 is finally fully supported after the release of
Context:
|
commit 489331e703e1e8d39534957f465fadce7f00ff99
Author: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Date: Tue Mar 3 14:50:42 2026 +0000
Replace deprecated asyncio.iscoroutinefunction with inspect.iscoroutinefunction in RLOO/GRPO trainers
commit 484c1c1acf0b437c20e230d5e135613daf1a59fa
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Mar 3 08:42:04 2026 -0600
CI: Add Qwen 3.5 tiny model to tests (#5204)
commit 7eebb294a9175ea2f0ffbf20cf759f772491d815
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Mar 3 07:35:22 2026 +0100
Decouple rollout dispatch from vLLM backend in GRPO _generate_single_turn (#5122)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: swappy <59965507+rycerzes@users.noreply.github.com>
commit 0bf875c0cbb879c4b264f66a6e556769d42e2f52
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Mar 2 15:54:35 2026 +0100
Mark CI test_training_vlm_and_liger as xfail (#5202)
commit 7544c3a784147dbfc53bb1314558137320ecc3ed
Author: Michael Royzen <45830328+michaelroyzen@users.noreply.github.com>
Date: Fri Feb 27 14:42:57 2026 -0500
Support sequence sampling in Liger Kernel and pass importance_samplin… (#5190)
Co-authored-by: Michael Royzen <michaelroyzen@mac.mynetworksettings.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
commit 5cffd59a8a814b9132c6d08e5aa88347a41c66e3
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 27 15:43:33 2026 +0100
Set CI PYTORCH_ALLOC_CONF env variable to avoid OOM (#5197)
commit eb8b8a510b3ee0e7e83e33f8cfbb6eada8eb7f34
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 27 08:11:51 2026 -0600
Re-add liger-kernel to dev deps (#5164)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit 68f807b5e2ba4994898a7ef21ba631b64fb7c4b5
Author: Zhenkun Cai <zekucai@gmail.com>
Date: Fri Feb 27 05:11:51 2026 -0800
Add `pad_to_multiple_of` to GRPOTrainer and RLOOTrainer (#5180)
commit e53c98feb463c0897451b307432360c1616a8905
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 27 13:34:17 2026 +0100
Fix CI tests patching BaseTrainer (#5192)
commit bd2d21e02cc722221c0c7f91f4ddc7cbd9d271fa
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 27 08:27:22 2026 +0100
Refactor CLI [6/N]: Refactor env/vllm-serve commands with delayed imports (#5187)
commit e63cd79c68fc62edf63f01904ee02b0e63ab4336
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 27 08:25:30 2026 +0100
Refactor CLI [5/N]: Refactor TrainingCommand with delayed imports (#5186)
commit e941ff58121d382b470f8c8011dd76088192c46b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 27 08:23:38 2026 +0100
Fix deprecation warning of fork in multi-threaded process (#5185)
commit b9263efa25e05ebf1c8c1525a9d5a6a7e94efbb2
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 27 08:22:58 2026 +0100
Fix deprecation warning of create_reference_model (#5184)
commit 410c00bfaead36b0048921a123739bd0cb4c3e7c
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 26 10:38:56 2026 -0600
Align documentation with the intended public API (#5162)
commit 519225384f9aaa7acf3959fbf6a218c2490d4a0e
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Thu Feb 26 15:44:50 2026 +0100
Add minimal CARLA example script (#5161)
commit 64b47513982e2845c8cb6f4d5d611037f605d9bf
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 26 11:11:52 2026 +0100
Refactor CLI [4/N]: Replace top-level TrlParser with ArgumentParser (#5170)
commit f00379fa221689d67a3736c44eaf07137c11d5f9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 26 10:02:45 2026 +0100
Make _BaseConfig and _BaseTrainer explicitly private (#5169)
commit eb973af2d1109c84600c7fdddf259e06a547f583
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 26 09:11:09 2026 +0100
Document parameters with differing default values in core configs (#5168)
commit b2b3045dfe3a3b6a0c52785b055b60e9a1a0e73b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 26 07:56:32 2026 +0100
Handle mm_token_type_ids in SFT/GRPO/RLOO to fix IndexError (#5178)
commit 27e3e2ff68929b25045caf8af32799b2e1dc3965
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 25 16:03:45 2026 -0600
⬆️ Bump dev version (#5182)
commit d24e19424da2837d435a7884c0b307b605413829
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 25 15:56:26 2026 -0600
Release: v0.29 (#5181)
commit 70cf097fb8a39b8ad86aa6e27d49f081e96da4a5
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date: Wed Feb 25 22:01:16 2026 +0100
feature: Configurable num logprobs in vLLM generation (#5107)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 57d749336487d7ece06e58b941e4180f13649d8f
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 25 11:17:13 2026 -0600
Rename input keys in `RewardTrainer` collator from `chosen/rejected_input_ids` to `chosen/rejected_ids` (#5179)
commit a0d7d8e1257dea15fae6df434285958d22ce9c4e
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 17:31:50 2026 +0100
Update upstream tracking info about CI PyTorch JIT deprecation warnings (#5166)
commit 51fdc53e08b0ee39b65ba699fb49281d183701ce
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 17:16:55 2026 +0100
Document parameters with differing default values in experimental configs (#5172)
commit dd15cbb04a47c8efb4c8ed13e315dc4f2e1f853e
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 17:07:45 2026 +0100
Fix default learning_rate in BCO according to paper (#5173)
commit 0b2cd5c04e26e13358413c00e98a56e2c2914eb9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 16:42:43 2026 +0100
Accept mm_token_type_ids in GRPO/RLOO _get_per_token_logps_and_entropies (#5176)
commit 95cedba36e5e015f9402bb997529337d6c90b0bb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 16:39:36 2026 +0100
Fix default learning_rate in PPO according to paper (#5174)
commit 6d78858d176b9fb385b6d0f332d369e1ee2e27fb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 16:26:16 2026 +0100
Fix experimental TestUpdateWithReplayBuffer: ValueError: `train_dataset` is required (#5171)
commit 0efaec33fbd3445eb1142c306e797940fad4de28
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 25 09:25:39 2026 -0600
Revert changes in vLLM client/server (#5165)
commit e540d687f8df6f3596fa6eb3cc50116b41d58f42
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 10:48:43 2026 +0100
Refactor CLI [3/N]: Self-contain VllmServeCommand argument parsing (#5160)
commit 9cc95a97927e59c3532ce2be3babcfd8a35adcd9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 09:57:53 2026 +0100
Refactor CLI [2/N]: Move accelerate concerns into TrainingCommand (#5159)
commit 827457ce5845c5a5b02dab164e12f55cd1c4c532
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 09:34:28 2026 +0100
Raise ValueError for None train_dataset in core trainers (#5157)
commit 8b3934ce1681c9f959167804692d6d94fbb36eb0
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 25 07:57:00 2026 +0100
Fix Liquid syntax error in DPO trainer docs caused by double braces in LaTeX (#5153)
commit 4cd198e856b98cae6ed6d0632ab86ca22b432e23
Author: Blake Ledden <47259830+bledden@users.noreply.github.com>
Date: Tue Feb 24 19:41:05 2026 -0800
fix: wake up vLLM weights before sync to prevent writes to freed memory (#5147)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 1149b746db9f39dc28859e426b16f0e6557db240
Author: ehofm <ella@rilix.ai>
Date: Tue Feb 24 20:36:07 2026 -0500
Fix structured_outputs handling and tool normalization in vLLM backend (#5155)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit ea2b4958d0165e01a11b6f07ec024ee8c1d1835d
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 24 18:10:55 2026 -0600
Fix CI by removing liger-kernel from dev deps (#5163)
commit cfbdd3bea4448cde878c0da0de49551f553c61fe
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 23 22:27:02 2026 -0600
Fix `SFTTrainer` support for single-image data (#5132)
commit fa313fd57244008953753047795c954a782f9cfc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 23 16:06:04 2026 +0100
Add support for Python 3.14 (#4225)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit bc4edf6f02e6f07549d43b3543cb54d597cd3d91
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 23 15:06:58 2026 +0100
Fix type of TrainingArguments.logging_steps in docs (#5149)
commit 5269393f4269462ce5d4a9227a97af6911da7939
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 23 15:06:14 2026 +0100
Use BaseConfig in all experimental configs (#5148)
commit ef08730432721d67139e273775acf14846fa95d9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 23 15:04:11 2026 +0100
Fix PPOTrainer.save_model (#5151)
commit ae97f06954b274f582f82ac60e444897f73f14c3
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 23 08:00:44 2026 -0600
Fix wording in DPO and SFT trainer documentation for clarity (#5140)
commit f150780cda7b0a82a4840c44b7026732ee17c4bb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 23 09:06:38 2026 +0100
Move common fields from stable trainer configs to BaseConfig (#5136)
commit 93f2e480daa0ee9962a8a5deff6ea3da347fe911
Author: casinca <47400729+casinca@users.noreply.github.com>
Date: Sun Feb 22 20:09:44 2026 +0100
refactor(gkd_trainer): small optim (#5143)
commit 8067ea7558ed4477afce710bbf2f8a1a79973ba7
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 20 11:16:55 2026 -0600
Add `environment_factory` to `GRPOTrainer` (#5093)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 7a4156a1bb3224a3c7f5861d39ef76367273a26b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 20 16:05:55 2026 +0100
Fix `trl <command> --help` TypeError caused by unescaped `%` in `TrainingArguments` help strings (#5135)
commit c3ead5b556d9ea588b4a95cae1775913118ddbc6
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 20 11:09:02 2026 +0100
Fix NameError: name 'importlib' is not defined (#5134)
commit b7fa6bf17322f03d5ec47d12efc142de9ea5981a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 20 09:46:30 2026 +0100
Fix import latency [2/N]: Implement native _is_package_available (#5129)
commit bb147645fad777c01ce1ccd2f10350b3cc50fceb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 20 09:45:59 2026 +0100
Fix import latency [1/N]: Extract _LazyModule to dedicated module (#5128)
commit e3b7897c873f94c26bf1a661df19e428239be114
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 19 20:53:15 2026 -0600
Refactor DPO (#3906)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
commit a68fb896f008f58a3d37abf11ae665357b7c679a
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 19 13:15:42 2026 -0600
Remove revision references in dataset loading for toolcall tests (#5133)
commit 699b8420cd6601474788effdab063d3d5e7bbc3b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 19 17:53:30 2026 +0100
Refactor TRL CLI into modular command architecture (#5124)
commit b46614e235f126c1c8d0fd9f41f4d217a8299c34
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 19 17:08:14 2026 +0100
Implement Agent Skills [4/N]: Create skills CLI (#5103)
commit f8181886c6a59f5f8c2a2bf31bb4cb7bda225d39
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 18 11:59:13 2026 -0600
Update tool handling to support JSON string schemas in trainers (#5118)
commit 9fc9a7dcebe3938a273e18ca3ed5b2cfdb6c0839
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Wed Feb 18 18:11:42 2026 +0100
Add Tiny Aya tool calling examples (script/notebook) (#5123)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 27431134e3447181821ffaf94c405a44d87d1bc1
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 18 10:10:14 2026 -0600
Add GLM-4.5 model to tests (#5114)
commit 0e531bdd1eb654bed32d12b474ea998285cb1253
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 18 09:22:06 2026 -0600
Add check for `None` in `get_trackio_space_url()` to prevent errors (#5115)
commit 8b082bb2d4d599d66cf36df0b754c2f4de0371de
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 18 09:18:02 2026 -0600
Fix Qwen3 schema (#5111)
commit 269217f92092e4497260e34bd535756cb9e76f64
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 18 08:27:27 2026 -0600
Add test for Cohere2 models (#5116)
commit 57df014377bef538c87c616379d4c11aeaf05b30
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 18 06:45:38 2026 -0600
Add more tests for `get_training_chat_template` (#5108)
commit 70efa963f1c9bb88ec3144b051db6fdf4ffc10a4
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 17 10:51:47 2026 -0600
Update version check for transformers to 5.2.0 in online_dpo_trainer.py (#5110)
commit 269ed992dca0858f290e08b5ebae271a15df8aa6
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 17 06:49:34 2026 -0600
Add validation for conversational prompts in multimodal training (#5067)
commit 997536a2b56d4a0824bb55f9265e6561c3fd1e43
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 17 10:30:06 2026 +0100
Implement Agent Skills [3/N]: Create skills installer (#5100)
commit 8b9b972878243505d26b3dc69945613ff5ddc98b
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 16 15:50:59 2026 -0600
Remove outdated liger-kernel compatibility checks and warnings in tests and SFTTrainer (#5105)
commit 8c232f64b5bb00ef854bff157f6857241d415fe0
Author: Harikrishna KP <harikp2002@gmail.com>
Date: Tue Feb 17 02:09:35 2026 +0530
Fix SFT loss type rewards being overwritten in dpo_loss() (#5079)
commit 99b26fb2e6f241195fc9b378ee8d50a6219083b2
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 16 14:34:54 2026 -0600
Add Trackio integration for model card visualization (#5101)
commit c94c032129af436c55764fef66389f30856df3d0
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 16 21:13:09 2026 +0100
Fix style (#5106)
commit 3d1c785762ce87892a7eaf18d1c0fb8771a74bc3
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 16 19:17:55 2026 +0100
Implement Agent Skills [2/N]: Create skills module (#5097)
commit 1702fc07b2d0c8ba23ad3299879d4edaaccb3b30
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date: Mon Feb 16 19:09:59 2026 +0100
feature: top_k selective_log_softmax (#5104)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 29ace1ad72152f4d648bf23763a319ce2400b9e6
Author: flutist <30485581+flutist@users.noreply.github.com>
Date: Tue Feb 17 00:56:52 2026 +0800
Fix DPO and RLOO incompatibility with FSDP2 (#4838)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 3d2e898ce4d538b5502f735a9c5116c8b45aaa46
Author: Yuki Uehara <74698040+yukiu00@users.noreply.github.com>
Date: Tue Feb 17 01:28:43 2026 +0900
Pass vllm_is_ratio to LigerFusedLinearGRPOLoss in compute_liger_loss (#5031)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
commit b6957fc4c04a100e6829cb56e28f20668e4ad1ae
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 16 17:02:42 2026 +0100
Implement Agent Skills [1/N]: Create training skill (MVP) (#5096)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit abf6033b99fc75eb9d58458b44b81f7f7faebdc1
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Mon Feb 16 04:49:30 2026 -0800
docs: Unify model examples to use trl-lib namespace (#4431)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 7694e0e878361c0de6c6cb566c123f46af13d91d
Author: Nabin Oli <107109731+nabin2004@users.noreply.github.com>
Date: Sat Feb 14 00:36:27 2026 +0545
docs: add Multi-Node Training subsection (#4384) (#5091)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 28fc3f2c336bb7f734aab49c1ad073e152dccf61
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 10:37:15 2026 -0800
docs: Add MPO paper (2411.10442) to paper index (#5089)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 051b52fbf2c68edcae092357d0d4118b35a5f60b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 12 18:53:17 2026 +0100
Validate reward model has 1 num_labels (#5087)
commit a558fba8a5700933207c3963a1dab8a28291f2f1
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 12 18:04:13 2026 +0100
Fix BFD packing for SFT datasets (#5076)
commit 0073db963788d6cc77d51789f4b3d2c34930cfdc
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:55:06 2026 -0800
docs: Add PPO paper (1707.06347) to paper index (#5085)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 29ed9cb4ff346100bee004acd1ce7cc97554f064
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:45:57 2026 -0800
docs: Add T5 packing paper (1910.10683) to paper index (#5084)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit d0e06fcda40607ad7bd1a3b639a3018c3bb4bfca
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:38:35 2026 -0800
docs: Add PRM paper (2211.14275) to paper index (#5083)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit ee979a9d2f23c100cb9d4010b4ad99275a6c726c
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:29:29 2026 -0800
docs: Add GKD paper (2306.13649) to paper index (#5082)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit ff84817d27241643abfe3f7691448e509e093320
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:23:04 2026 -0800
docs: Add CPO paper (2401.08417) to paper index (#5081)
Co-authored-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
commit fe890df6e2a84345a29be849bb5f27ca72052034
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:05:10 2026 -0800
docs: Add ORPO paper (2403.07691) to paper index (#5080)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit c88f8c550137ddd1ddde56baebae2d9b97b9d54d
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Thu Feb 12 01:01:37 2026 -0800
docs: Add TR-DPO paper (2404.09656) to paper index (#5078)
commit 0562c3fa26c1bc827aff83800b046f9a2af925a6
Author: Logan Vegna <logan.vegna@shopify.com>
Date: Wed Feb 11 16:17:04 2026 -0500
[SFT] Fix high vRAM consumption during eval with liger kernel (#5069)
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
commit 6b38db6ad85cf67ce1b7d4f037e5e5840d474587
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 11 19:22:00 2026 +0100
Fix CI ValueError: Pointer argument (at 0) cannot be accessed from Triton (cpu tensor?) (#5074)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 060fbfebbddf1e539e5dcee456bef643c29036d3
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 11 10:44:54 2026 -0600
Update model from SequenceClassification to CausalLM in `RewardTrainer` tests (#5060)
commit 0933b7fc5ddb933c632708bba5936b99238168d8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 11 17:18:12 2026 +0100
Fix logging warning suppression for transformers 4.56.2 (#5077)
commit a07fb82b9a4333ea91cfe289a697b9b178d99021
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 11 09:21:48 2026 -0600
fix: Set `num_labels` to 1 in causal model initialization for RewardTrainer (#5066)
commit 29fe68205caf4acbf888307487e8423d692ee496
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Feb 11 15:08:11 2026 +0100
Fix GRPO multi-turn training with liger kernels (#4975)
commit 68399dfa6a03e4dea6ea5087c4d181b3b400cab5
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 11 07:39:27 2026 -0600
fix: Use `launch_args` for all trainers (#5059)
commit d1b066fdc4a8a7d0bde59e3bf1aeaac8803746d1
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 11 07:37:54 2026 -0600
Fix logging warning suppression with scoped override for seq-clf head key (#5058)
commit 0c3d33b955730308bab3d28ba2ef6eebe704c7f8
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Feb 11 02:41:07 2026 -0800
docs: Add SimPO paper (2405.14734) to paper index (#5071)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit f23e3a775155f458108cb01ebcfde085ecca4733
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Feb 11 02:33:11 2026 -0800
docs: Add RPO paper (2405.16436) to paper index (#5070)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit e46005c07c1a8194f4ff71b749dceb44f17d7eb7
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Feb 11 02:22:52 2026 -0800
docs: Add XPO (2405.21046) to Paper Index (#5068)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 6d9bba1b3b9f181d9cf53eb9092a3d52b66de93b
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Feb 11 02:15:59 2026 -0800
docs: Add REINFORCE++ (2501.03262) to Paper Index (#5062)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit b992e9284aac5979ab5716d14587067857663398
Author: Behrooz Azarkhalili <80390531+behroozazarkhalili@users.noreply.github.com>
Date: Wed Feb 11 01:37:06 2026 -0800
docs: Add INTELLECT-2 (2505.07291) to Paper Index (#5061)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit c985dbadd3a499dc049f8c39c61034525d381006
Author: Jen Wei <45276133+JenWei0312@users.noreply.github.com>
Date: Wed Feb 11 02:25:55 2026 -0700
docs: add DeepSeek-R1 training dynamics and GRPO example (#5053)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit d934eb757806501a5106b6e4374d920961dc4e9f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 19:14:40 2026 +0100
Remove deprecated mergekit_utils moved to experimental (#5057)
commit 991fd0755aa1cce7a800d271dae4b525b6357bfd
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 10 11:57:11 2026 -0600
Remove duplicated tests for SFT and add gradient checkpointing tests (#5054)
commit d42b23f63f164af241c34c79e6a855d1eb896d4d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 18:17:10 2026 +0100
Remove deprecated classes moved to experimental (#5044)
commit e1a84cf626d249e9b55447d63eece88cbf92d100
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 18:06:05 2026 +0100
Remove deprecated RLOOConfig.max_prompt_length (#5056)
commit fc560370d97042cb7f90a9bf0e2e30d29a304240
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 17:47:21 2026 +0100
Remove deprecated XPO after moved to experimental (#5055)
commit 13bd37e1426eb81aca81cd68eb8d0efcbc6351b9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 17:36:36 2026 +0100
Remove deprecated PRM after moved to experimental (#5052)
commit 0aea3144031abacb0efadd8aab5a3ca9fe6380e8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 17:33:24 2026 +0100
Remove deprecated PPO after moved to experimental (#5051)
commit d705ac4d0f13168724f548c9ecbb1a586c187e16
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 17:06:11 2026 +0100
Remove deprecated ORPO after moved to experimental (#5050)
commit b393c6bf6605d04f00a110085bf59feef59ffa6a
Author: Salman Chishti <13schishti@gmail.com>
Date: Tue Feb 10 15:29:23 2026 +0000
Upgrade GitHub Actions to latest versions (#4893)
Signed-off-by: Salman Muin Kayser Chishti <13schishti@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit ce2ea744c5f504a026aa9ea41815bfa382417a9d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 15:47:05 2026 +0100
Remove deprecated Judges after moved to experimental (#5048)
commit 4620e91d21ad0dd3d885abe29446d7a52a9d368e
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 15:18:49 2026 +0100
Remove deprecated CPO after moved to experimental (#5046)
commit 6e47225d012aba64efb16789356bee9d037ec171
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 15:18:33 2026 +0100
Remove deprecated BCO after moved to experimental (#5045)
commit 17277e2d963611603eb2655af429975feade3b5c
Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com>
Date: Tue Feb 10 15:13:24 2026 +0100
[GRPO] fix: remove SAPO temperature check (#5042)
commit 7267b2d3589bcdccb46e8cfd51d63416d4378c76
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 14:42:18 2026 +0100
⬆️ Bump dev version (#5049)
commit 4aaaf064c15ad80bea91895c8f202d44ad17cdb4
Author: casinca <47400729+casinca@users.noreply.github.com>
Date: Tue Feb 10 14:27:45 2026 +0100
[minor] docs: typo in `grpo_trainer.md` (#5047)
commit 49ef33428c47235991acb4e185ea599b70c6dab4
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 14:20:48 2026 +0100
Release: 0.28 (#5043)
commit a958acc1e92d9ccf8404d800bf073c4cc7e5dd85
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 10 03:43:23 2026 -0600
Add Online Direct Preference Optimization section to paper index (#5037)
commit 8b935c6378b78adcb4fda9e66a944e05cc99b681
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 10 03:41:41 2026 -0600
Fix multiprocessing start method to 'spawn' for test compatibility with Python 3.12+ (#5036)
commit 40fff2e3bab905c2e7096360f4a8f014aab4cd14
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 10 03:32:24 2026 -0600
Deprecate FDivergenceType in DPOConfig; update f_divergence_type to use string values (#5039)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit fe1949b4da60dc7adbcf3a0bb17a4f42280c5c28
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 10 03:22:43 2026 -0600
Deprecate string usage for `ref_model` in DPOTrainer (#5040)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit 9f7c33600b7555a72234926a846ee65ff2508624
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 10 02:51:40 2026 -0600
Rename AOT loss type 'aot_pair' to 'aot_unpaired' in DPO (#5038)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit 19c5f4460cd9b405a95fabb05322ceb98864e915
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 10 08:04:20 2026 +0100
Allow testing with transformers 5.1.0 via xfail marks (#5034)
commit 442509524b4e7c8ee4d9f1d6f1f1087b5dcd1a0f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 23:45:11 2026 +0100
Fix CI FutureWarning: max_prompt_length is deprecated (#5019)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 0ef315a0f992a30f82792e87b5e3f0fab58a6107
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 22:45:23 2026 +0100
Filter max_prompt_length UserWarning in all test cases (#5035)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 8a27a17e583756b75ce71eb40a9ed295a610ade1
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 22:43:34 2026 +0100
Fix CI FutureWarning: tools is deprecated (#5015)
commit ff55949cccaaa23a752d951d184c04ff579aad89
Author: Haseeb Asif <149416177+Haseebasif7@users.noreply.github.com>
Date: Tue Feb 10 02:42:55 2026 +0500
Add length-unbiased GRPO loss (LUSPO) (#4988)
Co-authored-by: Haseeb Asif <haseeb@Haseebs-MacBook-Air.local>
Co-authored-by: Leon Ericsson <leon.ericsson@icloud.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 765e397ed83f344a8ba7082673d4c4616beeb3a2
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 9 15:00:49 2026 -0600
[CI] Silence PyTorch JIT and DataLoader deprecation warnings (#4999)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit b5bd2b98ed615676a7bee40c8ae17de62421bb4a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 21:12:57 2026 +0100
Mark Qwen3VL tests as xfail for transformers 5.0.x (#5029)
commit 7189bc68d8ab0d19b67c0fc0b849c83679389b46
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 21:04:35 2026 +0100
Fix CI FutureWarning: use_logits_to_keep is deprecated (#5013)
commit db0d95523e5b8039c94c50df1b6286ff8b7e29ce
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 15:53:56 2026 +0100
Fix CI FutureWarning: rpo_alpha is deprecated (#5011)
commit fa06506f9d1c9546f63ae513cf7a5ba1be3247ae
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 15:52:31 2026 +0100
Fix typo in xfail test reason (#5028)
commit 4abd67951f996b511f4b913ead2386fbe357061b
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 15:51:21 2026 +0100
Fix CI FutureWarning: generate_during_eval is deprecated (#5017)
commit 9f1e7dd7fd58be3327748234fb952599c4bd4f09
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 15:48:59 2026 +0100
Pin transformers < 5 in judges extra due to incompatibility (#5024)
commit 7c4e7f86047b82ad0e5ff7c8e3bb280b73024f31
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 15:22:02 2026 +0100
Fix vision model prompt truncation bug in DPOTrainer (#5023)
commit a68c82a617be59086b83f5ce941175270926de3f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 15:00:22 2026 +0100
Fix typo in DPO max_prompt_length deprecation warning message (#5020)
commit 5eb25938d44b781687061a69586a06501b11e915
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 14:35:44 2026 +0100
Fix CI FutureWarning: ref_model_init_kwargs is deprecated (#5009)
commit 58f467babd998fe5fe41598b535ceacda690cef0
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 9 07:34:48 2026 -0600
Add support for `nested_gather` in OnlineDPOTrainer for transformers v5.2.0 and above (#4981)
commit 71a349335ce554180b2b4947d33594090f74d5cf
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 14:34:44 2026 +0100
Fix CI TRLExperimentalWarning in regular tests (#5007)
commit a7333c8c68f564005ab74fd999ad246de405122f
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 9 14:34:15 2026 +0100
Filter CI SWIG deprecation warnings (#5004)
commit 98b00171b81411cd8bf7d6a9135af70c9879aaee
Author: Nabin Oli <107109731+nabin2004@users.noreply.github.com>
Date: Mon Feb 9 18:59:18 2026 +0545
docs: add CGPO/Mixture of Judges (2409.20370) to Paper Index + link ref to AllTrueJudge (#5002)
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
commit 728b0e372fb7de141093aff5513697a4fa743137
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Feb 9 07:13:47 2026 -0600
[tests] Remove xfail for transformers version >= 5.0.0 due to upstream bug resolution (#5000)
commit 637de450e748d1f612c0f6fed6be4df9cbbf1c39
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Sun Feb 8 14:36:10 2026 -0600
Add `sanitize_logprob` function for NaN handling in vLLM log probabilities (#5001)
commit bfb94262b81fd28017a11d7b9ddc61e3095cc2b6
Author: Akshay Ballal <61191840+akshayballal95@users.noreply.github.com>
Date: Sat Feb 7 14:48:58 2026 +0100
Fix GRPO tool calling for corrupted tool calls (#4890)
commit 7a39ff3995f2f8b7cb4f8ca29a09390ac587a43d
Author: casinca <47400729+casinca@users.noreply.github.com>
Date: Fri Feb 6 23:05:14 2026 +0100
perf: Qwen SAPO loss optimization (#4956)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit bd206704f9fc2c08039c522b40b0f68654bb006f
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Fri Feb 6 23:00:24 2026 +0100
Update sampling mode to token level for safety (#4989)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit aa7d457f9bec736c75439f78d081a6dc012ce353
Author: cmunley1 <cmunley@nvidia.com>
Date: Fri Feb 6 13:37:53 2026 -0800
Update NeMo-Gym to use `env_mask` (#4986)
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 5db1c11c52bc95255ea73e7eae3840fbeeb293a2
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 11:27:47 2026 -0600
Add distributed smoke tests workflow for Transformers branch (#4996)
commit 90a35d12c9c64129eb023b499a812c5e638db846
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 10:47:49 2026 -0600
Add GitHub Actions workflow for testing against Transformers branch (#4995)
commit f11b4c3fdd511d9adfda74ceec02042cee65a0f3
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 09:46:14 2026 -0600
Fix ZeRO-3 + PEFT + gradient checkpointing (#4951)
commit 27cbe98ac7487f326be51e180a1ee078c23b3836
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 6 16:14:18 2026 +0100
Fix post_init warning stacklevel to 3 (#4993)
commit 57cac251bdde714f97458b39b24702cf624dec66
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 6 16:13:47 2026 +0100
Fix deprecation of DPOConfig.max_completion_length (#4992)
commit ce72c067f6b55d4352c71939d9be6f4dfdaf68a0
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 6 16:13:14 2026 +0100
Assert chat_template is applied in test_train_with_chat_template_kwargs (#4991)
commit ffdaba3a97299c0c381512f807c57aa753f6314a
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 6 16:11:40 2026 +0100
Fix import of AutoModelForCausalLMWithValueHead from experimental (#4990)
commit 97a8a9672c0d5fbacb5c60934f40a7af404adecf
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 07:10:06 2026 -0600
Use local variable instead of attribute in collator tests (#4957)
commit 4e212bdeed6c7bf081e494960c65a84a35a85d6e
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 07:07:28 2026 -0600
Update dataset configuration name in toolcall dataset loading (#4984)
commit c82f6aa4766f83b66ea3f37e7bbf3b30453a1cda
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 6 14:03:31 2026 +0100
Fix passing tokenizer in test_train_with_chat_template_kwargs (#4987)
commit c581c1e8829904c6838c38d70d7b5fa646e2f0fc
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Fri Feb 6 14:03:02 2026 +0100
Pin transformers!=5.1.0 in deepspeed extra due to incompatibility (#4985)
commit 98aca7f4fdb2c2879c86aa9bd18ccddece112b70
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 06:54:48 2026 -0600
Replace `warmup_ratio` with `warmup_steps` (#4983)
commit 032ee139d90d1279549e2b538f11a2c3b7c22aa7
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Fri Feb 6 00:34:04 2026 -0600
[CI] Disallow installation of transformers 5.1.0 due to compatibility issues with DeepSpeed (#4982)
commit a0e5f265604356d2a107edccd920b5236582a3d7
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 5 13:42:37 2026 -0600
Deprecate parameters in `DPOConfig` (#4969)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit f0a738d954775e50d4bd4a4df4fc5d1826e2f0b4
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Thu Feb 5 20:12:15 2026 +0100
Simplify instructions of installation of OpenEnv (#4980)
commit a92d14336e5380f1ce2b8cbca78ebd224933d2ba
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 5 13:05:32 2026 -0600
Replace `torch.allclose` with `torch.testing.assert_close` (#4977)
commit b0b798a82953094fb4d254915e4c360789ac2838
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 19:53:14 2026 +0100
Support truncated completions in GRPO multi-turn training (#4976)
commit ac194a917b2a8b7d514097d34078ec191ef4a0e3
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 16:56:50 2026 +0100
Fix add_column in test_train_with_chat_template_kwargs (#4979)
commit 3a76b7a8690e25838f2332b81d7efbfed2615277
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Thu Feb 5 16:24:29 2026 +0100
Set specific OpenEnv version when installed (#4978)
commit 0113ad7022118e4a7afe62b4e190f0b9aee4cadf
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 13:28:21 2026 +0100
Remove truncation from tokenizer calls if no max_length (#4972)
commit eee98f77a25bb386a0aba85dcb93ad50511224f9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 13:27:01 2026 +0100
Remove padding_value from experimental CPO and use pad_token_id (#4962)
commit 1354860c5c33bdee7a89a8845474ac4074094ed6
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 5 06:07:05 2026 -0600
Fix test_train_with_chat_template_kwargs (#4971)
commit 1bd2a52ec2d8344050af736d60cdc735181ae4b8
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Feb 5 04:18:38 2026 -0600
Revert change in GRPO from NeMo-Gym Integration (#4970)
commit 22ad7e6b3f2ec7dfc2567fa0535955812fe69a42
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 08:14:54 2026 +0100
Remove max_prompt_length from experimental ORPO (#4966)
commit 657babd9300007308c7b9ad329790f0564daf94d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 08:13:10 2026 +0100
Remove max_prompt_length from experimental CPO (#4965)
commit 50e35de16578daa9aee75758fad8bdc0f707e37d
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 08:09:25 2026 +0100
Remove max_prompt_length from experimental BCO (#4964)
commit 35bcab1d4da9fca092f9cefbc611fcd9eddaaf42
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Feb 5 08:02:34 2026 +0100
Remove max_prompt_length from experimental PRM (#4963)
commit e4995b2d26122879c03605f8ee136bcb241b4171
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 4 14:17:22 2026 -0600
Add test for training with `compute_metrics` in `RewardTrainer` (#4958)
commit cb5a73bfd97a3cb36712cd540545c2512cfa96a6
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Feb 4 14:14:14 2026 -0600
Add test for tool call data in `RewardTrainer` (#4959)
commit 90b875c575b816e9015670ad812db43c8ab9a0e3
Author: cmunley1 <cmunley@nvidia.com>
Date: Wed Feb 4 08:56:55 2026 -0800
NeMo-Gym Integration (#4848)
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>
commit 5cb7eee1548bc72ed6fd84080c200a0adf74add2
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 3 11:53:35 2026 -0600
Remove access to `warnings_issued` (#4960)
commit 2a55ed701122f3d210669c70b441e3baff6184b6
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Tue Feb 3 08:28:32 2026 -0600
Add test for training with `compute_metrics` in `SFTTrainer` (#4950)
commit 7b54e7253093610ab69bb8c32b2a4eb6926721ea
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Feb 3 10:43:40 2026 +0100
Minor fix docs style (#4953)
commit 2a9fb3f22a8bbad1412af3bb2526febd7160b85f
Author: mel3c <gaozh1988@live.com>
Date: Tue Feb 3 16:02:07 2026 +0800
Fix PPO run_name parameter not taking effect (#4945)
commit a03c2fcda3a328bde9af4abc5c02f6e7e942140f
Author: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Date: Mon Feb 2 16:27:47 2026 +0100
Update wordle.py example with masking of env tokens (#4895)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 68bc37700d2b66e1fbfa49282495f5419dd8abeb
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 2 15:25:02 2026 +0100
Remove ref_model_init_kwargs from experimental BCO (#4946)
commit 239c74d9ffb8ca67a9a667fb7a2a91576d554f28
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Mon Feb 2 09:25:41 2026 +0100
Fix SFTTrainer init logic: remove TrainingArguments.push_to_hub_token only for transformers < v5 (#4942)
commit 035c3ff151b953ca72cdfe0ee966bc1469a26fde
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Jan 29 14:08:08 2026 -0600
[GRPO] Add parquet logging for completions with individual rewards (#4818)
Co-authored-by: Daniel van Strien <davanstrien@gmail.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
commit 414e60f557eb0d0888db841c5e0e8f568e7607a8
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Jan 29 18:57:12 2026 +0100
Set default top_k to 0 in VLLMClient (#4927)
commit df332dc924e1bdc75bcfc5573950a17648db2eb4
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Jan 29 09:03:32 2026 -0600
Fix import statement for import_utils in vllm_client.py (#4932)
commit 27998e9584df0102b849878506be7d4808486771
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Jan 29 15:51:19 2026 +0100
Fix profiling of VLLMGeneration.sync_weights (#4931)
commit 43fb8d310633448a0c4c731a2efe9c1ca55e6184
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Jan 29 15:11:24 2026 +0100
Set model dtype to float32 in experimental tests of trainers (#4925)
commit 5a7481ec9340dfad5f23c54f90e15a139c1dff85
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Jan 29 14:35:56 2026 +0100
Move VLLMClient to generation module (#4928)
commit 21a0d70400179e4047c60183d7fb61988a249989
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Jan 29 14:32:59 2026 +0100
Require transformers<5 with PairRMJudge (#4926)
commit 4348375ab2c6bad36ef90e1061b804b0449148f1
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Thu Jan 29 14:30:13 2026 +0100
Set model dtype to float32 in tests of trainers (#4924)
commit a6cbf279d7d3bc4024e6e6273d967509e7221e83
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Jan 29 07:06:06 2026 -0600
Support tool call data in `is_conversational` (#4923)
commit ad91c6ffa91073684c4cf6dc2008e2994dd940e7
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 11:47:15 2026 -0600
Add validation for `sync_ref_model` in `GRPOTrainer` and `RLOOTrainer` when using PEFT models (#4912)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit 04717ffca8fd91a0fa5ee610fbdc75ef8f3c5a22
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 11:16:12 2026 -0600
Update learning rate comments and add assertions for reference model parameters in GRPO and RLOO tests (#4914)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit b322d9ba8092399b956882f61978ab3e90868c77
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 10:54:04 2026 -0600
Remove chat template setup in dpo_vlm.py (#4906)
commit a70b4e014756dc8595ac226d833deaba9784f756
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 09:55:19 2026 -0600
Fix extra EOS appended in DPO preprocessing for conversational data (#4908)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit 8464b0e4b22c571bbf565a03ee154a5692c8d056
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Jan 28 16:03:53 2026 +0100
Fix CI ValueError for 0 temperature (#4916)
commit 5461a74bc622660039e2038b6b0e5a43bdc712ae
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Jan 28 15:58:12 2026 +0100
Fix CI AssertionError: assert not True (#4921)
commit d54381a4a90cb18152842158c62aad9895022448
Author: Boyi Zhang <68804418+billycrapediem@users.noreply.github.com>
Date: Wed Jan 28 09:54:29 2026 -0500
docs: add DoRA (2402.09353) to Paper Index (#4892)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit f2f6b32bdc3688b124d72caa412d60a8f12d80c0
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 08:52:58 2026 -0600
Remove gradient checkpointing option from various training scripts (#4905)
commit 6cbc102f5fe94804e5a7579ff1aec270b97e4f5f
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 08:35:23 2026 -0600
Comment about overriding prediction_step in GRPOTrainer and RLOOTrainer (#4913)
commit f40edf9328adbe6c85acfb9dd9745e9c1393197e
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 08:17:35 2026 -0600
`device_map` init consistency in GRPO/RLOO/KTO (#4909)
commit a7070f940e8e0565adfbe9bbedd68b7850334b03
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 08:14:30 2026 -0600
Fix help text formatting for `max_length` in `RewardConfig` and `SFTConfig` (#4910)
commit 66efc0e52e55d77c2edf3e67c6c1f08e274ac9f8
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 28 08:08:11 2026 -0600
Rearrange variable assignments in `DataCollatorForVisionLanguageModeling` (#4911)
commit e9a2f16004a00a50e69e5779f58bf0bc24937de7
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Jan 28 11:43:26 2026 +0100
Fix CI TypeError in llm-blender tests (#4919)
commit 4f8232098c10c98ad7febe971da4eb362d13433c
Author: adityachallapally <avasanthc@gmail.com>
Date: Wed Jan 28 01:10:29 2026 -0800
Created new PTT integration docs as requested (#4907)
Co-authored-by: Aditya Challapally <adchalla@microsoft.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
commit 0eb66d8f2fc63b3d00d8dbc18f99c3f48750bd16
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Jan 27 16:53:29 2026 +0100
Refactor vLLM generation [1/N]: Extract vLLM generation (#4700)
commit 226ef57192b49801c3be8c55c798c6d5b134b080
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Jan 27 14:34:11 2026 +0100
Fix CI AssertionError: Parameter has not changed (#4904)
commit 956986ebd53ff0d8dfa688e9d1033488dcad55d6
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Tue Jan 27 14:33:56 2026 +0100
Fix CI NotImplementedError for bfloat16 (#4902)
commit 4322778d7f696a4fc1fc33612b02eeb5ec700109
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Mon Jan 26 12:57:43 2026 -0600
Transformers v5 release: extend xfail condition for `TestGRPOTrainer.test_training_vlm_and_liger` and update version checks (#4898)
commit e106972dd6d839f4a3d3fcaffc1f386b4fbe66bf
Author: Cola Chan (SII) <57797863+141forever@users.noreply.github.com>
Date: Mon Jan 26 17:56:38 2026 +0800
GOLD training speed up (#4888)
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
commit c477e88e05023dbcd45211c1a802788650598909
Author: Yi-Chen Li <ychenli.X@gmail.com>
Date: Fri Jan 23 21:25:36 2026 +0800
Fix RewardTrainer's results not reproducible (#4887)
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit ba053232324b207554116f806edbb2ec8b6ab9f5
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Jan 22 08:15:31 2026 -0600
Fix import path for `get_open_port` based on vLLM version (#4883)
commit e66a138438a3beba08756543fa41b7a90054ee8c
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Thu Jan 22 08:14:48 2026 -0600
Mark ZeRO 2 as xfail in distributed tests due to current failure (#4885)
commit a60d75aa1efa6ac5330649aafd425859da685a63
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 21 15:44:01 2026 -0600
Test distributed training for `RewardTrainer`, `RLOOTrainer` and `GRPOTrainer` (#4823)
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
commit 60e46742576876209658446b50f144541873301b
Author: Wing Lian <wing@axolotl.ai>
Date: Wed Jan 21 16:15:03 2026 -0500
Enable vLLM sleep mode for generation in Online DPO (#4882)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 0a881bcee992a25e2fc0e980cc43a7428ce17373
Author: Kirill Dubovikov <dubovikov.kirill@gmail.com>
Date: Thu Jan 22 01:09:34 2026 +0400
Bugfix: Logprob drift in vLLM serving mode (compared to colocate mode) (#4873)
Co-authored-by: Kirill Dubovikov <kirill.dubivokov@mbzuai.ac.ae>
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
commit 16b090302b8fd408870baa7452b5c3a29e03c346
Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Date: Wed Jan 21 03:00:04 2026 -0600
Fix SFT training for prompt-completion type and transformers v5 (#4880)
commit b080a4c27a60988be213354f551e26d3a4b2eef9
Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Date: Wed Jan 21 08:07:42 2026 +0100
Remove label_pad_token_id from experimental trainers (#4878)
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Add support for Python 3.14, which was first released yesterday: