Prevent corruption of DPO VLM training if "keep_end" truncation_mode#5286

Open
albertvillanova wants to merge 6 commits into huggingface:main from albertvillanova:fix-5285
Conversation

@albertvillanova
Member

@albertvillanova albertvillanova commented Mar 13, 2026

Prevent corruption of DPO VLM training if "keep_end" truncation_mode:

  • Raise ValueError when truncation_mode="keep_end" is used for VLM training in DPO.

Fix #5285.

This PR addresses a regression related to vision-language models (VLMs) and sequence truncation. It ensures that using the keep_end truncation mode with VLMs raises a clear error at initialization, preventing silent corruption of training data. The update includes both a code fix and a regression test.

Changes

Validation improvements for vision-language models:

  • Added a check in the DPOTrainer.__init__ method to raise a ValueError if a vision-language dataset is used with truncation_mode='keep_end', explaining that image tokens would be dropped and recommending alternatives.
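The guard itself is small. A minimal sketch of the shape of such a check, with illustrative names (`validate_truncation_config`, `is_vision_dataset`) that are assumptions here, not TRL's exact code:

```python
# Sketch of the init-time guard described above (names are illustrative,
# not copied from trl.DPOTrainer).
def validate_truncation_config(is_vision_dataset, max_length, truncation_mode):
    """Fail fast when keep_end truncation could drop image tokens from a VLM example."""
    if is_vision_dataset and max_length is not None and truncation_mode == "keep_end":
        raise ValueError(
            "truncation_mode='keep_end' is not supported for vision-language models. "
            "Image tokens reside inside the prompt portion of the sequence; keep_end may "
            "silently drop them. Use truncation_mode='keep_start' (the default) or set "
            "max_length=None."
        )
```

Note that the guard only fires on the exact VLM + `max_length` + `keep_end` combination; text-only datasets and `max_length=None` pass through untouched.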

Testing enhancements:

  • Introduced a regression test (test_train_vlm_keep_end_raises) to verify that initializing training with truncation_mode='keep_end' for a vision-language model raises the expected error, preventing silent data corruption.
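The regression test's shape can be sketched as follows. `DPOTrainer` is replaced here with a stub (`FakeDPOTrainer`) because the real trainer needs a model, processor, and dataset; only the guard's observable behavior is reproduced:

```python
# Sketch of the regression test's structure; FakeDPOTrainer is a stand-in
# that reproduces only the init-time guard under test.
class FakeDPOTrainer:
    """Stub for trl.DPOTrainer exposing just the keep_end validation."""

    def __init__(self, is_vision_dataset=True, max_length=512,
                 truncation_mode="keep_start"):
        if is_vision_dataset and max_length is not None and truncation_mode == "keep_end":
            raise ValueError(
                "truncation_mode='keep_end' is not supported for vision-language models."
            )

def test_train_vlm_keep_end_raises():
    """Construction with keep_end must fail; the default keep_start must succeed."""
    try:
        FakeDPOTrainer(truncation_mode="keep_end")
    except ValueError as err:
        assert "keep_end" in str(err)
    else:
        raise AssertionError("expected ValueError for truncation_mode='keep_end'")
    FakeDPOTrainer(truncation_mode="keep_start")  # default mode still constructs
```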

Note

Low risk: adds a defensive init-time validation and a regression test; only affects the VLM + max_length + truncation_mode='keep_end' configuration by failing fast instead of proceeding.

Overview
Prevents silent corruption in DPO vision-language training by failing fast when a vision dataset is used with max_length set and truncation_mode='keep_end', raising a clear ValueError during DPOTrainer initialization.

Adds a regression test to ensure VLM trainer construction with keep_end truncation reliably errors (fix for #5285).

Written by Cursor Bugbot for commit f36b3c3.

@albertvillanova albertvillanova changed the title Prvent corruption of DPO VLM training if "keep_end" truncation_mode Prevent corruption of DPO VLM training if "keep_end" truncation_mode Mar 13, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec
Member

I'm not sure I understand this one. The image tokens are indeed at the beginning, but they are usually not the first tokens. E.g., if the data looks like

<user><img>What is it?<assistant>A flower<end>
<user><img>What is it?<assistant>A bus<end>

What prevents us from truncating the first token (<user>)?

@albertvillanova
Member Author

albertvillanova commented Mar 16, 2026

@qgallouedec, even in your very edge case, how could you be sure that you are just removing the first token (and not any image token) with "keep_end" in every example when they have different lengths? max_length is a single scalar applied uniformly to the whole dataset.

What this PR is trying to solve is slightly narrower: with keep_end, truncation removes a variable-length prefix of the sequence, so in a vision example it can drop the whole prompt prefix including the image placeholder/tokens. In your example, that could indeed mean dropping <user> first, then <img>, then "What is it?", depending on how much needs to be truncated.

The reason I called out image tokens explicitly is that, for VLM inputs, losing them is especially problematic: once the visual tokens are truncated away, the example is no longer a valid multimodal sample and can become semantically inconsistent with the remaining text. By contrast, truncating text-only prefixes is still undesirable, but it is the usual trade-off of sequence truncation and not something specific to vision inputs.

I could improve the wording of the error to make it more precise: the core argument is not that image tokens are necessarily the very first tokens, but that they live in the prefix region that keep_end is designed to discard.

- "truncation_mode='keep_end' is not supported for vision-language models. Image tokens reside in "
- "the prompt at the beginning of the sequence; keeping the end would drop them. Use "
- "truncation_mode='keep_start' (the default) or set max_length=None."
+ "truncation_mode='keep_end' is not supported for vision-language models. Image tokens reside "
+ "inside the prompt portion of the sequence; depending on the example, keep_end may silently "
+ "drop them, causing pixel_values to be forwarded to the model with no corresponding visual "
+ "tokens in input_ids. Use truncation_mode='keep_start' (the default) or set max_length=None."

keep_start does not have this problem: as long as max_length >= prompt_len, image tokens are always safe.
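The asymmetry is easy to see on a toy token list. This is a sketch, not TRL code: the ids, the `IMG` placeholder, and the `truncate` helper are all made up for illustration:

```python
# Toy illustration of why keep_end can drop image tokens while keep_start cannot,
# as long as max_length >= prompt length. IMG stands in for an image placeholder id.
IMG = 99

def truncate(ids, max_length, mode):
    """Truncate a token id list to max_length, keeping the start or the end."""
    if max_length is None or len(ids) <= max_length:
        return ids
    return ids[:max_length] if mode == "keep_start" else ids[-max_length:]

# Prompt (with image tokens) followed by completion text.
example = [1, IMG, IMG, IMG, 4, 5, 6, 7]

kept_end = truncate(example, 4, "keep_end")      # [4, 5, 6, 7] -> IMG silently dropped
kept_start = truncate(example, 4, "keep_start")  # [1, IMG, IMG, IMG] -> IMG preserved
```

With keep_end, `pixel_values` would still be forwarded to the model even though no corresponding visual tokens remain in `input_ids`, which is exactly the silent corruption the guard prevents.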


Development

Successfully merging this pull request may close these issues.

DPOTrainer silently corrupts VLM training with "keep_end" truncation_mode
