Official PyTorch implementation of Rainbow Padding, a simple yet powerful strategy that resolves `<eos>` overflow in diffusion language models (dLLMs).
Visit our Project page and the arXiv paper if you are interested! This repository provides a step-by-step pipeline for SFT LoRA training and evaluation using Rainbow Padding.
If you have any questions, please contact the authors.
```bash
conda env create -f environment.yaml
conda activate rainbow
```
We follow the curation recipe introduced in Dream (arXiv:2508.15487).
The training corpus consists of 0.5M public examples; the source datasets and curation details are provided in Appendix C.1 of the paper.
⚠️ Note: Specific SFT configurations for both Dream and LLaDA were not publicly released (to the best of our knowledge).
You can directly download our preprocessed datasets from Google Drive:
```bash
# Same data with tokenization per model type.
# LLaDA SFT data
gdown --folder 1U8kVGYiWRsqWCDRsHUjeKTiDPrL0FsMp
# Dream SFT data
gdown --folder 1-oei1KRTFADMRljPqX5rPuTGcJ7fpHdI
```
We use 🤗 Accelerate for multi-GPU training.
| Argument | Description |
|---|---|
| `batch_size` | Batch size per GPU. Control the total batch size using `gradient_accumulation_steps` in `./method/sft.py`. |
| `pad_num` | Number of cyclic padding tokens. Use `0` for `<eos>` padding or any positive integer (e.g., 3, 7) for Rainbow Padding (see the sketch below). |
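The sketch below illustrates how `pad_num` cyclic padding tokens could fill a tokenized response up to a fixed length during SFT data preparation. It is only a rough illustration under assumed conventions (function name, token handling, and padding placement are not the repository's actual preprocessing code):

```python
# Illustrative sketch only: cyclic ("rainbow") padding vs. plain <eos> padding.
# pad_ids holds pad_num distinct padding-token ids; an empty list falls back to <eos> padding.
def pad_response(response_ids, eos_id, pad_ids, target_len):
    padded = list(response_ids) + [eos_id]
    i = 0
    while len(padded) < target_len:
        if pad_ids:   # Rainbow Padding: cycle through distinct padding tokens
            padded.append(pad_ids[i % len(pad_ids)])
            i += 1
        else:         # pad_num == 0: repeat <eos>, as in standard SFT padding
            padded.append(eos_id)
    return padded[:target_len]
```

With `pad_num=7`, for example, `pad_ids` would contain seven distinct token ids repeated in a fixed order, so no single token dominates the padded region of the sequence.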
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 main.py --model_type=llada_base --pad_num=7
```
To resume training from a saved run, pass `--resume_dir`:
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes=4 main.py --model_type=llada_base --pad_num=7 --resume_dir model/llada_base/sft_5e-05_lora_epoch3_rank32_pad7
```
We have uploaded our checkpoint to Hugging Face: quasar529/rainbow-padding-llada.
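If you want to load the released adapter outside the evaluation script, a minimal sketch (assuming the checkpoint is a standard PEFT LoRA adapter on top of the LLaDA base model) looks like this:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# LLaDA ships custom modeling code, so trust_remote_code is required.
base = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Base", trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained("GSAI-ML/LLaDA-8B-Base", trust_remote_code=True)

# Attach the Rainbow Padding LoRA weights released on Hugging Face.
model = PeftModel.from_pretrained(base, "quasar529/rainbow-padding-llada")
```

The evaluation script passes the same identifiers via `model_path` and `lora_path`, so this manual step is normally unnecessary.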
We use the widely used LM-Eval-Harness library; our evaluation script is adapted from LLaDA's evaluation script.
You can find the evaluation script in eval/eval_llada_instruct.py.
To run evaluation, you must install specific versions of datasets and lm-eval due to dependency constraints:
```bash
pip install datasets==3.6.0 lm-eval==0.4.9.1
```
- `lm-eval==0.4.9.1` requires `datasets>=2.16.0,<4.0`.
- The latest `datasets` releases (≥4.0.0) are therefore incompatible.
- Downgrading `datasets` to 3.6.0 satisfies lm-eval's requirements and ensures stable evaluation. If you skip this step, the evaluation scripts may still run but can break unexpectedly due to mismatched APIs.
```bash
# Example: humaneval_instruct
# If you want to use wandb, set wandb_log, wandb_project, and wandb_entity in --model_args.
accelerate launch --num_processes=1 eval/eval_llada_instruct.py \
    --tasks humaneval_instruct \
    --model llada_dist \
    --batch_size 1 \
    --log_samples \
    --output_path "/home/quasar529/rainbow-padding/eval/output" \
    --confirm_run_unsafe_code \
    --model_args model_path='GSAI-ML/LLaDA-8B-Base',steps=1024,gen_length=1024,block_length=1024,lora_path='quasar529/rainbow-padding-llada',device='cuda',wandb_log=True,wandb_project='llada-eval',wandb_entity='your-wandb-entity'
```
If you want to reproduce all evaluation tasks from the paper at once, simply run the provided shell script:
```bash
sh eval/eval.sh
```
If you find this work useful, please cite:
```bibtex
@article{kim2025rainbow,
  title={Rainbow Padding: Mitigating Early Termination in Instruction-Tuned Diffusion LLMs},
  author={Kim, Bumjun and Jeon, Dongjae and Kim, Dueun and Jeung, Wonje and No, Albert},
  journal={arXiv preprint arXiv:2510.03680},
  year={2025}
}
```
This code builds upon the open-sourced implementations of
Dream and LLaDA.
We thank the authors for releasing their resources and inspiring this work.

