Official implementation of *Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference*.
ReMix is a training-free decoding method for efficient Diffusion Large Language Model (DLLM) inference. This repository provides code and scripts for reproducing demos and evaluations on LLaDA and MMaDA.
```
.
├── LLaDA/      # Text-domain demos, decoding, configs, and evaluation scripts
├── MMaDA/      # Multimodal demos, decoding, and lmms-eval based evaluation
├── LICENSE
└── README.md
```
We recommend using uv for dependency and virtual environment management.

For LLaDA:

```shell
pip install uv
cd LLaDA
uv venv --python 3.11 dev
source dev/bin/activate
uv pip install -r requirements.txt
```

For MMaDA:

```shell
cd MMaDA
uv venv --python 3.11 dev
source dev/bin/activate
uv pip install -r requirements.txt
cd lmms_eval
uv pip install -e .
```

Before running inference or evaluation, download the following model and datasets from Hugging Face into the specified local directories, such as ./LLaDA/models/ and ./LLaDA/data/.
You may use either huggingface-cli or the Python datasets library to download them.
| Model Name | Hugging Face Repo | Local Path |
|---|---|---|
| LLaDA-8B-Instruct | GSAI-ML/LLaDA-8B-Instruct | ./LLaDA/models/LLaDA-8B-Instruct/ |
| Dataset Name | Hugging Face Repo | Local Path |
|---|---|---|
| GSM8K | openai/gsm8k | ./LLaDA/data/gsm8k/ |
| MATH-500 | HuggingFaceH4/MATH-500 | ./LLaDA/data/math500/ |
| HumanEval | openai/openai_humaneval | ./LLaDA/data/humaneval/ |
| AI2 ARC | allenai/ai2_arc | ./LLaDA/data/ai2_arc/ |
Datasets not listed above are already included in ./LLaDA/data/.
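As one way to script the downloads above, the sketch below maps each Hugging Face repo to its local path and uses huggingface_hub's snapshot_download (the helper names and the assumption that huggingface_hub is installed are ours; huggingface-cli works equally well):

```python
# Hypothetical download script for the model and datasets listed above.
# Assumes the huggingface_hub package is installed (pip install huggingface_hub).
from typing import Dict

# Repo-to-local-path mapping copied from the tables above.
DOWNLOADS: Dict[str, str] = {
    "GSAI-ML/LLaDA-8B-Instruct": "./LLaDA/models/LLaDA-8B-Instruct/",
    "openai/gsm8k": "./LLaDA/data/gsm8k/",
    "HuggingFaceH4/MATH-500": "./LLaDA/data/math500/",
    "openai/openai_humaneval": "./LLaDA/data/humaneval/",
    "allenai/ai2_arc": "./LLaDA/data/ai2_arc/",
}

def repo_type_for(local_dir: str) -> str:
    # Checkpoints live under models/; everything else here is a dataset.
    return "model" if "/models/" in local_dir else "dataset"

def download_all() -> None:
    # Import deferred so the mapping can be inspected without the package.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in DOWNLOADS.items():
        snapshot_download(repo_id, repo_type=repo_type_for(local_dir),
                          local_dir=local_dir)
```

Calling `download_all()` fetches everything in one pass; comment out entries you already have locally.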
Run the LLaDA demo after setting `model_path` to your local model path.

```shell
cd LLaDA
python demo.py
```

Configuration files are located in ./LLaDA/configs/. Before evaluation, fill in `data_root` and `model_path` in the corresponding YAML file.
Run the default evaluation script:

```shell
cd LLaDA
bash eval.sh
```

To adjust generation parameters such as gen_length, steps, and threshold, either edit the corresponding YAML file in ./LLaDA/configs/ or pass command-line overrides through --gen-kwargs in eval.sh:

```shell
torchrun --nproc_per_node=8 eval.py \
    --config configs/gsm8k.yaml \
    --method remix \
    --gen-kwargs threshold=0.8,js_threshold=0.2,beta_mix=0.6
```

> [!NOTE]
> Parameters passed via --gen-kwargs override values specified in the YAML configuration.
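The override semantics can be illustrated with a short sketch that parses a comma-separated key=value string and merges it over a YAML-loaded config dict (the function names are ours; the actual parsing in eval.py may differ):

```python
# Hypothetical sketch of how --gen-kwargs overrides
# (e.g. "threshold=0.8,js_threshold=0.2,beta_mix=0.6")
# can be merged over a YAML config; eval.py's real parsing may differ.
from typing import Any, Dict

def parse_gen_kwargs(spec: str) -> Dict[str, Any]:
    overrides: Dict[str, Any] = {}
    for pair in spec.split(","):
        key, _, raw = pair.partition("=")
        try:
            value: Any = int(raw)        # prefer ints (e.g. steps=128)
        except ValueError:
            try:
                value = float(raw)       # then floats (e.g. threshold=0.8)
            except ValueError:
                value = raw              # fall back to plain strings
        overrides[key.strip()] = value
    return overrides

def merge_config(yaml_cfg: Dict[str, Any], spec: str) -> Dict[str, Any]:
    # Command-line overrides take precedence over YAML values.
    return {**yaml_cfg, **parse_gen_kwargs(spec)}
```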
To compare ReMix with other DLLM inference acceleration techniques, implement additional decoding functions in ./LLaDA/model/decoding.py.
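A decoding function typically decides, at each diffusion step, which masked positions are confident enough to commit. The toy sketch below (plain Python; the function name and interface are hypothetical, as the repository's actual decoding code operates on torch tensors) shows a threshold rule of this kind:

```python
# Toy illustration of a confidence-threshold decoding step; names and
# interface are hypothetical, not the repository's actual API.
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def threshold_decode_step(
    logits_per_pos: List[List[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (position, token_id) pairs confident enough to unmask.

    Positions whose top-1 probability falls below `threshold` stay
    masked and are revisited at the next diffusion step.
    """
    committed = []
    for pos, logits in enumerate(logits_per_pos):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            committed.append((pos, best))
    return committed
```

A new method would implement a rule like this over the model's per-position logits and plug into the same step loop as the existing decoders.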
Run the MMaDA demo after setting `model_path` to your local model path.

```shell
cd MMaDA
python demo.py
```

We use lmms-eval for MMaDA evaluation.
> [!IMPORTANT]
> Some benchmarks, such as MathVista, require an auxiliary model for evaluation. Ensure that `OPENAI_API_KEY` and `OPENAI_API_URL` are configured before running the scripts.
Run the evaluation script:

```shell
cd MMaDA
bash eval.sh
```

To compare ReMix with other DLLM inference acceleration techniques, implement additional decoding functions within the `MMadaModelLM` class:

- For demo.py, modify ./MMaDA/models/modeling_mmada.py.
- For evaluation, modify ./MMaDA/lmms_eval/lmms_eval/models/model_mmada/modeling_mmada.py.
To reproduce TPS and latency metrics or apply custom modifications, refer to the MMaDA.generate_until method in ./MMaDA/lmms_eval/lmms_eval/models/mmada.py.
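To sanity-check TPS numbers outside lmms-eval, a minimal timing harness can wrap any generation call; in the sketch below, `generate_fn` is a hypothetical stand-in for the model's real call, not the repository's API:

```python
# Minimal TPS/latency harness; `generate_fn` is a hypothetical stand-in
# mapping a prompt to a list of generated token ids.
import time
from typing import Callable, List, Tuple

def measure_tps(
    generate_fn: Callable[[str], List[int]], prompt: str
) -> Tuple[float, float]:
    """Return (latency_seconds, tokens_per_second) for one generation."""
    start = time.perf_counter()
    tokens = generate_fn(prompt)
    latency = time.perf_counter() - start
    tps = len(tokens) / latency if latency > 0 else float("inf")
    return latency, tps
```

Substituting the actual model call (and averaging over several prompts after a warm-up run) gives figures comparable to those reported by MMaDA.generate_until.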
If you find this work useful, please cite:
```bibtex
@inproceedings{ye2026remix,
    title     = {Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference},
    author    = {Ye, Yushi and Hong, Feng and Zheng, Huangjie and Chen, Xu and Chen, Zhiyong and Wang, Yanfeng and Yao, Jiangchao},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year      = {2026}
}
```

This implementation is based on the WINO codebase.