conda env create -f environment.yml
python scripts/download_coco17.py
python scripts/download_cub200.py
python scripts/download_flickr30k.py
For COCO17 and Flickr30K, we use the CLIP model released by OpenAI to compute a CLIPScore for each image-caption pair. We then keep the caption with the highest CLIPScore for each image and use it for inference and evaluation:
python scripts/measure_clipscore.py
python scripts/extract_one_caption_to_txt_best_clipscore.py

huggingface-cli login
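The caption selection described above (score every caption of an image, keep the best one) can be sketched as follows. This is only an illustration and not the code in `scripts/measure_clipscore.py`: the function names are hypothetical, the embeddings are toy vectors standing in for CLIP image and text features, and the weight `w=2.5` is assumed from the commonly used CLIPScore definition, `w * max(cos(image, text), 0)`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore-style value: rescaled cosine similarity, clipped at zero.

    The weight w=2.5 is an assumption taken from the usual definition.
    """
    return w * max(cosine(image_emb, text_emb), 0.0)


def best_caption(image_emb, captions):
    """Return the caption text whose embedding scores highest for the image.

    `captions` is a list of (text, embedding) pairs (hypothetical layout).
    """
    return max(captions, key=lambda c: clip_score(image_emb, c[1]))[0]


# Toy example: three candidate captions for one image.
image = [1.0, 0.0]
candidates = [
    ("a bird on a branch", [0.0, 1.0]),
    ("a dog in a park", [1.0, 0.0]),
    ("a red car", [-1.0, 0.0]),
]
selected = best_caption(image, candidates)
```

In a real run, the embeddings would come from the CLIP image and text encoders rather than hand-written vectors.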
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_flickr30k.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_coco17.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_cub200.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_lookahead_flickr30k.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_lookahead_coco17.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_lookahead_cub200.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_lookback_flickr30k.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_lookback_coco17.sh
CUDA_VISIBLE_DEVICES=0 bash inference_sd35_lookback_cub200.sh

We also provide the code of two baselines: AFloPS and Self-Guidance.
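The nine inference commands listed above follow one naming pattern (variant, then dataset), so they can also be launched from a small driver. The sketch below is a convenience wrapper and not part of the repository: `build_scripts` and `run_all` are hypothetical helpers, and the script names are assumed to match the commands shown above exactly.

```python
import os
import subprocess

# Variant suffixes and dataset names, in the order used by the commands above.
VARIANTS = ["", "_lookahead", "_lookback"]
DATASETS = ["flickr30k", "coco17", "cub200"]


def build_scripts():
    """Return the nine inference script names, grouped by variant."""
    return [
        f"inference_sd35{variant}_{dataset}.sh"
        for variant in VARIANTS
        for dataset in DATASETS
    ]


def run_all(gpu="0"):
    """Run every script sequentially on one GPU; stop at the first failure."""
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": gpu}
    for script in build_scripts():
        subprocess.run(["bash", script], env=env, check=True)


scripts = build_scripts()
# To launch everything on GPU 0, call: run_all(gpu="0")
```

Running the scripts sequentially keeps a single GPU busy without contention; to spread work across devices, call `run_all` with a different `gpu` value per process.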
If you find this repository useful, please cite:
@article{luo2026look,
title={Look-Ahead and Look-Back Flows: Training-Free Image Generation with Trajectory Smoothing},
author={Luo, Yan and Huang, Henry and Zhou, Todd Y and Wang, Mengyu},
journal={arXiv preprint arXiv:2602.09449},
year={2026}
}