Spectral Evolution Search

Official implementation of Spectral Evolution Search (SES):

Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation

SES is a training-free inference-time scaling framework for text-to-image generation. It improves reward alignment by searching over the initial noise in a low-frequency wavelet subspace with the Cross-Entropy Method (CEM), without updating any generator or reward-model parameters.

The paper has been accepted to ICML 2026.

[Figure: Spectral Evolution Search overview]

What SES Does

Inference-time scaling allocates extra compute at inference time to improve generated outputs. In text-to-image models, a direct option is to search over the initial noise. Full-space noise search is expensive because the latent space is high-dimensional and many perturbation directions have weak visual impact.

SES reduces this search space by decomposing the initial noise with a Discrete Wavelet Transform (DWT). It optimizes only the low-frequency coefficients, which strongly affect global image structure, while keeping high-frequency coefficients fixed. A gradient-free CEM loop then samples candidates, decodes images, scores them with a reward model, and updates the search distribution toward higher-reward regions.
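The loop described above can be sketched on a toy problem. This is a minimal illustration, not the repository's implementation: it assumes a single-level Haar wavelet on a 2D single-channel array, and `toy_reward`, `ses_sketch`, and all hyperparameters are hypothetical stand-ins. In SES proper, the reward would come from generating an image from the candidate noise and scoring it with a reward model.

```python
import numpy as np

S = np.sqrt(2.0)

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array with even H, W."""
    a = (x[0::2, :] + x[1::2, :]) / S          # low-pass along rows
    d = (x[0::2, :] - x[1::2, :]) / S          # high-pass along rows
    ll = (a[:, 0::2] + a[:, 1::2]) / S
    lh = (a[:, 0::2] - a[:, 1::2]) / S
    hl = (d[:, 0::2] + d[:, 1::2]) / S
    hh = (d[:, 0::2] - d[:, 1::2]) / S
    return ll, (lh, hl, hh)

def haar_idwt2(ll, highs):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = highs
    h, w = ll.shape
    a, d = np.empty((h, 2 * w)), np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / S, (ll - lh) / S
    d[:, 0::2], d[:, 1::2] = (hl + hh) / S, (hl - hh) / S
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = (a + d) / S, (a - d) / S
    return x

def toy_reward(noise):
    """Hypothetical stand-in for 'generate an image, then score it'."""
    return -np.mean((noise - 1.0) ** 2)

def ses_sketch(noise, reward_fn, iters=10, pop=32, elite=8, seed=0):
    """CEM over the low-frequency band only; high-frequency bands stay fixed."""
    rng = np.random.default_rng(seed)
    ll0, highs = haar_dwt2(noise)
    mean, std = ll0.copy(), np.ones_like(ll0)
    best_noise, best_r = noise, reward_fn(noise)
    for _ in range(iters):
        # Sample candidate low-frequency coefficients from the search distribution.
        cands = mean + std * rng.standard_normal((pop,) + ll0.shape)
        rewards = np.array([reward_fn(haar_idwt2(c, highs)) for c in cands])
        order = np.argsort(rewards)[::-1][:elite]   # indices of the top candidates
        elites = cands[order]
        # Refit the Gaussian to the elites; a small floor keeps std positive.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
        if rewards[order[0]] > best_r:
            best_r = rewards[order[0]]
            best_noise = haar_idwt2(cands[order[0]], highs)
    return best_noise, best_r

noise = np.random.default_rng(1).standard_normal((16, 16))
best_noise, best_reward = ses_sketch(noise, toy_reward)
print(toy_reward(noise), best_reward)
```

With a single decomposition level, the searched band holds one quarter of the coefficients (here 8 x 8 out of 16 x 16), which is where the reduction in search dimension comes from. The wavelet family and number of levels used by SES are not stated in this README, so treat those choices as placeholders.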

Key properties:

  • training-free and plug-and-play
  • gradient-free reward optimization
  • works with diffusion and flow-matching text-to-image models
  • supports single-prompt and CSV batch evaluation
  • compatible with multiple reward models

Results

[Figure: Qualitative results]

[Figure: Quantitative results]

Installation

Use Python 3.10 or later. A fresh conda environment is recommended.

conda create -n ses python=3.10
conda activate ses

Install PyTorch for your CUDA version. For example, with CUDA 12.1:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Then install the remaining dependencies:

pip install -r requirements.txt

Quick Start

Run SES on a single prompt with the default backbone (sdxl-turbo) and the default reward model (pick):

bash run.sh \
  --prompt "An orange colored sandwich." \
  --model_id sdxl-turbo \
  --reward_model pick \
  --save_dir outputs/demo_single \
  --total_eval_budget 200

Run SES on a batch of prompts from a CSV file:

bash run.sh \
  --prompt_csv prompts.csv \
  --model_id sdxl \
  --reward_model hps \
  --save_dir outputs/demo_batch \
  --total_eval_budget 50

Supported Models

Use --model_id to choose the image generator.

| model_id   | Public model source                      | Default size |
|------------|------------------------------------------|--------------|
| sdxl-turbo | stabilityai/sdxl-turbo                   | 512 x 512    |
| sd1-4      | CompVis/stable-diffusion-v1-4            | 512 x 512    |
| sdxl       | stabilityai/stable-diffusion-xl-base-1.0 | 1024 x 1024  |
| flux       | black-forest-labs/FLUX.1-dev             | 1024 x 1024  |
| qwen-image | Qwen/Qwen-Image                          | 1024 x 1024  |

Supported Reward Models

Use --reward_model to choose the optimization objective.

| reward_model | Default source                         |
|--------------|----------------------------------------|
| pick         | yuvalkirstain/PickScore_v1             |
| clip         | openai/clip-vit-large-patch14          |
| hps          | adams-story/HPSv2-hf                   |
| aes          | camenduru/improved-aesthetic-predictor |
| ir           | ImageReward package                    |

Proxy Reward Evaluation

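Reward evaluation is often the runtime bottleneck because each candidate must be fully generated and decoded to an image before it can be scored. SES supports proxy evaluation: candidates are generated with fewer inference steps during search (--num_inference_steps), and only the final image uses more steps (--final_num_inference_steps).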

Example:

bash run.sh \
  --prompt "A beautiful girl." \
  --model_id qwen-image \
  --reward_model aes \
  --num_inference_steps 10 \
  --final_num_inference_steps 30 \
  --total_eval_budget 50 \
  --save_dir outputs/proxy_demo

This evaluates candidate images with 10-step generations, then decodes the final selected noise with 30 steps.
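A rough cost comparison for the example above, as a sketch. It assumes that --total_eval_budget counts candidate generations and that each generation costs one denoiser call per inference step; it ignores text encoding, VAE decoding, and reward-model time. The helper name `denoiser_calls` is hypothetical.

```python
def denoiser_calls(budget, search_steps, final_steps):
    """Denoiser calls for `budget` search candidates plus one final generation."""
    return budget * search_steps + final_steps

# Proxy search: 50 candidates at 10 steps, then one 30-step final generation.
proxy_cost = denoiser_calls(budget=50, search_steps=10, final_steps=30)

# Baseline: search directly at 30 steps, so the best candidate is already final.
full_cost = 50 * 30

print(proxy_cost, full_cost)  # 530 1500
```

Under these assumptions the proxy setting uses roughly a third of the denoiser calls. The trade-off is that 10-step rewards only approximate 30-step rewards, so the selected noise may not be the one a full-step search would have chosen.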

DiffSynth-Studio Integration

SES has also been integrated into DiffSynth-Studio for inference-time scaling research:

https://github.com/modelscope/DiffSynth-Studio/blob/main/docs/en/Research_Tutorial/inference_time_scaling.md

Citation

If you find this repository useful, please cite:

@article{ye2026spectral,
  title={Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation},
  author={Ye, Jinyan and Duan, Zhongjie and Li, Zhiwen and Chen, Cen and Chen, Daoyuan and Li, Yaliang and Chen, Yingda},
  journal={arXiv preprint arXiv:2602.03208},
  year={2026}
}
