SemanticRouting: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers

Bozhou Li, Yushuo Guan, Haolin Li, Bohan Zeng, Yiyan Ji, Yue Ding, Pengfei Wan, Kun Gai, Yuanxing Zhang, Wentao Zhang

Peking University | Kuaishou Technology | Fudan University | Nanjing University | UCAS

Preprint

If this repository helps you, please give it a ⭐ for updates!

📋 Overview

SemanticRouting enhances text-to-image DiT models by introducing a dynamic semantic routing mechanism for multi-layer LLM features. Existing methods typically condition on a single LLM layer or on a static mix of layers, which ignores both the semantic hierarchy across LLM layers and the non-stationary denoising dynamics of diffusion models, which change over diffusion time and network depth.

This repository implements a unified normalized convex fusion framework equipped with lightweight gates. We systematically explore:

Time-wise Fusion: Adapting fusion weights to the diffusion timestep $t$.
Depth-wise Fusion: Adapting weights to the DiT block index $d$.
Joint Fusion: Combining both time and depth adaptivity.

Our findings establish Depth-wise Semantic Routing as the superior strategy, significantly improving text-image alignment and compositional generation (e.g., +9.97 on GenAI-Bench Counting), while purely time-wise fusion can suffer from trajectory mismatch issues during inference.

✨ Key Features

🚀 Enhanced Alignment: Depth-wise fusion reaches 67.07 on GenEval and 79.07 on GenAI-Bench (Table 1) by leveraging hierarchical LLM semantics.
🧩 Unified Framework: Easily switch between fusion strategies (uniform, static, time-wise, depth-wise, joint) via simple config files.
🧠 Semantic Routing: Introduces learnable gating mechanisms to route information from appropriate LLM layers to specific DiT blocks.

📊 Performance

We evaluated our fusion strategies against strong baselines including Penultimate layer (B1), Uniform averaging (B2), Static fusion (B3), and FuseDiT.

Overall Results

| Method | GenEval ↑ | GenAI-Bench ↑ | UnifiedReward ↑ |
|---|---|---|---|
| Baselines | | | |
| B1: Penultimate | 64.54 | 74.96 | 3.02 |
| B2: Uniform | 66.51 | 76.82 | 3.06 |
| B3: Static | 64.77 | 76.31 | 3.05 |
| Deep-fusion Baseline | | | |
| FuseDiT | 60.95 | 75.02 | 3.05 |
| Our Strategies | | | |
| S1: Time | 63.41 | 76.20 | 2.97 |
| S2: Depth | 67.07 | 79.07 | 3.06 |
| S3: Joint | 66.05 | 77.44 | 3.06 |

Table 1: Comparison on GenEval, GenAI-Bench, and UnifiedReward. S2 (Depth-wise) achieves the best overall performance.
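To make the gaps in Table 1 easier to read, the short sketch below recomputes the per-metric differences between S2 (Depth-wise) and a few reference rows. The numbers are copied from the table; the helper name `gaps` is only illustrative and is not part of the repository.

```python
# Scores copied from Table 1: (GenEval, GenAI-Bench, UnifiedReward).
scores = {
    "B1: Penultimate": (64.54, 74.96, 3.02),
    "B2: Uniform": (66.51, 76.82, 3.06),
    "B3: Static": (64.77, 76.31, 3.05),
    "FuseDiT": (60.95, 75.02, 3.05),
    "S1: Time": (63.41, 76.20, 2.97),
    "S2: Depth": (67.07, 79.07, 3.06),
    "S3: Joint": (66.05, 77.44, 3.06),
}
metrics = ("GenEval", "GenAI-Bench", "UnifiedReward")


def gaps(method, reference):
    """Per-metric difference (method minus reference), rounded to 2 decimals."""
    return {
        m: round(a - b, 2)
        for m, a, b in zip(metrics, scores[method], scores[reference])
    }


for ref in ("B1: Penultimate", "B2: Uniform", "FuseDiT"):
    print(ref, gaps("S2: Depth", ref))
```

On UnifiedReward, S2 ties with B2 and S3 at 3.06, so its advantage in this table comes from GenEval and GenAI-Bench.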

🔬 Methodology

We introduce a unified formulation for multi-layer fusion. The final fused representation $H_{\text{cond}}(t,d)$ is formed via a softmax-normalized convex combination of normalized layer features:

$$ H_{\text{cond}}(t,d) = \sum_{l \in \mathcal{L}} \alpha_{t,d}^{(l)} \cdot \text{LN}(H^{(l)}) $$

where the weights $\alpha_{t,d}^{(l)}$ are obtained by applying a softmax over the layer index $l$ to the learned logits $z_{t,d}$:

$$ \alpha_{t,d} = \mathrm{Softmax}(z_{t,d}) $$
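As a small worked example of the softmax step (the logits here are illustrative, not values from the paper), take three layers with $z_{t,d} = (0, 1, 2)$:

$$ \alpha_{t,d} = \frac{(e^{0},\, e^{1},\, e^{2})}{e^{0} + e^{1} + e^{2}} \approx (0.090,\ 0.245,\ 0.665), \qquad \sum_{l \in \mathcal{L}} \alpha_{t,d}^{(l)} = 1 $$

Every weight is positive and the weights sum to one, so $H_{\text{cond}}(t,d)$ is a convex combination of the normalized layer features. When all logits are equal, $\alpha_{t,d}^{(l)} = 1/|\mathcal{L}|$, which recovers uniform averaging as a special case.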

We parameterize $z_{t,d}$ differently for each strategy:

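Time-wise: $z_{t,d} = g_{\psi}(\phi(t))$ (Time-Conditioned Fusion Gate, TCFG; the logits do not depend on $d$).
Depth-wise: $z_{t,d} = \beta_{d}$ (block-specific learnable weights; the logits do not depend on $t$).
Joint: $z_{t,d} = g_{\psi_d}(\phi(t))$ (depth-specific TCFG, with separate gate parameters $\psi_d$ per block).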
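The formulation above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the repository's implementation: the parameter-free `layer_norm`, the sinusoidal form of `phi`, the linear gates `W` and `W_d`, the zero initialization of `beta`, and all tensor sizes are hypothetical choices made for the example.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Parameter-free LayerNorm over the last (feature) axis (assumed form of LN)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fuse(layer_feats, logits):
    """H_cond = sum_l alpha^(l) * LN(H^(l)).

    layer_feats: [L, T, C] hidden states from L selected LLM layers.
    logits:      [L] logits z_{t,d} for one (timestep, block) pair.
    """
    alpha = softmax(logits)                     # [L], positive, sums to 1
    normed = layer_norm(layer_feats)            # [L, T, C]
    return np.tensordot(alpha, normed, axes=1)  # [T, C]


def phi(t, dim=16):
    """Sinusoidal timestep embedding (an assumed form of phi)."""
    freqs = np.exp(-np.log(10000.0) * np.arange(dim // 2) / (dim // 2))
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])


rng = np.random.default_rng(0)
L, T, C, D = 4, 6, 8, 3  # layers, tokens, channels, DiT blocks (toy sizes)
H = rng.normal(size=(L, T, C))

# Depth-wise: one logit vector beta_d per DiT block, independent of t.
beta = np.zeros((D, L))  # zero init yields uniform weights (assumption)
H_depth = [fuse(H, beta[d]) for d in range(D)]

# Time-wise: a single gate g_psi on phi(t), shared by all blocks.
W = rng.normal(scale=0.1, size=(16, L))  # hypothetical linear gate
H_time = fuse(H, phi(0.3) @ W)

# Joint: a separate gate g_{psi_d} per block, still driven by phi(t).
W_d = rng.normal(scale=0.1, size=(D, 16, L))
H_joint = [fuse(H, phi(0.3) @ W_d[d]) for d in range(D)]
```

In this sketch the three strategies differ only in how the logits are produced; the fusion step itself is shared, which mirrors the unified formulation.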

🚀 Installation

1. Clone & create environment (Python 3.12)

git clone https://github.com/zooblastlbz/SemanticRouting.git
cd SemanticRouting

conda create -n semanticrouting python=3.12 -y
conda activate semanticrouting

2. Install dependencies

pip install -r requirements.txt

📁 Data Format

The repository expects data in JSON or JSONL format. Default keys are image and text, which can be overridden in config via data.image_key and data.text_key.

image: File path (resolved relative to data.image_root if provided).
text: String or list of strings.

Example:

{
  "image": "images/sample_0001.jpg", 
  "text": "A scenic mountain lake at sunrise"
}
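A minimal sketch of how such a file could be read is shown below. It is not the repository's loader: the function name `load_records` is hypothetical, its keyword arguments merely mirror the `data.image_key`, `data.text_key`, and `data.image_root` options described above, and it assumes that a `.json` file holds either one object or a list of objects.

```python
import json
from pathlib import Path


def load_records(path, image_key="image", text_key="text", image_root=None):
    """Read a .json or .jsonl file into (image_path, captions) pairs."""
    path = Path(path)
    raw = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # One JSON object per non-empty line.
        items = [json.loads(line) for line in raw.splitlines() if line.strip()]
    else:
        items = json.loads(raw)
        if isinstance(items, dict):  # a single object rather than a list
            items = [items]

    records = []
    for item in items:
        image = Path(item[image_key])
        # Relative paths are resolved against image_root when it is given.
        if image_root is not None and not image.is_absolute():
            image = Path(image_root) / image
        text = item[text_key]
        # Normalize "string or list of strings" to a list of captions.
        captions = [text] if isinstance(text, str) else list(text)
        records.append((image, captions))
    return records
```

How the training code chooses among multiple captions for one image is not specified here, so the sketch simply returns all of them.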

🏋️ Training

1. Select a Fusion Preset

Choose a configuration file from configs/ to define your fusion strategy:

configs/uniform.yaml: Simple averaging of layers.
configs/static.yaml: Learnable global weights.
configs/time-wise.yaml: Time-dependent gating.
configs/depth-wise.yaml: Depth-dependent gating (Recommended).
configs/joint.yaml: Combined time and depth gating.

2. Launch Training

Run the launcher script, setting the environment variables as needed:

ACCELERATE_CONFIG=./accelerate_config.yaml \
CONFIG_FILE=./configs/depth-wise.yaml \
PYTHON_BIN=python \
MASTER_ADDR=127.0.0.1 \
MASTER_PORT=29500 \
bash scripts/train.sh
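When sweeping over several presets, the same launch can be driven from Python. The sketch below only assembles the environment variables and command shown above; the names `PRESETS`, `build_train_env`, and `launch` are hypothetical helpers, and the dry-run default is a choice made for this example so that nothing is executed by accident.

```python
import os
import subprocess

# Preset paths as listed in the section above.
PRESETS = {
    "uniform": "./configs/uniform.yaml",
    "static": "./configs/static.yaml",
    "time-wise": "./configs/time-wise.yaml",
    "depth-wise": "./configs/depth-wise.yaml",
    "joint": "./configs/joint.yaml",
}


def build_train_env(config_file, accelerate_config="./accelerate_config.yaml",
                    python_bin="python", master_addr="127.0.0.1",
                    master_port=29500):
    """Copy the current environment and add the launcher's variables."""
    env = dict(os.environ)
    env.update({
        "ACCELERATE_CONFIG": accelerate_config,
        "CONFIG_FILE": config_file,
        "PYTHON_BIN": python_bin,
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),  # environment values must be strings
    })
    return env


def launch(config_file, dry_run=True, **overrides):
    """Return (cmd, env) in dry-run mode, otherwise run scripts/train.sh."""
    env = build_train_env(config_file, **overrides)
    cmd = ["bash", "scripts/train.sh"]
    if dry_run:
        return cmd, env
    return subprocess.run(cmd, env=env, check=True)


cmd, env = launch(PRESETS["depth-wise"])
print(" ".join(cmd), "with CONFIG_FILE =", env["CONFIG_FILE"])
```

Calling `launch(PRESETS["depth-wise"], dry_run=False)` from the repository root would then start the run with the recommended preset.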

🎨 Inference

Export Pipeline

First, export the trained model and necessary components:

python utils/save_pipeline.py \
  --checkpoint /path/to/checkpoint-dir \
  --type adafusedit \
  --vae /path/to/vae \
  --scheduler /path/to/scheduler

Run Generation

Generate images from text prompts:

python inference.py \
  --checkpoint /path/to/exported/pipeline \
  --prompt "A city skyline at dusk" \
  --resolution 512 \
  --num_inference_steps 25 \
  --guidance_scale 6.0
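To generate images for several prompts, the command above can be assembled programmatically. This sketch uses only the flags shown in the example; the helper `inference_cmd` and the second prompt are illustrative, and the script may accept other options that are not covered here.

```python
import shlex


def inference_cmd(checkpoint, prompt, resolution=512, steps=25,
                  guidance_scale=6.0):
    """Build the argv list for one inference.py call."""
    return [
        "python", "inference.py",
        "--checkpoint", checkpoint,
        "--prompt", prompt,
        "--resolution", str(resolution),
        "--num_inference_steps", str(steps),
        "--guidance_scale", str(guidance_scale),
    ]


prompts = [
    "A city skyline at dusk",
    "Three red apples on a wooden table",  # illustrative prompt
]
for p in prompts:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(inference_cmd("/path/to/exported/pipeline", p)))
```

Each argv list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.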

📈 Evaluation

Generate evaluation samples with Accelerate (for multi-GPU/multi-node) using the scripts in evaluation/:

GenEval:

accelerate launch evaluation/sample_geneval.py evaluation/geneval.yaml

GenAI-Bench:

accelerate launch evaluation/sample_genaibench.py evaluation/genaibench.yaml

DrawBench:

accelerate launch evaluation/sample_drawbench.py evaluation/drawbench.yaml

📝 Citation

If you use this code or the SemanticRouting paper, please cite:

@misc{li2026semanticroutingexploringmultilayer,
      title={Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers}, 
      author={Bozhou Li and Yushuo Guan and Haolin Li and Bohan Zeng and Yiyan Ji and Yue Ding and Pengfei Wan and Kun Gai and Yuanxing Zhang and Wentao Zhang},
      year={2026},
      eprint={2602.03510},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2602.03510}, 
}

🙏 Acknowledgements

This codebase is adapted from and extends tang-bd/fuse-dit. We sincerely thank the original authors for their foundational work.

📄 License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.
