Official repository for the CVPR 2026 paper "Text-Image Conditioned 3D Generation".
Jiazhong Cen1,2, Jiemin Fang2,✉, Sikuang Li1,2, Guanjun Wu3,2, Chen Yang2, Taoran Yi3,2, Zanwei Zhou1,2, Zhikuan Bao2, Lingxi Xie2, Wei Shen1,✉, Qi Tian2
1 MoE Key Lab of Artificial Intelligence, AI Institute, School of Computer Science, Shanghai Jiao Tong University
2 Huawei Inc.
3 Huazhong University of Science and Technology
Contact: jaminfong@gmail.com, wei.shen@sjtu.edu.cn
TIGON is a text-image conditioned 3D generation framework that supports:
- text-to-3D generation
- image-to-3D generation
- interleaved text-image conditioned 3D generation
The repository currently provides the inference pipeline and demo entry for interactive generation.
Please create the runtime environment from `environment.yml`:

```shell
conda env create -f environment.yml
conda activate tigon
```

After the base environment is ready, create an `external` directory under the repository root and install the required external dependencies:
```shell
mkdir -p external
cd external
git clone https://github.com/autonomousvision/mip-splatting.git
pip install mip-splatting/submodules/diff-gaussian-rasterization --no-build-isolation
pip install flash-attn --no-build-isolation
git clone https://github.com/NVlabs/nvdiffrast.git
pip install ./nvdiffrast --no-build-isolation
git clone https://github.com/facebookresearch/dinov3.git
```

Then place the DINOv3 ViT-H/16+ checkpoint at:

```
./external/dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth
```

The environment used by TIGON is the same as the environments used by TRELLIS and UniLat3D. If you have already prepared either of those environments, you can use this repository directly in most cases.
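After the installs above, a quick sanity check can confirm that the external packages are importable. The module names below are assumptions based on the upstream repositories; adjust them if your builds expose different names:

```python
# Sanity-check that the external dependencies built above are importable.
# Module names are assumptions from the upstream packages, not verified
# against this repository.
import importlib.util

MODULES = [
    "diff_gaussian_rasterization",  # mip-splatting rasterizer submodule
    "flash_attn",                   # FlashAttention
    "nvdiffrast",                   # NVIDIA differentiable rasterizer
]

def is_importable(name):
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(name) is not None

for name in MODULES:
    print(f"{name}: {'ok' if is_importable(name) else 'MISSING'}")
```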
You still need to make sure the extra components required by TIGON are correctly prepared, especially:
- CLIP-related dependencies in the environment
- the DINOv3 codebase and weight file under `external/`
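A quick way to verify the weight file is in place, using the checkpoint path given above (a plain shell sketch):

```shell
# Check that the DINOv3 checkpoint sits where the demo expects it.
CKPT=external/dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth
if [ -f "$CKPT" ]; then
    echo "found: $CKPT"
else
    echo "missing: $CKPT" >&2
fi
```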
The pretrained checkpoint is available at the Hugging Face repository below:
After downloading the checkpoint, place the `mix_e2e_pipe` folder under the repository root:

```
tigon/
|-- mix_e2e_pipe/
|-- demo.py
|-- trellis/
|-- configs/
|-- ...
```

The demo script loads the checkpoint from:

```
./mix_e2e_pipe
```

After the environment and checkpoint are ready, run:

```shell
python demo.py
```

The script supports three generation modes:
- text only
- image only
- text + image interleaved conditioning
During execution, the script will ask for:
- random seed
- text prompt
- image path
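The prompts are read interactively. Assuming they are consumed from stdin in the order listed above (seed, text prompt, image path), a run can be scripted as below; the prompt text and image path are hypothetical placeholders:

```shell
# Feed the three interactive answers to demo.py via stdin.
# "a wooden toy robot" and the image path are made-up examples;
# the stdin ordering is an assumption, not verified against demo.py.
printf '%s\n' \
    "42" \
    "a wooden toy robot" \
    "condition_images/robot.png" \
    | python demo.py
```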
Generated results are saved under `interactive_output/`, including:
- rendered 3D video in `.mp4`
- four-view rendered images in `.png`
- input metadata in `_info.txt`
- saved reference condition image in `_ref.png`
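A small helper can gather the generated files by these suffixes. The directory name and suffix mapping follow the list above; the helper itself is just a convenience sketch, not part of the repository:

```python
# Group the demo outputs under interactive_output/ by file type.
# Suffix mapping mirrors the output list in this README.
from pathlib import Path

def collect_outputs(out_dir="interactive_output"):
    """Return generated file names grouped by the suffixes the demo writes."""
    suffixes = {".mp4": "videos", ".png": "images", ".txt": "metadata"}
    groups = {label: [] for label in suffixes.values()}
    root = Path(out_dir)
    if root.is_dir():
        for path in sorted(root.iterdir()):
            label = suffixes.get(path.suffix)
            if label:
                groups[label].append(path.name)
    return groups
```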
Notes:
- `demo.py` defaults to `CUDA_VISIBLE_DEVICES=0`.
- The script enables pipeline offloading by default through `TIGON_ENABLE_OFFLOAD=1`.
- The checkpoint is expected to provide the `gaussian` output format for rendering and visualization.
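Both defaults can be overridden per run via environment variables. The `0`/`1` semantics of `TIGON_ENABLE_OFFLOAD` are assumed from the note above:

```shell
# Run on GPU 1 and disable pipeline offloading for this invocation.
# Assumes TIGON_ENABLE_OFFLOAD=0 turns offloading off (not verified).
CUDA_VISIBLE_DEVICES=1 TIGON_ENABLE_OFFLOAD=0 python demo.py
```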
Repository structure:

```
TIGON/
|-- demo.py
|-- environment.yml
|-- configs/
|-- trellis/
|-- condition_images/
|-- external/        # created manually during setup
|-- mix_e2e_pipe/    # downloaded checkpoint folder
```
If you find this repository useful, please cite:

```bibtex
@inproceedings{cen2026tigon,
  title     = {Text-Image Conditioned 3D Generation},
  author    = {Cen, Jiazhong and Fang, Jiemin and Li, Sikuang and Wu, Guanjun and Yang, Chen and Yi, Taoran and Zhou, Zanwei and Bao, Zhikuan and Xie, Lingxi and Shen, Wei and Tian, Qi},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}
```

This project builds upon the codebase and environment foundations of TRELLIS and UniLat3D. We thank the authors of these projects for making their work available.
