
TIGON: Text-Image Conditioned 3D Generation

Project Page · Hugging Face Model · Paper on arXiv

Official repository for the CVPR 2026 paper "Text-Image Conditioned 3D Generation".

Supplementary video demo

Authors

Jiazhong Cen1,2, Jiemin Fang2,✉, Sikuang Li1,2, Guanjun Wu3,2, Chen Yang2, Taoran Yi3,2, Zanwei Zhou1,2, Zhikuan Bao2, Lingxi Xie2, Wei Shen1,✉, Qi Tian2

1 MoE Key Lab of Artificial Intelligence, AI Institute, School of Computer Science, Shanghai Jiao Tong University
2 Huawei Inc.
3 Huazhong University of Science and Technology

Contact: jaminfong@gmail.com, wei.shen@sjtu.edu.cn

Overview

TIGON is a text-image conditioned 3D generation framework that supports:

  • text-to-3D generation
  • image-to-3D generation
  • interleaved text-image conditioned 3D generation

The repository currently provides the inference pipeline and demo entry for interactive generation.

Installation

1. Create the environment

Please create the runtime environment from environment.yml:

conda env create -f environment.yml
conda activate tigon

2. Install extra dependencies

After the base environment is ready, create an external directory under the repository root and install the required external dependencies:

mkdir -p external
cd external

git clone https://github.com/autonomousvision/mip-splatting.git
pip install mip-splatting/submodules/diff-gaussian-rasterization --no-build-isolation

pip install flash-attn --no-build-isolation

git clone https://github.com/NVlabs/nvdiffrast.git
pip install ./nvdiffrast --no-build-isolation

git clone https://github.com/facebookresearch/dinov3.git

Then place the DINOv3 ViT-H/16+ checkpoint at:

./external/dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth
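A quick way to confirm the weight file is in place before running anything heavy is a small existence-and-size check. This is an illustrative sketch, not part of TIGON; the path is the one given above.

```python
# Hedged sanity check: confirm the DINOv3 weight file exists and report its size.
from pathlib import Path

def check_weight(path):
    """Return (exists, size_in_mib) for a checkpoint file."""
    p = Path(path)
    if not p.is_file():
        return False, 0.0
    return True, p.stat().st_size / (1024 * 1024)

ok, size_mib = check_weight(
    "external/dinov3_vith16plus_pretrain_lvd1689m-7c1da9a5.pth"
)
print(f"present={ok} size={size_mib:.1f} MiB")
```

A zero or implausibly small size usually means a failed or truncated download.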

3. Compatibility note

The TIGON environment matches the environments used by TRELLIS and UniLat3D. If you have already prepared either of those, you can in most cases use this repository directly.

Even so, make sure the extra components TIGON requires are in place, especially:

  • CLIP-related dependencies in the environment
  • DINOv3 codebase and weight file under external
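The components above can be checked without importing anything heavy by probing the import machinery. The module names below (`clip`, `flash_attn`, `nvdiffrast`) are assumptions about how each package is imported; adjust them to your environment.

```python
# Hedged sketch: report which assumed dependency modules are importable,
# without actually importing them (find_spec only locates the module).
from importlib.util import find_spec

def missing_modules(names):
    """Return the subset of module names that cannot be located."""
    return [n for n in names if find_spec(n) is None]

# Assumed import names for the extra components TIGON relies on.
required = ["clip", "flash_attn", "nvdiffrast"]
print("missing:", missing_modules(required))
```

An empty `missing` list means the environment at least has the packages installed; it does not verify CUDA builds.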

Checkpoints

The pretrained checkpoint is available on Hugging Face.

After downloading the checkpoint, place the mix_e2e_pipe folder under the repository root:

tigon/
|-- mix_e2e_pipe/
|-- demo.py
|-- trellis/
|-- configs/
|-- ...

The demo script loads the checkpoint from:

./mix_e2e_pipe
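Before launching the demo, you can verify the top-level layout sketched above with a short check. The folder names come from the tree in this README; the function itself is illustrative only.

```python
# Hedged sketch: verify the expected top-level entries exist under the repo root.
from pathlib import Path

def layout_ok(root="."):
    """Map each expected top-level entry to whether it exists under root."""
    expected = ["mix_e2e_pipe", "demo.py", "trellis", "configs"]
    return {name: (Path(root) / name).exists() for name in expected}

print(layout_ok())
```

A `False` for `mix_e2e_pipe` typically means the checkpoint folder was downloaded but not moved to the repository root.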

Inference

After the environment and checkpoint are ready, run:

python demo.py

The script supports three generation modes:

  • text only
  • image only
  • text + image interleaved conditioning

During execution, the script will ask for:

  • random seed
  • text prompt
  • image path

Generated results are saved under interactive_output/, including:

  • rendered 3D video in .mp4
  • four-view rendered images in .png
  • input metadata in _info.txt
  • saved reference condition image in _ref.png
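The output naming conventions listed above can be used to collect a run's artifacts programmatically. This is a sketch assuming the suffixes given in this README (`.mp4`, `.png`, `_info.txt`, `_ref.png`); the grouping keys are my own labels.

```python
# Hedged sketch: group files in interactive_output/ by the README's suffix conventions.
from collections import defaultdict
from pathlib import Path

def group_outputs(out_dir="interactive_output"):
    """Bucket output files into video / views / metadata / reference."""
    groups = defaultdict(list)
    for p in sorted(Path(out_dir).glob("*")):
        # Check the more specific suffixes before the generic .png case.
        if p.name.endswith("_info.txt"):
            groups["metadata"].append(p.name)
        elif p.name.endswith("_ref.png"):
            groups["reference"].append(p.name)
        elif p.suffix == ".mp4":
            groups["video"].append(p.name)
        elif p.suffix == ".png":
            groups["views"].append(p.name)
    return dict(groups)

print(group_outputs())  # empty dict if no runs have been produced yet
```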

Notes

  • demo.py defaults to CUDA_VISIBLE_DEVICES=0.
  • The script enables pipeline offloading by default through TIGON_ENABLE_OFFLOAD=1.
  • The checkpoint is expected to produce the Gaussian output format for rendering and visualization.
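The defaults in the notes above can be set explicitly when launching the demo from a script. The variable values mirror the README defaults; launching via `subprocess` is an assumption on my part, not part of the official workflow.

```python
# Hedged sketch: build the environment described in the notes and (optionally)
# launch demo.py with it. The values mirror the README defaults.
import os
import subprocess

def demo_env(gpu="0", offload="1"):
    """Return a copy of the current environment with TIGON's demo defaults set."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu       # README default: GPU 0
    env["TIGON_ENABLE_OFFLOAD"] = offload   # README default: offloading on
    return env

# subprocess.run(["python", "demo.py"], env=demo_env())  # uncomment to launch
```

Setting `offload="0"` (if the script honors it) would be the natural way to disable pipeline offloading on machines with ample GPU memory, but that behavior is an assumption.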

Repository Structure

TIGON/
|-- demo.py
|-- environment.yml
|-- configs/
|-- trellis/
|-- condition_images/
|-- external/               # created manually during setup
|-- mix_e2e_pipe/           # downloaded checkpoint folder

Citation

If you find this repository useful, please cite:

@inproceedings{cen2026tigon,
  title     = {Text-Image Conditioned 3D Generation},
  author    = {Cen, Jiazhong and Fang, Jiemin and Li, Sikuang and Wu, Guanjun and Yang, Chen and Yi, Taoran and Zhou, Zanwei and Bao, Zhikuan and Xie, Lingxi and Shen, Wei and Tian, Qi},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2026}
}

Acknowledgement

This project builds upon the codebase and environment foundations of TRELLIS and UniLat3D. We thank the authors of these projects for making their work available.
