This repository contains the code for Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias.
Authors: Zhiyuan Xu, Jiuming Liu, Yuxin Chen, Masayoshi Tomizuka, Chenfeng Xu, Chensheng Peng
Affiliations: UC Berkeley, University of Cambridge, UT Austin
We present SparseGen, a novel framework for efficient image-to-3D generation, which exhibits low input-view bias while being significantly faster. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
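As a toy illustration of the set-latent expansion idea (all dimensions, the random stand-in parameters, and the single linear decoder here are hypothetical, not the model's actual operator): each of `N` sparse queries is decoded into `K` small Gaussian parameter vectors, so the total primitive count scales as `N * K` rather than with a dense grid resolution.

```python
import random

random.seed(0)

N, D = 16, 8    # number of sparse queries, query feature dim (hypothetical)
K, P = 4, 14    # Gaussians decoded per query, params per Gaussian (hypothetical)

# Learned anchor queries -- random stand-ins for learned parameters here.
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
# Expansion operator: a single linear map D -> K*P, as a toy stand-in.
W = [[random.gauss(0, 0.1) for _ in range(K * P)] for _ in range(D)]

def expand(q):
    """Decode one query vector into K Gaussian parameter vectors of size P."""
    flat = [sum(q[d] * W[d][j] for d in range(D)) for j in range(K * P)]
    return [flat[k * P:(k + 1) * P] for k in range(K)]

gaussians = [g for q in queries for g in expand(q)]
print(len(gaussians), len(gaussians[0]))  # 64 14  (N*K primitives, P params each)
```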
The current codebase expects Python with CUDA-enabled PyTorch.
Create and activate a conda environment:
```bash
conda create -n sparsegen python=3.10 git cmake -y
conda activate sparsegen
```

Install the CUDA 12.4 toolchain inside the conda env:
```bash
conda install -c "nvidia/label/cuda-12.4.0" cuda-toolkit
```

Install prebuilt PyTorch and torchvision wheels for CUDA 12.4:
```bash
python -m pip install --index-url https://download.pytorch.org/whl/cu124 torch==2.4.0 torchvision==0.19.0
```

Install ninja, then build pytorch3d from source:
```bash
python -m pip install ninja
CUDA_HOME="$CONDA_PREFIX" FORCE_CUDA=1 python -m pip install --no-build-isolation "git+https://github.com/facebookresearch/pytorch3d.git@stable"
```

Install the remaining Python dependencies:
```bash
python -m pip install -r requirements.txt
```

Notes:

- `gsplat` and `pytorch3d` are version-sensitive. The setup above was verified with PyTorch 2.4 and CUDA 12.4 for this repo.
- A working `sparsegen` env was verified with `nvcc 12.4`, `torch 2.4.0+cu124`, `torchvision 0.19.0+cu124`, `pytorch3d 0.7.8`, and `gsplat 1.5.3`.
- `pytorch3d` was built successfully from the `stable` tag with `CUDA_HOME="$CONDA_PREFIX"` and `--no-build-isolation`.
- `ninja` is installed before the `pytorch3d` build to avoid slow fallback compilation.
- `requirements.txt` intentionally excludes `torch`, `torchvision`, and `pytorch3d` because they must be installed separately, in that order.
- Training uses `wandb` in offline mode by default through `configs/spgen.yaml`.
Download the dataset from here.
After downloading, unpack the dataset to a local directory and update the dataset root in `data_manager/srn.py`:

```python
SHAPENET_DATASET_ROOT = "/path/to/SPG_SRN"
```

Expected structure:
```
SPG_SRN/
└── srn_cars/
    ├── cars_train/
    │   └── <example_id>/
    │       ├── intrinsics.txt
    │       ├── rgb/
    │       │   ├── 000000.png
    │       │   └── ...
    │       └── opc/
    │           ├── 000000.png
    │           └── ...
    ├── cars_val/
    │   └── <example_id>/
    │       ├── intrinsics.txt
    │       ├── rgb/
    │       └── opc/
    └── cars_test/
        └── <example_id>/
            ├── intrinsics.txt
            ├── rgb/
            └── opc/
```
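After unpacking, it can be worth sanity-checking that every example directory matches the layout above before training. A minimal standalone sketch (the helper name `check_srn_layout` is hypothetical, not part of the repo):

```python
import os

def check_srn_layout(root):
    """Verify the SPG_SRN layout: each split holds <example_id> dirs
    containing intrinsics.txt, rgb/, and opc/. Returns a list of problems."""
    problems = []
    for split in ("cars_train", "cars_val", "cars_test"):
        split_dir = os.path.join(root, "srn_cars", split)
        if not os.path.isdir(split_dir):
            problems.append(f"missing split directory: {split_dir}")
            continue
        for example_id in sorted(os.listdir(split_dir)):
            ex = os.path.join(split_dir, example_id)
            if not os.path.isfile(os.path.join(ex, "intrinsics.txt")):
                problems.append(f"{ex}: missing intrinsics.txt")
            for sub in ("rgb", "opc"):
                if not os.path.isdir(os.path.join(ex, sub)):
                    problems.append(f"{ex}: missing {sub}/")
    return problems

# Example: problems = check_srn_layout("/path/to/SPG_SRN")
```

An empty return value means the tree matches the expected structure.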
Training is launched through `scripts/run_train.sh`, which wraps `train.py` with `torchrun`. Use the current training config in `configs/spgen.yaml`:
```bash
bash scripts/run_train.sh
```

By default, `scripts/run_train.sh` uses:

```bash
torchrun --standalone --nproc-per-node=2 train.py --config-name spgen
```

Adjust `--nproc-per-node` in the script if your setup differs.
Hydra writes outputs under its run directory, including the resolved config and checkpoints saved during training.
Evaluation is launched through `scripts/run_eval.sh`, which wraps `eval.py` with the config `configs_eval/default.yaml`.
Download the pretrained checkpoint from here. Then extract it so that the checkpoint directory contains both `model.pth` and `.hydra/config.yaml`:

```bash
tar -xzf ckpt_srn.tar.gz
```

Expected structure:
```
ckpt_srn/
├── .hydra/
│   └── config.yaml
└── model.pth
```
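The convention above, `model.pth` with a sibling `.hydra/config.yaml`, can be resolved with a small helper before evaluation. A sketch only (the name `resolve_checkpoint` is hypothetical; the repo's own loading lives in `eval.py`):

```python
import os

def resolve_checkpoint(model_path):
    """Given a path to model.pth, return (model_path, config_path), where
    config_path is the sibling .hydra/config.yaml next to the checkpoint.
    Raises FileNotFoundError if either file is missing."""
    model_path = os.path.abspath(model_path)
    config_path = os.path.join(os.path.dirname(model_path), ".hydra", "config.yaml")
    for p in (model_path, config_path):
        if not os.path.isfile(p):
            raise FileNotFoundError(p)
    return model_path, config_path

# Example: model_path, config_path = resolve_checkpoint("ckpt_srn/model.pth")
```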
The provided evaluation script is:

```bash
bash scripts/run_eval.sh
```

Update the checkpoint path in `scripts/run_eval.sh` before running:

```bash
python3 eval.py \
    model_path=/path/to/ckpt
```

The current evaluation path:
- expects a checkpoint file passed as `model_path`
- loads the training config from the sibling `.hydra/config.yaml` next to that checkpoint
- runs one-sample, single-GPU evaluation
- reports PSNR, LPIPS, SSIM, and FID
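Of the reported metrics, PSNR is simple enough to state inline. A standalone sketch, not the repo's implementation in `evaluation/metricator.py`, assuming images flattened to lists of pixel values in [0, 1]:

```python
import math

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel lists:
    PSNR = 10 * log10(max_val^2 / MSE)."""
    assert len(pred) == len(target)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

print(psnr([0.5] * 4, [0.25] * 4))  # MSE = 0.0625 -> 10*log10(16) ~ 12.04 dB
```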
- `train.py`: training entrypoint
- `eval.py`: evaluation entrypoint
- `configs/spgen.yaml`: training config
- `configs_eval/default.yaml`: evaluation config
- `model/spgen_model.py`: SparseGen model
- `model/sp_trans.py`: transformer backbone
- `model/renderer_gs.py`: Gaussian renderer
- `model/sparse_diffusion.py`: training wrapper and losses
- `data_manager/srn.py`: SRN dataset manager
- `evaluation/generator.py`: evaluation-time sampling
- `evaluation/metricator.py`: evaluation metrics
We thank the authors of the viewset-diffusion repository for open-sourcing their code.
If you find this repository useful, please consider citing:
```bibtex
@article{xu2026rethinking,
  title={Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias},
  author={Xu, Zhiyuan and Liu, Jiuming and Chen, Yuxin and Tomizuka, Masayoshi and Xu, Chenfeng and Peng, Chensheng},
  journal={arXiv preprint arXiv:2604.13905},
  year={2026}
}
```