SparseGen

About

This repository contains the code for "Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias".

Authors: Zhiyuan Xu, Jiuming Liu, Yuxin Chen, Masayoshi Tomizuka, Chenfeng Xu, Chensheng Peng
Affiliations: UC Berkeley, University of Cambridge, UT Austin


We present SparseGen, a novel framework for efficient image-to-3D generation, which exhibits low input-view bias while being significantly faster. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.

Figure: SparseGen method overview.

Given $V$ input views (clean and/or noisy) with known camera poses, an image encoder (with adaLN timestep conditioning) and a 3D position encoder produce position-aware image features. A sparse set of learnable 3D anchor queries attends to these fused features in a transformer-based expansion network and is decoded into a compact set of 3D Gaussians. Finally, the generated Gaussians are rendered for the target views via differentiable splatting, enabling fast, high-quality 3D generation and rendering.
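For intuition only, below is a minimal PyTorch sketch of the sparse-query expansion idea described above. The module, dimension, and parameter names (num_queries, gaussians_per_query, the 14-value Gaussian parameterization) are illustrative assumptions, not the repository's actual implementation:

# Minimal, illustrative sketch of sparse set-latent expansion (not the repo's actual modules).
import torch
import torch.nn as nn

class SparseQueryExpansion(nn.Module):
    def __init__(self, dim=256, num_queries=512, gaussians_per_query=4):
        super().__init__()
        # Learnable 3D anchor queries: a compact set-latent scene representation.
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        # Transformer decoder: queries cross-attend to position-aware image features.
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        # Each transformed query expands into a small local set of Gaussians; per Gaussian
        # we predict position offset (3), scale (3), rotation quaternion (4), opacity (1),
        # and RGB color (3), i.e. 14 values.
        self.head = nn.Linear(dim, gaussians_per_query * 14)

    def forward(self, image_features):
        # image_features: (B, N_tokens, dim) fused image + 3D position features.
        B = image_features.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        q = self.decoder(q, image_features)   # (B, num_queries, dim)
        g = self.head(q)                      # (B, num_queries, gaussians_per_query * 14)
        return g.view(B, -1, 14)              # flat set of per-Gaussian parameters

# The resulting Gaussian parameters would then be rendered for target views
# with a differentiable splatting renderer such as gsplat.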

Setup

The codebase expects Python 3.10 with a CUDA-enabled PyTorch build; the steps below target CUDA 12.4.

Create and activate a conda environment:

conda create -n sparsegen python=3.10 git cmake -y
conda activate sparsegen

Install the CUDA 12.4 toolchain inside the conda env:

conda install -c "nvidia/label/cuda-12.4.0" cuda-toolkit

Install prebuilt PyTorch and torchvision wheels for CUDA 12.4:

python -m pip install --index-url https://download.pytorch.org/whl/cu124 torch==2.4.0 torchvision==0.19.0

Install ninja, then build pytorch3d from source:

python -m pip install ninja
CUDA_HOME="$CONDA_PREFIX" FORCE_CUDA=1 python -m pip install --no-build-isolation "git+https://github.com/facebookresearch/pytorch3d.git@stable"

Install the remaining Python dependencies:

python -m pip install -r requirements.txt

Notes:

  • gsplat and pytorch3d are version-sensitive. The setup above was verified with PyTorch 2.4 and CUDA 12.4 for this repo.
  • A working sparsegen env was verified with nvcc 12.4, torch 2.4.0+cu124, torchvision 0.19.0+cu124, pytorch3d 0.7.8, and gsplat 1.5.3; a quick version-check snippet follows these notes.
  • pytorch3d was built successfully from the stable tag with CUDA_HOME="$CONDA_PREFIX" and --no-build-isolation.
  • ninja is installed before the pytorch3d build to avoid slow fallback compilation.
  • requirements.txt intentionally excludes torch, torchvision, and pytorch3d because they must be installed separately in that order.
  • Training uses wandb in offline mode by default through configs/spgen.yaml.
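
As a quick sanity check after the steps above, a short snippet like the following can confirm the verified versions inside the activated sparsegen env (assuming each package exposes __version__):

# Quick environment check against the versions noted above.
import torch, torchvision, pytorch3d, gsplat

print("torch:", torch.__version__)               # expect 2.4.0+cu124
print("torchvision:", torchvision.__version__)   # expect 0.19.0+cu124
print("pytorch3d:", pytorch3d.__version__)       # expect 0.7.8
print("gsplat:", gsplat.__version__)             # expect 1.5.3
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)         # expect 12.4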

Data

Download the dataset from here.

After downloading, unpack the dataset to a local directory and update the dataset root in data_manager/srn.py:

SHAPENET_DATASET_ROOT = "/path/to/SPG_SRN"

Expected structure:

SPG_SRN/
└── srn_cars/
    ├── cars_train/
    │   └── <example_id>/
    │       ├── intrinsics.txt
    │       ├── rgb/
    │       │   ├── 000000.png
    │       │   └── ...
    │       └── opc/
    │           ├── 000000.png
    │           └── ...
    ├── cars_val/
    │   └── <example_id>/
    │       ├── intrinsics.txt
    │       ├── rgb/
    │       └── opc/
    └── cars_test/
        └── <example_id>/
            ├── intrinsics.txt
            ├── rgb/
            └── opc/
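
As an optional check after unpacking, a small standalone script along these lines can verify the layout above. It is not part of the repo, and the root path is a placeholder:

# Optional sanity check that the unpacked dataset matches the expected SPG_SRN layout.
from pathlib import Path

# Point this at the same directory used for SHAPENET_DATASET_ROOT in data_manager/srn.py.
DATASET_ROOT = Path("/path/to/SPG_SRN")

for split in ("cars_train", "cars_val", "cars_test"):
    split_dir = DATASET_ROOT / "srn_cars" / split
    examples = sorted(d for d in split_dir.iterdir() if d.is_dir())
    print(f"{split}: {len(examples)} examples")
    for ex in examples[:3]:  # spot-check a few examples per split
        assert (ex / "intrinsics.txt").is_file(), f"missing intrinsics.txt in {ex}"
        assert any((ex / "rgb").glob("*.png")), f"no PNGs under {ex}/rgb"
        assert any((ex / "opc").glob("*.png")), f"no PNGs under {ex}/opc"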

Training

Training is launched through scripts/run_train.sh, which wraps train.py with torchrun.

Use the current training config in configs/spgen.yaml:

bash scripts/run_train.sh

By default, scripts/run_train.sh uses:

torchrun --standalone --nproc-per-node=2 train.py --config-name spgen

Adjust --nproc-per-node in the script to match the number of GPUs available on your machine.

Hydra writes outputs under its run directory, including the resolved config and checkpoints saved during training.
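
To inspect a finished run, the resolved config that Hydra saved can be reloaded with OmegaConf. The path below is only an example of Hydra's default run-directory layout and may differ depending on configs/spgen.yaml:

from omegaconf import OmegaConf

# Example path following Hydra's default outputs/<date>/<time>/ layout; adjust to your run directory.
cfg = OmegaConf.load("outputs/2026-01-01/12-00-00/.hydra/config.yaml")
print(OmegaConf.to_yaml(cfg))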

Evaluation

Evaluation is launched through scripts/run_eval.sh, which wraps eval.py with config configs_eval/default.yaml.

Download the pretrained checkpoint from here.

Then extract it so that the checkpoint directory contains both model.pth and .hydra/config.yaml:

tar -xzf ckpt_srn.tar.gz

Expected structure:

ckpt_srn/
├── .hydra/
│   └── config.yaml
└── model.pth

The provided evaluation script is:

bash scripts/run_eval.sh

Update the checkpoint path in scripts/run_eval.sh before running:

python3 eval.py \
        model_path=/path/to/ckpt

The current evaluation pipeline (a code sketch of the loading convention follows this list):

  • expects a checkpoint file passed as model_path
  • loads the training config from the sibling .hydra/config.yaml next to that checkpoint
  • runs one-sample, single-GPU evaluation
  • reports PSNR, LPIPS, SSIM, and FID
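
For reference, the checkpoint/config convention described above can be sketched as follows. This is only an illustration of the expected layout; the actual loading logic lives in eval.py:

# Illustrative only: eval.py implements the actual loading and evaluation logic.
from pathlib import Path
import torch
from omegaconf import OmegaConf

model_path = Path("/path/to/ckpt_srn/model.pth")
# The training config is expected in the sibling .hydra/ directory next to the checkpoint.
cfg = OmegaConf.load(model_path.parent / ".hydra" / "config.yaml")
state_dict = torch.load(model_path, map_location="cpu")
# eval.py then builds the model from cfg, loads the weights, renders target views,
# and reports PSNR, LPIPS, SSIM, and FID.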

Repo Structure

Acknowledgement

We thank the authors of the viewset-diffusion repository for open-sourcing their code.

Citation

If you find this repository useful, please consider citing:

@article{xu2026rethinking,
  title={Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias},
  author={Xu, Zhiyuan and Liu, Jiuming and Chen, Yuxin and Tomizuka, Masayoshi and Xu, Chenfeng and Peng, Chensheng},
  journal={arXiv preprint arXiv:2604.13905},
  year={2026}
}
