Project Page · arXiv · Setup · Data Preparation · Training · Citation
This is the official repository for GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving, accepted at WACV 2026.
Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text, and has led to unprecedented performance on a large array of tasks when applied at scale. Similarly, autonomous driving generates vast amounts of spatiotemporal data, alluding to the possibility of harnessing scale to learn the underlying geometric and semantic structure of the environment and its evolution over time.
We propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime:
- General occupancy — capturing the evolving structure of the 3D scene
- Ego occupancy — modeling the ego vehicle path through the environment
- Distilled vision foundation model features — high-level semantic features from DINOv2
By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction.
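As a conceptual illustration of this idea (names and shapes are ours, not the repository's actual API), the pre-training targets can be viewed as a field that maps a continuous spacetime query to general occupancy, ego occupancy, and a distilled semantic feature. A toy numpy sketch with random weights:

```python
import numpy as np

# Toy stand-in for a learned decoder: all weights are random and purely
# illustrative. The real model conditions on sensor observations.
rng = np.random.default_rng(0)
W_h = rng.standard_normal((4, 64))      # (x, y, z, t) -> hidden
W_occ = rng.standard_normal((64, 1))    # hidden -> general occupancy logit
W_ego = rng.standard_normal((64, 1))    # hidden -> ego occupancy logit
W_sem = rng.standard_normal((64, 384))  # hidden -> semantic feature (e.g. DINOv2 ViT-S dim)

def query_field(points_4d):
    """Map (N, 4) spacetime queries to the three pre-training targets."""
    h = np.tanh(points_4d @ W_h)
    occ = 1.0 / (1.0 + np.exp(-(h @ W_occ)))  # general occupancy in [0, 1]
    ego = 1.0 / (1.0 + np.exp(-(h @ W_ego)))  # ego occupancy in [0, 1]
    sem = h @ W_sem                           # distilled semantic features
    return occ, ego, sem

queries = rng.uniform(-1, 1, size=(8, 4))     # normalized (x, y, z, t) queries
occ, ego, sem = query_field(queries)
print(occ.shape, ego.shape, sem.shape)        # (8, 1) (8, 1) (8, 384)
```

The key property is that any point in spacetime can be queried, rather than predicting a fixed grid of raw sensor measurements.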
From the repository root, run:
```bash
bash docker/build_docker.sh
```

This builds a Docker image tagged `gasp:latest` based on CUDA 11.8 / Ubuntu 22.04, installs all Python and CUDA dependencies, and compiles the custom CUDA extensions.
Tip: To speed up the build, edit `docker/Dockerfile` and remove CUDA architectures in `CUDA_ARCHITECTURES` that do not match your GPU. See NVIDIA CUDA GPUs for your architecture number.
Copy or create docker/.env with your credentials:
```bash
WANDB_API_KEY=your_wandb_key
HF_TOKEN=your_huggingface_token
```

You can also set the paths to your datasets and outputs (defaults shown):

```bash
DATASET_ROOT=/datasets/
DINO_DIR=/dino_features/
OUTPUT_DIR=/outputs/
```

```bash
# Start the container in the background
./docker/start_docker.sh

# Attach a shell to the running container
./docker/into_docker.sh

# Stop the container when done
./docker/stop_docker.sh
```

To run a one-off command directly (foreground, exits when done):
```bash
./docker/start_docker.sh "python gasp/train.py gasp-av2"
```

Download the Argoverse 2 Sensor Dataset from the official Argoverse website.
GASP predicts distilled DINOv2 features as a pre-training target. There are two ways to provide them:
Pre-extract and cache Denoised DINOv2 features for the full dataset:
```bash
python gasp/scripts/extract_dino_features.py \
    --data-dir /datasets/argoverse2 \
    --output-dir /dino_features/
```

This allows fast dataloading during training by reading cached `.npy` files instead of running the vision encoder on the fly.
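To illustrate why caching helps (the file naming and array shape below are assumptions for the sketch, not the repository's actual cache layout): loading a pre-computed feature map is a single `np.load`, which is far cheaper than running a ViT encoder per sample.

```python
import pathlib
import tempfile

import numpy as np

# Simulate a cached feature file; shape is illustrative
# (patch-grid height x width x feature dim).
cache_dir = pathlib.Path(tempfile.mkdtemp())
feat = np.random.rand(37, 50, 384).astype(np.float16)
np.save(cache_dir / "log_0001_cam_front_000.npy", feat)

# During training, the dataloader can simply memory-read the cache.
loaded = np.load(cache_dir / "log_0001_cam_front_000.npy")
print(loaded.shape, loaded.dtype)  # (37, 50, 384) float16
```

Storing features in `float16` roughly halves disk usage relative to `float32` with negligible impact as a distillation target.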
While not part of the paper, for convenience, features can also be computed on the fly using DINOv3, with optional dimensionality reduction. To generate a pre-computed reduction matrix (recommended for efficiency):
```bash
python gasp/scripts/generate_dinov3_reduction.py \
    --data-dir /datasets/argoverse2 \
    --output-dir /dino_features/
```

Without a reduction matrix, a subset of feature dimensions will be used automatically.
To fill in missing LiDAR points in the dataset:
```bash
python gasp/scripts/extract_missing_points.py \
    --data-dir /datasets/argoverse2
```

Training is launched inside the Docker container via `gasp/train.py`. The script uses a subcommand-based CLI; run with `-h` for help at any level:
```bash
# Top-level help
python gasp/train.py -h

# Help for the GASP pre-training config
python gasp/train.py gasp-av2 -h
```

To launch GASP pre-training on Argoverse 2:

```bash
python gasp/train.py gasp-av2
```

To train the UnO baseline:

```bash
python gasp/train.py uno-av2
```

Trainer arguments (e.g. number of nodes, batch size) come before the subcommand; model arguments come after:
```bash
python gasp/train.py [trainer args] gasp-av2 [model args]
```

If you find this work useful, please consider citing:
```bibtex
@article{ljungbergh2025gasp,
  title   = {GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving},
  author  = {Ljungbergh, William and Lilja, Adam and Tonderski, Adam and Laveno Ling, Arvid and Lindstr{\"o}m, Carl and Verbeke, Willem and Fu, Junsheng and Petersson, Christoffer and Hammarstrand, Lars and Felsberg, Michael},
  journal = {WACV},
  year    = {2026}
}
```