Project Page · arXiv · Setup · Data Preparation · Training · Citation
This is the official repository for GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving, accepted at WACV 2026.
Self-supervised pre-training based on next-token prediction has enabled large language models to capture the underlying structure of text, and has led to unprecedented performance on a large array of tasks when applied at scale. Similarly, autonomous driving generates vast amounts of spatiotemporal data, alluding to the possibility of harnessing scale to learn the underlying geometric and semantic structure of the environment and its evolution over time.
We propose a geometric and semantic self-supervised pre-training method, GASP, that learns a unified representation by predicting, at any queried future point in spacetime:
- General occupancy — capturing the evolving structure of the 3D scene
- Ego occupancy — modeling the ego vehicle path through the environment
- Distilled vision foundation model features — high-level semantic features from DINOv2
By modeling geometric and semantic 4D occupancy fields instead of raw sensor measurements, the model learns a structured, generalizable representation of the environment and its evolution through time. We validate GASP on multiple autonomous driving benchmarks, demonstrating significant improvements in semantic occupancy forecasting, online mapping, and ego trajectory prediction.
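As a conceptual illustration of this idea (names and shapes are ours, not the repository's actual API), the pre-training targets can be viewed as a field that maps a continuous spacetime query to general occupancy, ego occupancy, and a distilled semantic feature. A toy numpy sketch with random weights:

```python
import numpy as np

# Toy stand-in for a learned decoder: all weights are random and purely
# illustrative. The real model conditions on sensor observations.
rng = np.random.default_rng(0)
W_h = rng.standard_normal((4, 64))      # (x, y, z, t) -> hidden
W_occ = rng.standard_normal((64, 1))    # hidden -> general occupancy logit
W_ego = rng.standard_normal((64, 1))    # hidden -> ego occupancy logit
W_sem = rng.standard_normal((64, 384))  # hidden -> semantic feature (e.g. DINOv2 ViT-S dim)

def query_field(points_4d):
    """Map (N, 4) spacetime queries to the three pre-training targets."""
    h = np.tanh(points_4d @ W_h)
    occ = 1.0 / (1.0 + np.exp(-(h @ W_occ)))  # general occupancy in [0, 1]
    ego = 1.0 / (1.0 + np.exp(-(h @ W_ego)))  # ego occupancy in [0, 1]
    sem = h @ W_sem                           # distilled semantic features
    return occ, ego, sem

queries = rng.uniform(-1, 1, size=(8, 4))     # normalized (x, y, z, t) queries
occ, ego, sem = query_field(queries)
print(occ.shape, ego.shape, sem.shape)        # (8, 1) (8, 1) (8, 384)
```

The key property is that any point in spacetime can be queried, rather than predicting a fixed grid of raw sensor measurements.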
From the repository root, run:
```bash
bash docker/build_docker.sh
```

This builds a Docker image tagged `gasp:latest` based on CUDA 11.8 / Ubuntu 22.04, installs all Python and CUDA dependencies, and compiles the custom CUDA extensions.
Tip: To speed up the build, edit `docker/Dockerfile` and remove CUDA architectures in `CUDA_ARCHITECTURES` that do not match your GPU. See NVIDIA CUDA GPUs for your architecture number.
Copy or create docker/.env with your credentials:
```bash
WANDB_API_KEY=your_wandb_key
HF_TOKEN=your_huggingface_token
```

You can also set the paths to your datasets and outputs (defaults shown):

```bash
DATASET_ROOT=/datasets/
DINO_DIR=/dino_features/
OUTPUT_DIR=/outputs/
```

```bash
# Start the container in the background
./docker/start_docker.sh

# Attach a shell to the running container
./docker/into_docker.sh

# Stop the container when done
./docker/stop_docker.sh
```

To run a one-off command directly (foreground, exits when done):
```bash
./docker/start_docker.sh "python gasp/train.py gasp-av2"
```

Download the Argoverse 2 Sensor Dataset from the official Argoverse website.
GASP predicts distilled DINOv2 features as a pre-training target. There are two ways to provide them:
Pre-extract and cache Denoised DINOv2 features for the full dataset:
```bash
python gasp/scripts/extract_dino_features.py \
    --data-dir /datasets/argoverse2 \
    --output-dir /dino_features/
```

This allows fast dataloading during training by reading cached `.npy` files instead of running the vision encoder on the fly.
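To illustrate why caching helps (the file naming and array shape below are assumptions for the sketch, not the repository's actual cache layout): loading a pre-computed feature map is a single `np.load`, which is far cheaper than running a ViT encoder per sample.

```python
import pathlib
import tempfile

import numpy as np

# Simulate a cached feature file; shape is illustrative
# (patch-grid height x width x feature dim).
cache_dir = pathlib.Path(tempfile.mkdtemp())
feat = np.random.rand(37, 50, 384).astype(np.float16)
np.save(cache_dir / "log_0001_cam_front_000.npy", feat)

# During training, the dataloader can simply memory-read the cache.
loaded = np.load(cache_dir / "log_0001_cam_front_000.npy")
print(loaded.shape, loaded.dtype)  # (37, 50, 384) float16
```

Storing features in `float16` roughly halves disk usage relative to `float32` with negligible impact as a distillation target.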
While not part of the paper, for convenience, features can also be computed on the fly using DINOv3, with optional dimensionality reduction. To generate a pre-computed reduction matrix (recommended for efficiency):
```bash
python gasp/scripts/generate_dinov3_reduction.py \
    --data-dir /datasets/argoverse2 \
    --output-dir /dino_features/
```

Without a reduction matrix, a subset of feature dimensions will be used automatically.
To fill in missing LiDAR points in the dataset:
```bash
python gasp/scripts/extract_missing_points.py \
    --data-dir /datasets/argoverse2
```

Training is launched inside the Docker container via `gasp/train.py`. The script uses a subcommand-based CLI; run with `-h` for help at any level:
```bash
# Top-level help
python gasp/train.py -h

# Help for the GASP pre-training config
python gasp/train.py gasp-av2 -h
```

To launch GASP pre-training on Argoverse 2:

```bash
python gasp/train.py gasp-av2
```

To train the UnO baseline:

```bash
python gasp/train.py uno-av2
```

Trainer arguments (e.g. number of nodes, batch size) come before the subcommand; model arguments come after:
```bash
python gasp/train.py [trainer args] gasp-av2 [model args]
```

If you find this work useful, please consider citing:
```bibtex
@article{ljungbergh2025gasp,
  title   = {GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving},
  author  = {Ljungbergh, William and Lilja, Adam and Tonderski, Adam and Laveno Ling, Arvid and Lindstr{\"o}m, Carl and Verbeke, Willem and Fu, Junsheng and Petersson, Christoffer and Hammarstrand, Lars and Felsberg, Michael},
  journal = {WACV},
  year    = {2026}
}
```