GPU-accelerated Markov Chain Monte Carlo (MCMC) framework for geostatistical inference of subglacial bed topography. Built on CuPy and gstatsim_custom, and extending the gstatsMCMC implementation by Niya Shao, this project performs geostatistical Markov Chain Monte Carlo inversion entirely on GPU on the University of Florida HPC (NVIDIA L4 GPUs and Blackwell B200s).
SGS context: each MCMC proposal modifies a block of the spatial field; simulate() re-simulates just that block conditioned on everything else; the MCMC kernel accepts or rejects; repeat. The class avoids re-initializing GPU resources between proposals, which is where most of the performance gain comes from.
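The propose/re-simulate/accept loop can be sketched as follows. This is a minimal NumPy illustration, not the project's API: `mcmc_block_step`, `simulate_block`, and `loss_fn` are hypothetical names, and the real chain keeps everything in CuPy arrays on the GPU.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcmc_block_step(field, loss, simulate_block, loss_fn, block):
    """One Metropolis step: re-simulate a single block conditioned on the
    rest of the field, then accept or reject based on the change in loss."""
    proposal = field.copy()
    proposal[block] = simulate_block(proposal, block)  # conditional re-simulation
    new_loss = loss_fn(proposal)
    if np.log(rng.random()) < loss - new_loss:         # Metropolis acceptance
        return proposal, new_loss, True                # accepted
    return field, loss, False                          # rejected

# Toy usage: the "simulation" proposes zeros and the loss is a sum of squares,
# so the proposal always lowers the loss and is always accepted.
field = np.ones(10)
loss_fn = lambda f: float(np.sum(f ** 2))
field, loss, accepted = mcmc_block_step(
    field, loss_fn(field), lambda f, b: 0.0, loss_fn, slice(0, 3)
)
```

The key point mirrored from the text: only the proposed block is re-simulated, while the rest of the field (and all GPU resources in the real implementation) is reused between proposals.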
- Based on gstatsim_custom's implementation of SGS on GPU, SGS works by visiting each unknown grid cell one at a time in a random order, and at each cell:
- Finding nearby known values (neighbors)
- Kriging (ordinary or simple kriging) - fitting a local estimate and uncertainty using the variogram
- Drawing a sample from that local Gaussian distribution
- Treating the new value as known for all subsequent cells

This produces a statistically plausible realization of the spatial field.
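The per-cell loop above can be sketched in a toy 1-D NumPy version. This is an illustrative simple-kriging sketch (known mean 0, exponential covariance), not gstatsim_custom_gpu's code, which performs the same visit-krige-draw loop in batches on the GPU; `sgs_1d` and its parameters are hypothetical.

```python
import numpy as np

def sgs_1d(coords, values, known, rng, rang=3.0, sill=1.0, n_neighbors=4):
    """Toy 1-D sequential Gaussian simulation with simple kriging."""
    cov = lambda h: sill * np.exp(-np.abs(h) / rang)      # exponential covariance
    vals, known = values.copy(), known.copy()
    for i in rng.permutation(np.where(~known)[0]):        # random visit order
        idx = np.where(known)[0]
        # nearest known neighbors of cell i
        nb = idx[np.argsort(np.abs(coords[idx] - coords[i]))[:n_neighbors]]
        K = cov(coords[nb, None] - coords[None, nb])      # neighbor-neighbor cov.
        k = cov(coords[nb] - coords[i])                   # neighbor-target cov.
        w = np.linalg.solve(K + 1e-9 * np.eye(len(nb)), k)  # kriging weights
        mu, var = w @ vals[nb], max(sill - w @ k, 0.0)    # local mean & variance
        vals[i] = rng.normal(mu, np.sqrt(var))            # draw from local Gaussian
        known[i] = True                                   # conditions later cells
    return vals

coords = np.arange(10.0)
known = np.zeros(10, dtype=bool)
known[[0, 9]] = True                      # two conditioning points
values = np.zeros(10)
values[9] = 1.0
realization = sgs_1d(coords, values, known, np.random.default_rng(1))
```

Because each simulated value is immediately treated as known, later cells are conditioned on earlier draws, which is what makes the realization honor the variogram.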
Fused CUDA kernels for mass conservation: ice flux divergence and full mass balance residuals are computed in a single custom CUDA kernel.
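A plain NumPy reference of the quantity such a kernel computes may help: the residual of mass conservation, r = ∂(vx·H)/∂x + ∂(vy·H)/∂y − (SMB − ∂h/∂t) with ice thickness H = surface − bed. The function name, sign conventions, and central-difference stencil here are assumptions for illustration, not the project's exact kernel.

```python
import numpy as np

def mass_conservation_residual(bed, surf, velx, vely, smb, dhdt, dx, dy):
    """NumPy reference of a mass-conservation residual:
    r = d(vx*H)/dx + d(vy*H)/dy - (SMB - dh/dt), H = surf - bed.
    Central differences on the interior; boundary cells left at zero."""
    H = surf - bed
    qx, qy = velx * H, vely * H                 # ice flux components
    r = np.zeros_like(bed)
    r[1:-1, 1:-1] = (
        (qx[1:-1, 2:] - qx[1:-1, :-2]) / (2 * dx)     # d(qx)/dx
        + (qy[2:, 1:-1] - qy[:-2, 1:-1]) / (2 * dy)   # d(qy)/dy
        - (smb[1:-1, 1:-1] - dhdt[1:-1, 1:-1])        # mass balance terms
    )
    return r

# Sanity check: uniform flow over constant thickness has zero flux divergence.
bed, surf = np.zeros((5, 5)), np.ones((5, 5))
velx, vely = np.ones((5, 5)), np.zeros((5, 5))
zero = np.zeros((5, 5))
r = mass_conservation_residual(bed, surf, velx, vely, zero, zero, 1.0, 1.0)
```

Fusing these steps into one CUDA kernel avoids materializing the intermediate flux and gradient arrays that a naive CuPy version would allocate.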
On-GPU normal score transformation: quantile-based forward/inverse transforms keep data GPU-resident throughout the chain, mirroring scikit-learn's QuantileTransformer.
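The idea behind a quantile-based normal score transform can be sketched in a few lines. This is an illustrative NumPy/SciPy version, not the project's QuantileTransformer_gpu wrapper; on GPU the same interpolation runs with cp.interp on CuPy-resident arrays.

```python
import numpy as np
from scipy.special import ndtr, ndtri   # standard normal CDF and its inverse

def fit_normal_score(data, n_quantiles=100):
    """Fit forward/inverse normal-score transforms from empirical quantiles."""
    probs = np.linspace(0.0, 1.0, n_quantiles)
    quants = np.quantile(data, probs)            # empirical quantile table

    def forward(x):                              # data space -> normal scores
        p = np.interp(x, quants, probs)          # empirical CDF via interpolation
        return ndtri(np.clip(p, 1e-7, 1 - 1e-7)) # map probability to z-score

    def inverse(z):                              # normal scores -> data space
        return np.interp(ndtr(z), probs, quants)

    return forward, inverse

# Usage: skewed (lognormal) data becomes roughly standard normal.
data = np.random.default_rng(0).lognormal(size=10_000)
forward, inverse = fit_normal_score(data)
scores = forward(data)
```

Kriging assumes Gaussian residuals, so the chain transforms data to normal scores before SGS and back-transforms each realization afterwards; keeping the quantile table on the GPU avoids a host round-trip per iteration.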
Multi-GPU parallelism: run independent MCMC chains across multiple GPUs with per-device process and batch scheduling.
- If we want to run n = 10 chains [a1, a2, ..., a10] and have 3 GPUs available, each pass would go: [a1, a2, a3] -> [a4, a5, a6] -> [a7, a8, a9] -> [a10]
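The batching logic above is just round-robin assignment of chains to device indices. A minimal sketch (the function name `schedule_chains` is hypothetical; the real driver spawns one process per entry with multiprocessing's spawn context):

```python
def schedule_chains(chain_ids, n_gpus):
    """Split chains into passes of up to n_gpus, pairing each chain in a
    pass with a device index: [[(chain, gpu), ...], ...]."""
    return [
        [(cid, gpu) for gpu, cid in enumerate(chain_ids[i:i + n_gpus])]
        for i in range(0, len(chain_ids), n_gpus)
    ]

passes = schedule_chains([f"a{k}" for k in range(1, 11)], n_gpus=3)
# 4 passes: three full passes of 3 chains, then a final pass with a10 alone
```

Because the chains are independent, each pass runs fully in parallel and the wall-clock cost is ceil(n_chains / n_gpus) sequential passes.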
The main bottleneck is that Sequential Gaussian Simulation scales poorly on large grids (this one is ~1500x1500), so running SGS for N iterations takes a very long time. Hence, utilizing the GPU is necessary to speed up this process.
gstatsMCMC-CuPy/
├── MCMC_GPU/ # Core GPU module
│ ├── MCMC_cu.py # MCMC chain class
│ ├── MC_res.py # Fused CUDA kernel for mass conservation residuals
│ ├── SGS_GPU.py # Sequential Gaussian Simulation context manager
│ ├── QuantileTransformer_gpu.py # GPU normal score transform
│ ├── Utilities.py # GPU utility functions
│ └── gstatsim_custom_gpu/ # Kriging & spatial statistics (from Matthew Jones)
│ ├── _krige_gpu.py # Batch kriging solver
│ ├── covariance_gpu.py # Variogram model registry
│ ├── besselk_gpu.py # Custom CUDA Bessel K function
│ ├── interpolate_gpu.py # Neighbor-based kriging interpolation
│ ├── neighbors_gpu.py # KDTree neighbor search & stencil generation
│ └── utilities_gpu.py # Gaussian transform, memory tuning
├── scripts/
│ └── T4_GPU_SmallScaleChain.ipynb # Tutorial notebook for running a single Small Scale Chain
├── smallScaleChain_multiprocessing_GPU.py # Multi-GPU chain driver using multiprocessing + spawn context
├── data/
│ ├── Supprt_Force.csv # Glacial survey data (coords, vel, bath, SMB, ...)
│ ├── bed_1000k.npy # Bed elevation after 1M LargeScaleChain iterations
│ ├── bed_2000k.npy # Bed elevation after 2M LargeScaleChain iterations
│ ├── 200_seeds.txt # RNG seeds for reproducible chains
│ └── data_weights_SF.txt # Data weights / uncertainties
└── tests/ # Unit and benchmark tests
├── test_mcmc_comparison.py # Single fused CUDA kernel vs. plain CuPy (6 kernels)
├── test_preprocess.py # CuPy vs. Numpy
├── test_sgs.py # Test batch SGS vs. gstatsim
└── test_quantile.py # sklearn QT vs. CuPy QT wrapper
- Python 3.12.11+
- NVIDIA GPU with CUDA
- CUDA Toolkit 12.2+
- CuPy 13.4.1
- rapidsai/25.06
pip install cupy-cuda12x # adjust for your CUDA version (e.g., cupy-cuda11x)
pip install numpy scipy pandas scikit-learn scikit-gstat verde
module load rapidsai/25.06 # HiPerGator
git clone https://github.com/tylerrleee/gstatsMCMC-CuPy
cd gstatsMCMC-CuPy
import cupy as cp
import pandas as pd
from MCMC_GPU.MCMC_cu import chain_sgs_gpu
# 1. Load data
df = pd.read_csv("data/Supprt_Force.csv")
xx = cp.array(df["x"].unique(), dtype=cp.float64)
yy = cp.array(df["y"].unique(), dtype=cp.float64)
# 2. Initialize chain
chain = chain_sgs_gpu(xx, yy, bed_init, ice_surface, velx, vely, smb, dhdt, ice_mask)
# 3. Configure
chain.set_variogram(model="matern", range_val=23.6, sill=0.44, shape=0.31)
chain.set_normal_transformation(quantile_transformer)
chain.set_trend(trend_surface)
chain.set_block_sizes(min_block=5, max_block=20)
chain.set_sgs_param(n_neighbors=48, search_radius=30_000)
chain.set_loss_type(sigma_mc=3.0)
chain.set_update_region(high_velocity_mask)
# 4. Run MCMC
chain.run(n_iterations=100_000)

For a complete walkthrough with data loading, variogram fitting, and result visualization, see scripts/T4_GPU_SmallScaleChain.ipynb.


