OpenBioSim · lohedges · Jan 21, 2026 · Jan 13, 2026 · Jan 14, 2026 · Jan 15, 2026
diff --git a/README.md b/README.md
@@ -10,18 +10,19 @@
 [![Conda Version](https://anaconda.org/openbiosim/loch/badges/downloads.svg)](https://anaconda.org/openbiosim/loch)
 [![License: GPL v3](https://img.shields.io/badge/License-GPL_v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0.en.html)
 
-CUDA accelerated Grand Canonical Monte Carlo (GCMC) water sampling code. Built
+CUDA/OpenCL accelerated Grand Canonical Monte Carlo (GCMC) water sampling code. Built
 on top of [Sire](https://github.com/OpenBioSim/sire),
 [BioSimSpace](https://github.com/OpenBioSim/biosimspace),
-[OpenMM](https://github.com/openmm/openmm), and
-[PyCUDA](https://documen.tician.de/pycuda/index.html#).
+[OpenMM](https://github.com/openmm/openmm),
+[PyCUDA](https://documen.tician.de/pycuda/index.html#),
+and [PyOpenCL](https://documen.tician.de/pyopencl/).
 
 ## Installation
 
 First, create a conda environment with the required dependencies:
 
 ```
-conda create -f environment.yaml
+conda env create -f environment.yaml
 conda activate loch
 ```
 
@@ -49,7 +50,7 @@ conda install -c conda-forge -c openbiosim/label/dev loch
 
 Instead of computing the energy change for each trial insertion/deletion with
 OpenMM, the calculation is performed at the reaction field (RF) level using
-a custom CUDA kernel, allowing multiple candidates to be evaluated
+a custom CUDA/OpenCL kernel, allowing multiple candidates to be evaluated
 simultaneously. Particle mesh Ewald (PME) is handled via the method for
 sampling from an approximate potential (in this case the RF potential)
 introduced [here](https://doi.org/10.1063/1.1563597). Parallelisation of the
@@ -228,8 +229,9 @@ to enhance sampling.
 Once finished, `mu_ex` will contain the computed excess chemical potential in units
 kcal/mol.
 
-Note that the simulation requires a system with CUDA support. Please set the
-`CUDA_VISIBLE_DEVICES` environment variable accordingly.
+Note that the simulation requires a system with CUDA or OpenCL support. Please
+set the `CUDA_VISIBLE_DEVICES` or `OPENCL_VISIBLE_DEVICES` environment variable
+accordingly.
 
 The standard volume can be computed as follows:
 
@@ -263,13 +265,11 @@ Free Energy Perturbation (FEP) with GCMC using `loch` is supported via the
 
 ## Notes
 
-* Make sure that `nvcc` is in your `PATH`. If you require a different `nvcc` to that
-  provided by conda, you can set the `PYCUDA_NVCC` environment variable to point
-  to the desired `nvcc` binary, or use the `nvcc` kwarg in the `GCMCSampler` constructor.
-  Depending on your setup, you may also need to install the `cuda-nvvm` package from
-  `conda-forge`.
-
-* A future version supporting AMD GPUs via PyOpenCL is planned.
+* When using the CUDA platform, make sure that `nvcc` is in your `PATH`. If you require
+  a different `nvcc` to that provided by conda, you can set the `PYCUDA_NVCC` environment
+  variable to point to the desired `nvcc` binary, or use the `nvcc` kwarg in the
+  `GCMCSampler` constructor. Depending on your setup, you may also need to install the
+  `cuda-nvvm` package from `conda-forge`.
 
 * OpenMM-to-Sire roundtrip example:
 
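For reference, the second stage of the approximate-potential scheme described above can be sketched as follows. This is a minimal illustration, not `loch`'s implementation. It assumes:

* the move has already been accepted using RF energies;
* energies are in kcal/mol;
* the names `pme_correction_probability` and `pme_correction_accept` are hypothetical.

The move is then kept with probability min(1, exp(-β(ΔU_PME - ΔU_RF))), so only the discrepancy between the two energy changes matters.

```python
import math
import random

# Boltzmann constant in kcal/(mol K) (approximate).
KB_KCAL = 0.0019872041


def pme_correction_probability(du_pme: float, du_rf: float, temperature: float) -> float:
    """Probability of keeping a move already accepted under the RF potential.

    Hypothetical helper, for illustration only.
    du_pme: energy change of the move evaluated with PME (kcal/mol).
    du_rf:  energy change of the same move evaluated with RF (kcal/mol).
    temperature: temperature in kelvin.
    """
    beta = 1.0 / (KB_KCAL * temperature)
    # Only the difference between the exact and approximate energy changes enters.
    exponent = -beta * (du_pme - du_rf)
    # min(1, exp(x)), avoiding overflow for large positive x.
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def pme_correction_accept(du_pme: float, du_rf: float, temperature: float, rng=random) -> bool:
    """Draw a uniform variate and apply the correction test (hypothetical helper)."""
    return rng.random() < pme_correction_probability(du_pme, du_rf, temperature)
```

The chemical-potential and volume factors of the GCMC acceptance ratio are identical under both potentials. They therefore cancel and do not appear in this correction step.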

diff --git a/WHITEPAPER.md b/WHITEPAPER.md
@@ -1,22 +1,23 @@
-# Loch: CUDA accelerated Grand Canonical Monte Carlo (GCMC) water sampling
+# Loch: GPU accelerated Grand Canonical Monte Carlo (GCMC) water sampling
 
 ## Introduction
 
-We present `loch`, a high-performance CUDA-accelerated Python package designed
+We present `loch`, a high-performance GPU-accelerated Python package designed
 for Grand Canonical Monte Carlo (GCMC) water sampling in molecular simulations
 via [OpenMM](https://openmm.org/). To enable parallelisation of insertion and
-deletion attempts, `loch` leverages GPU capabilities using a custom CUDA kernel
-for nonbonded interactions. This allows thousands of GCMC trials to be attempted
-in parallel, significantly enhancing sampling efficiency compared to traditional
-CPU-based implementations that perform sequential attempts via the OpenMM Python
-API. Additionally, electrostatics for GCMC attempts are computed using the
-reaction field (RF) method, with accepted candidates being re-evaluated with a
-correction step based on the difference between reaction field and Particle Mesh
-Ewald (PME) potential energies. The use of an approximate potential for trial
-moves leads to a substantial speed-up in GCMC move evaluation. `loch` has been
-designed to be modular, allowing standalone GCMC sampling, or integration with
-OpenMM-based molecular dynamics simulation code, e.g. as has been done in the
-[SOMD2](https://github.com/openbiosim/somd2) free-energy perturbation engine.
+deletion attempts, `loch` leverages GPU capabilities using a custom CUDA/OpenCL
+kernel for nonbonded interactions. This allows thousands of GCMC trials to be
+attempted in parallel, significantly enhancing sampling efficiency compared to
+traditional CPU-based implementations that perform sequential attempts via the
+OpenMM Python API. Additionally, electrostatics for GCMC attempts are computed
+using the reaction field (RF) method, with accepted candidates being
+re-evaluated with a correction step based on the difference between reaction
+field and Particle Mesh Ewald (PME) potential energies. The use of an
+approximate potential for trial moves leads to a substantial speed-up in GCMC
+move evaluation. `loch` has been designed to be modular, allowing standalone
+GCMC sampling, or integration with OpenMM-based molecular dynamics simulation
+code, e.g. as has been done in the [SOMD2](https://github.com/openbiosim/somd2)
+free-energy perturbation engine.
 
 ## Parallelisation strategy
 
@@ -52,6 +53,14 @@ each iteration, as more trials need to be evaluated in parallel, and more data
 needs to be transferred to and from the GPU, in which case it might be more
 efficient to simply perform more iterations with a smaller batch size.
 
+To enable reproducibility across GPU platforms, we choose to generate random
+numbers on the host using NumPy's random number generator, then transfer these
+to the GPU kernels where required. This avoids differences in random number
+generation across different GPU architectures and drivers, making testing
+and validation of the implementation significantly easier. In benchmarks we
+have found the NumPy approach to be as performant as using GPU-based random
+numbers for the typical batch sizes employed in `loch`.
+
 ## Sampling from an approximate potential
 
 In order to further accelerate the evaluation of GCMC insertion and deletion
@@ -91,7 +100,7 @@ Other than the cost of evaluating GCMC trials using PME, performance is also
 impacted by the cost of updating nonbonded parameters and atomic positions
 in the OpenMM context after each accepted insertion or deletion. (No updates
 are required for trial moves, since these are all evaluated via the custom
-CUDA kernel.) [Recent updates](https://github.com/openmm/openmm/pull/4610)
+CUDA/OpenCL kernel.) [Recent updates](https://github.com/openmm/openmm/pull/4610)
 to OpenMM have helped mitigate the cost of modifying force field parameters,
 allowing updates for only the subset of parameters that have changed within
 a particular force. However, updating atomic positions still requires
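The host-side random number strategy added to the whitepaper can be illustrated with a short NumPy sketch. The per-batch layout is assumed purely for illustration: fractional insertion positions, unit quaternions for orientations, and one Metropolis uniform per trial. The function `draw_trial_randoms` and this layout are hypothetical and do not describe `loch`'s actual buffers.

```python
import numpy as np


def draw_trial_randoms(seed: int, batch_size: int):
    """Generate one batch of GCMC random numbers on the host (hypothetical layout).

    Returns float32, C-contiguous arrays that could be copied to a CUDA or
    OpenCL buffer unchanged.
    """
    rng = np.random.default_rng(seed)
    # Fractional insertion positions in [0, 1) inside the sampling region.
    positions = rng.random((batch_size, 3), dtype=np.float32)
    # Normalised 4D Gaussian vectors are uniformly distributed unit quaternions,
    # i.e. uniformly random rotations for the inserted waters.
    quats = rng.standard_normal((batch_size, 4)).astype(np.float32)
    quats /= np.linalg.norm(quats, axis=1, keepdims=True)
    # One uniform variate per trial for the Metropolis acceptance test.
    acceptance = rng.random(batch_size, dtype=np.float32)
    return (
        np.ascontiguousarray(positions),
        np.ascontiguousarray(quats),
        np.ascontiguousarray(acceptance),
    )
```

Seeding `np.random.default_rng` identically yields bit-identical inputs, whether they are then copied with `pycuda.gpuarray.to_gpu` or with `pyopencl.array.to_device(queue, ...)`. The kernel arithmetic itself may still differ slightly between devices.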

diff --git a/environment.yaml b/environment.yaml
@@ -8,3 +8,4 @@ dependencies:
   - biosimspace
   - loguru
   - pycuda
+  - pyopencl
diff --git a/examples/bpti/bpti.py b/examples/bpti/bpti.py
@@ -53,6 +53,14 @@
     choices=["info", "debug", "error"],
     required=False,
 )
+parser.add_argument(
+    "--platform",
+    help="The GPU platform to use",
+    type=str,
+    default="auto",
+    choices=["auto", "cuda", "opencl"],
+    required=False,
+)
 
 args = parser.parse_args()
 
@@ -78,6 +86,7 @@
     num_ghost_waters=100,
     bulk_sampling_probability=0,
     log_level=args.log_level,
+    platform=args.platform,
     overwrite=True,
 )
 
@@ -92,6 +101,7 @@
     pressure=None,
     constraint="h_bonds",
     timestep="2 fs",
+    platform=args.platform,
 )
 d.randomise_velocities()
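The examples now accept `--platform auto|cuda|opencl`, but this diff does not show how `loch` resolves `auto`. A minimal sketch follows. It assumes that CUDA is preferred whenever PyCUDA is installed and that PyOpenCL is the fallback. `resolve_platform` is a hypothetical helper, not part of `loch`'s API.

```python
import importlib.util

VALID_PLATFORMS = ("auto", "cuda", "opencl")


def resolve_platform(requested: str = "auto") -> str:
    """Map a --platform argument onto a concrete GPU backend name (hypothetical)."""
    requested = requested.lower()
    if requested not in VALID_PLATFORMS:
        raise ValueError(
            f"Unknown platform {requested!r}; expected one of {VALID_PLATFORMS}"
        )
    if requested != "auto":
        # An explicit choice is honoured as-is; missing drivers surface later.
        return requested
    # Assumed preference order: CUDA first, then OpenCL.
    if importlib.util.find_spec("pycuda") is not None:
        return "cuda"
    if importlib.util.find_spec("pyopencl") is not None:
        return "opencl"
    raise RuntimeError("Neither PyCUDA nor PyOpenCL is installed")
```

`find_spec` only reports whether a package is installed, not whether a usable device exists. A real implementation would also need to probe for hardware. On macOS, where the recipe excludes `pycuda`, this sketch would fall back to `opencl`.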
 

diff --git a/examples/scytalone/sd.py b/examples/scytalone/sd.py
@@ -64,6 +64,14 @@
     choices=["info", "debug", "error"],
     required=False,
 )
+parser.add_argument(
+    "--platform",
+    help="The GPU platform to use",
+    type=str,
+    default="auto",
+    choices=["auto", "cuda", "opencl"],
+    required=False,
+)
 args = parser.parse_args()
 
 # Store the ligand index.
@@ -90,6 +98,7 @@
     ghost_file=f"ghosts_{lig}.txt",
     log_file=f"gcmc_{lig}.txt",
     log_level=args.log_level,
+    platform=args.platform,
     overwrite=True,
 )
 
@@ -104,6 +113,7 @@
     pressure=None,
     constraint="h_bonds",
     timestep="2 fs",
+    platform=args.platform,
 )
 d.randomise_velocities()
 

diff --git a/examples/water/water.py b/examples/water/water.py
@@ -66,6 +66,14 @@
     choices=["info", "debug", "error"],
     required=False,
 )
+parser.add_argument(
+    "--platform",
+    help="The GPU platform to use",
+    type=str,
+    default="auto",
+    choices=["auto", "cuda", "opencl"],
+    required=False,
+)
 args = parser.parse_args()
 
 # Load the water box.
@@ -91,6 +99,8 @@
     temperature=args.temperature,
     num_ghost_waters=100,
     log_level=args.log_level,
+    platform=args.platform,
+    overwrite=True,
 )
 
 # Create a dynamics object using the modified GCMC system.
@@ -104,6 +114,7 @@
     pressure=None,
     constraint="h_bonds",
     timestep="2 fs",
+    platform=args.platform,
 )
 d.randomise_velocities()
 

diff --git a/recipes/loch/template.yaml b/recipes/loch/template.yaml
@@ -18,6 +18,7 @@ requirements:
     - loguru
     - pip
     - pycuda # [not macos]
+    - pyopencl
     - python
     - setuptools
     - sire

diff --git a/src/loch/__init__.py b/src/loch/__init__.py
@@ -1,7 +1,7 @@
 ######################################################################
 # Loch: GPU accelerated GCMC water sampling engine.
 #
-# Copyright: 2025
+# Copyright: 2025-2026
 #
 # Authors: The OpenBioSim Team <team@openbiosim.org>
 #