Add support for OpenCL platform #6
Merged
This PR adds support for using OpenCL as the platform for performing GCMC sampling on the GPU using `pyopencl`. The existing CUDA-specific kernels have been adapted to support both CUDA and OpenCL via pre-processor macros that abstract platform-specific functionality. In the Python layer, the code has been refactored to use both `pycuda` and `pyopencl` for interfacing with each platform, i.e. memory setup/transfer and context handling are performed by the matching Python interface, with no change to the existing public API of the `GCMCSampler`.

Unlike CUDA, which provides cuRAND, OpenCL has no native random number support, so generation of random numbers has been moved to the host. These are now generated using NumPy and are passed to kernels at runtime as required. The number of random numbers needed per batch is small, so the overhead is low. To mitigate what overhead remains, batches of numbers are pre-computed in a background thread while the GPU kernels are working, such that numbers are immediately ready when needed. Using the same RNG for both platforms is also desirable from a testing perspective, since it allows us to directly compare CUDA and OpenCL results.
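The host-side random number scheme described above can be illustrated with a minimal sketch. This is not the code from this PR: the class name `RandomBatchPrefetcher`, its parameters, and the queue depth are all hypothetical, and it assumes single-precision uniform numbers are what the kernels consume. A single producer thread fills a bounded queue from a seeded NumPy generator, so batches come out in a deterministic order for a given seed.

```python
import queue
import threading

import numpy as np


class RandomBatchPrefetcher:
    """Hypothetical helper: pre-compute random batches on a background thread."""

    def __init__(self, batch_size, seed=None, depth=2):
        # Only the producer thread touches the generator, so the stream
        # is reproducible for a given seed.
        self._rng = np.random.default_rng(seed)
        self._batch_size = batch_size
        # A bounded queue keeps at most `depth` batches ready in advance.
        self._queue = queue.Queue(maxsize=depth)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._fill, daemon=True)
        self._thread.start()

    def _fill(self):
        while not self._stop.is_set():
            batch = self._rng.random(self._batch_size, dtype=np.float32)
            # Retry with a timeout so that close() is noticed promptly
            # even while the queue is full.
            while not self._stop.is_set():
                try:
                    self._queue.put(batch, timeout=0.1)
                    break
                except queue.Full:
                    continue

    def next_batch(self):
        """Return the next pre-computed batch, blocking only if none is ready."""
        return self._queue.get()

    def close(self):
        """Stop the producer thread and wait for it to exit."""
        self._stop.set()
        self._thread.join()


# Usage: fetch a batch that would then be copied to the device and passed
# to a kernel through pycuda or pyopencl.
prefetcher = RandomBatchPrefetcher(batch_size=8, seed=42)
batch = prefetcher.next_batch()
prefetcher.close()
```

Because the numbers are drawn on the host, two prefetchers created with the same seed hand out identical batches regardless of which GPU platform consumes them.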
The existing unit tests have now been parameterised over the two available platforms and an additional unit test has been added to confirm that, given the same RNG seed, single-point energies agree across both platforms. I have also tested all of the example scripts, which produce the same results.
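The idea behind the cross-platform check can be sketched as follows. This is only an illustration under stated assumptions: `make_trial_positions`, `energies_agree`, and the two toy energy functions are hypothetical stand-ins, whereas the actual test runs the `GCMCSampler` on each platform. The point is that seeding the host RNG identically gives every backend the same inputs, so their outputs can be compared within a tolerance.

```python
import numpy as np


def make_trial_positions(seed, n_trials, box_length):
    """Draw trial positions on the host from a seeded NumPy generator."""
    rng = np.random.default_rng(seed)
    return rng.random((n_trials, 3)) * box_length


def energies_agree(backends, seed, n_trials=32, box_length=10.0, rtol=1e-5):
    """Check that every backend returns matching energies for the same seed.

    `backends` maps a platform name to a callable taking an (n, 3) array of
    positions and returning n energies.
    """
    reference = None
    for compute_energy in backends.values():
        # Re-seeding per backend reproduces exactly the same inputs.
        positions = make_trial_positions(seed, n_trials, box_length)
        energies = np.asarray(compute_energy(positions), dtype=np.float64)
        if reference is None:
            reference = energies
        elif not np.allclose(reference, energies, rtol=rtol):
            return False
    return True


# Toy stand-ins for the two platforms: the same expression evaluated in
# double and in single precision.
toy_backends = {
    "cuda": lambda pos: (pos ** 2).sum(axis=1),
    "opencl": lambda pos: (pos.astype(np.float32) ** 2).sum(axis=1),
}
agree = energies_agree(toy_backends, seed=42)
```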
In the process of adding OpenCL support I also took the opportunity to profile and optimise the GPU kernels and `GCMCSampler`. Trivial optimisations, none of which involve reduced-precision maths operations, have improved performance by roughly 30-40%. I have also exposed options to enable/disable compiler optimisations during kernel compilation, with the default optimisations matching those used by OpenMM for the respective platforms. Benchmarks show that the CUDA and OpenCL platforms are largely comparable in performance, with most of the discrepancy during a simulation coming from the platform performance differences for OpenMM.

Note that the addition of OpenCL support should enable the use of `loch` on other OpenMM platforms that we don't directly support. For example, it would be possible to run OpenMM dynamics using the Metal or HIP platforms, while using OpenCL for `loch`.