enable hwloc, cuFFTMp, and HeFFTe support in GROMACS easyblock #3531
This looks good and matches what I can find in the docs. My only concern is that there is no version checking for the new options. I think the hwloc option has been there since 2016, but the HeFFTe one is more recent (I can't quite figure it out, but I think it is 2023, see https://gitlab.com/gromacs/gromacs/-/issues/4090). For our own use case, the check could require a more recent version than the one in which the option was first supported.
EDIT: Indeed, heffte seems to first appear in 2023.1: https://manual.gromacs.org/2023.1/install-guide/index.html
EDIT: The option for hwloc is first documented in 2016.4: https://manual.gromacs.org/2016.4/install-guide/index.html
Thanks! I added the version checks, hadn't seen your edits. But I also see hwloc being mentioned in the 2016.1 docs, and it's also in the code: HeFFTe is being mentioned in the 2023 docs (https://manual.gromacs.org/current/release-notes/2023/major/performance.html#pme-decomposition-support-with-cuda-and-sycl-backends), and also in the CMake file for 2023 (https://gitlab.com/gromacs/gromacs/-/blob/v2023/CMakeLists.txt?ref_type=tags#L741) and 2023.1 (https://gitlab.com/gromacs/gromacs/-/blob/v2023.1/CMakeLists.txt?ref_type=tags#L749). |
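For illustration, the version gating discussed above could look roughly like the sketch below. It is not the code from this PR: `parse_version`, `MIN_VERSIONS`, and `supported_features` are hypothetical names, and the minimum versions are simply the ones quoted in this thread (2016 for hwloc, 2023 for HeFFTe and cuFFTMp).

```python
import re


def parse_version(version):
    """Turn '2023.1' into (2023, 1) so comparisons are numeric, not lexical."""
    return tuple(int(part) for part in re.findall(r'\d+', version))


# Assumed minimum GROMACS versions, taken from the docs cited in this thread.
MIN_VERSIONS = {
    'hwloc': (2016,),
    'HeFFTe': (2023,),
    'cuFFTMp': (2023,),
}


def supported_features(gromacs_version):
    """Return the optional features that this GROMACS version can be built with."""
    parsed = parse_version(gromacs_version)
    return sorted(name for name, minimum in MIN_VERSIONS.items() if parsed >= minimum)
```

Parsing into integer tuples avoids the pitfall of plain string comparison, where a version such as 5.1.4 would sort after 2016.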
|
One thing I was a little worried about is that the HeFFTe installation requires a GPU (for the tests), so simply installing GROMACS and its dependencies will also require a GPU if we enable this by default. Should we make it optional in some way (commenting out the HeFFTe dependency or disabling the tests)? For EESSI it would already cause an issue, as we build on nodes without GPUs. |
|
Hi! Don't want to derail the discussion here, but, while I don't have any recent numbers, the situation has not changed much from what NVIDIA reports in their blog:
HeFFTe has the benefit of supporting AMD and Intel GPUs, but it's not the best choice for CUDA installations. cuFFTMp has its own share of issues, as @bedroge outlined in the PR description, but I think the performance difference is relevant for evaluating which effort is more worthwhile. Regarding versioning, I can confirm that heffte (and cufftmp) were added in 2023, and hwloc was added in 2016. |
Thanks for your input, it's definitely a fair point. I initially added only HeFFTe support in this PR, as it seemed like a more logical default option (e.g. no additional requirements on the hardware like with cuFFTMp), and a first attempt at adding cuFFTMp support failed miserably 😅 But I can have another look at it. Ultimately, it would be nice if the easyblock supported both, so people can choose between the two. |
|
Now that we have an easyconfig for cuFFTMp (https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/c/cuFFTMp/cuFFTMp-11.4.0-gompi-2025b-CUDA-12.9.1.eb), it's trivial to add support for it. I've done that in 23eda47. Initially it didn't compile because it was picking up the I couldn't really force it to use the header provided by cuFFTMp first (moving it up or down in the deps list didn't work), but adding |
|
Assuming you cannot select the multi-GPU FFT library at runtime, we would need to select one at build time. We could do that with an easyconfig parameter that controls this, or with separate easyconfigs that carry a corresponding version suffix. |
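A minimal sketch of the easyconfig-parameter variant, assuming a hypothetical parameter whose value is `'heffte'`, `'cufftmp'`, or `None`. The function name and the `dep_roots` mapping (a stand-in for what `get_software_root()` would return in the easyblock) are illustrative only. The CMake option names follow the HeFFTe and cuFFTMp sections of the GROMACS install guide linked in the PR description and should be double-checked for the GROMACS version at hand.

```python
def multi_gpu_fft_configopts(choice, dep_roots):
    """
    Build the CMake options for one multi-GPU FFT library.

    choice: 'heffte', 'cufftmp', or None (hypothetical easyconfig parameter value)
    dep_roots: maps dependency names to install prefixes; a stand-in for
               what get_software_root() would return in a real easyblock
    """
    # Option names as given in the GROMACS install guide; verify per version.
    options = {
        'heffte': ('HeFFTe', '-DGMX_USE_HEFFTE=ON -DHeffte_ROOT=%s'),
        'cufftmp': ('cuFFTMp', '-DGMX_USE_CUFFTMP=ON -DcuFFTMp_ROOT=%s'),
    }
    if choice is None:
        # No multi-GPU FFT library requested: add nothing.
        return ''
    if choice not in options:
        raise ValueError("Unknown multi-GPU FFT library: %s" % choice)
    dep_name, template = options[choice]
    root = dep_roots.get(dep_name)
    if root is None:
        raise ValueError("%s was selected but is not listed as a dependency" % dep_name)
    return template % root
```

Because only one library is ever selected, a build with both HeFFTe and cuFFTMp in the dependency list cannot silently enable both.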
|
@boegelbot please test @ jsc-zen3-a100 |
|
@boegelbot please test @ jsc-zen3 |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 28 mins 39 secs) (1 easyconfigs in total) |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 2 out of 2 (total: 1 hour 11 mins 15 secs) (2 easyconfigs in total) |
Hmm, forgot to actually include that fix, but I just pushed it (cfb6f93). |
|
@boegelbot please test @ jsc-zen3-a100 |
|
@boegelbot please test @ jsc-zen3-a100 |
|
Test report by @boegelbot Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 28 mins 4 secs) (1 easyconfigs in total) |
|
# Enable hwloc support (added in v2016) if it's listed as dependency
if gromacs_version >= '2016' and get_software_root('hwloc'):
    self.cfg.update('configopts', '-DGMX_HWLOC=ON')
I don't mind this, but I do wonder what the impact is.
hwloc is essentially always there as an indirect dependency, via OpenMPI.
Maybe we should change this check to only enable this when hwloc is listed as a direct dependency of GROMACS, by checking via self.cfg.dependencies()?
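As a rough sketch of that alternative: the check below looks at a list of dependency dicts with a `'name'` key, which is assumed here to be the shape of what `self.cfg.dependencies()` returns. The helper names and the version tuple argument are hypothetical, not part of the easyblock.

```python
def is_direct_dependency(name, dependencies):
    """Check whether `name` is among the easyconfig's own dependencies."""
    return any(dep['name'] == name for dep in dependencies)


def hwloc_configopt(gromacs_version, dependencies):
    """
    Return the hwloc CMake option only for a direct hwloc dependency.

    gromacs_version: version as a tuple of ints, e.g. (2024, 4)
    dependencies: list of dicts with a 'name' key (assumed shape)
    """
    if gromacs_version >= (2016,) and is_direct_dependency('hwloc', dependencies):
        return '-DGMX_HWLOC=ON'
    return ''
```

With this variant, an hwloc that is only pulled in through OpenMPI would no longer switch the option on.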
The documentation seems to suggest that it's useful to always enable this if hwloc is available:
Added CMake support to detect and build GROMACS with hwloc, which will improve GROMACS ability to recognize and take advantage of all the available hardware. If hwloc is unavailable, GROMACS will fall back on other detection routines.
and
Run-time detection of hardware capabilities can be improved by linking with hwloc. By default this is turned off since it might not be supported everywhere, but if you have hwloc installed it should work by just setting -DGMX_HWLOC=ON
But maybe @al42and can give some advice here: does that indeed make sense, or are there possible drawbacks?
There are no downsides to using hwloc, except for the issues you encountered on Grace Hopper. However, I would not expect these failed tests to translate into any actual issues for users, because the upsides of using hwloc in GROMACS are, for now, mostly aspirational. It can help somewhat with CPU affinity, particularly SMT, but NUMA information is not used at all except for logging; and, in the HPC setting, the scheduler should take care of CPU affinity.
So, dropping hwloc does not have any major drawbacks as far as I know.
> So, dropping hwloc does not have any major drawbacks as far as I know.
@al42and Did you mean to state that "enabling hwloc support does not have any major drawbacks" ?
Just so we're clear 😅
With that in mind, I'm fine with merging this as is, if I understood it correctly.
> Did you mean to state that "enabling hwloc support does not have any major drawbacks" ?
Neither enabling nor disabling has major drawbacks 🙃 But enabling has minor benefits.
Thanks @al42and, also for the other suggestions (which I can fix in a follow-up PR). @boegel could you merge this then, so we can ingest the GROMACS builds from EESSI/software-layer#1497?
# The list of GMX_SIMD options can be found
# http://manual.gromacs.org/documentation/2018/install-guide/index.html#simd-support
if 'MIC-AVX512' in optarch and LooseVersion(self.version) >= LooseVersion('2016'):
    res = 'AVX_512_KNL'
Nit: AVX_512_KNL was removed in 2026: https://manual.gromacs.org/documentation/2026.0/release-notes/2026/major/removed-functionality.html
@@ -211,6 +211,23 @@
cuda_cc_semicolon_sep = self.cfg.get_cuda_cc_template_value(
    "cuda_cc_semicolon_sep").replace('.', '')
self.cfg.update('configopts', '-DGMX_CUDA_TARGET_SM="%s"' % cuda_cc_semicolon_sep)
Nit: in 2026, we finally switched to traditional CMAKE_CUDA_ARCHITECTURES, which, if I understand correctly, is set automatically by the EB cmake module?
@al42and Any advantage of using -DCMAKE_CUDA_ARCHITECTURES over -DGMX_CUDA_TARGET_SM ?
We'll tackle that in a follow-up PR...
Should be identical in practice. The only advantage is that CMAKE_CUDA_ARCHITECTURES is standardized.
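For the follow-up PR, the switch could be sketched as below. This is only an illustration: the function name is hypothetical, and using 2026 as the cutoff is an assumption based on the comment above rather than something verified against the GROMACS CMake files.

```python
def cuda_arch_configopt(gromacs_version, cuda_compute_capabilities):
    """
    Pick the CMake option that carries the CUDA target architectures.

    gromacs_version: version as a tuple of ints, e.g. (2025, 2)
    cuda_compute_capabilities: list of strings such as ['8.0', '9.0']
    """
    # Both options take the capabilities without dots, separated by semicolons.
    archs = ';'.join(cc.replace('.', '') for cc in cuda_compute_capabilities)
    if gromacs_version >= (2026,):
        # Assumed cutoff: GROMACS 2026 uses the standard CMake variable.
        return '-DCMAKE_CUDA_ARCHITECTURES="%s"' % archs
    return '-DGMX_CUDA_TARGET_SM="%s"' % archs
```

Since both options receive the same value, the generated builds should be identical in practice, as noted above.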
|
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 35 mins 46 secs) (1 easyconfigs in total) |
|
Test report by @boegel Overview of tested easyconfigs (in order)
Build succeeded for 1 out of 1 (total: 1 hour 2 mins 52 secs) (1 easyconfigs in total) |
In EESSI we noticed that GROMACS builds currently show the following with gmx -version:
Hwloc is part of the foss toolchain and can be easily enabled.
For multi-GPU FFT support, either cuFFTMp (https://manual.gromacs.org/documentation/current/install-guide/index.html#using-cufftmp) or HeFFTe (https://manual.gromacs.org/documentation/current/install-guide/index.html#using-heffte) is required. I was trying to add support for both, but cuFFTMp is part of NVHPC, and simply adding that as a dependency will make GROMACS pick up other stuff from that installation (e.g. OpenMP libraries). Since cuFFTMp also imposes some additional requirements (see https://docs.nvidia.com/hpc-sdk/cufftmp/usage/requirements.html), I've only added HeFFTe support for now. I've also just opened an easyconfigs PR for HeFFTe with CUDA support: easybuilders/easybuild-easyconfigs#22024. Once that's merged, I'll open another PR to add this as a dependency to CUDA versions of GROMACS.