enable hwloc, cuFFTMp, and HeFFTe support in GROMACS easyblock#3531

Merged
boegel merged 10 commits into
easybuilders:develop from
bedroge:gromacs_heffte
Jun 7, 2026

@bedroge

bedroge commented Dec 13, 2024

Contributor

In EESSI we noticed that GROMACS builds currently show the following with gmx -version:

Multi-GPU FFT:       none
Hwloc support:       disabled

Hwloc is part of the foss toolchain and can be easily enabled.

For Multi-GPU FFT support, either cuFFTMp (https://manual.gromacs.org/documentation/current/install-guide/index.html#using-cufftmp) or HeFFTe (https://manual.gromacs.org/documentation/current/install-guide/index.html#using-heffte) is required. I tried to add support for both, but cuFFTMp is part of NVHPC, and simply adding NVHPC as a dependency makes GROMACS pick up other stuff from that installation (e.g. OpenMP libraries). Since cuFFTMp also imposes some additional requirements (see https://docs.nvidia.com/hpc-sdk/cufftmp/usage/requirements.html), I've only added HeFFTe support for now. I've also just opened an easyconfigs PR for HeFFTe with CUDA support: easybuilders/easybuild-easyconfigs#22024. Once that's merged, I'll open another PR to add HeFFTe as a dependency of the CUDA versions of GROMACS.
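To make the check described above repeatable, the two fields can be pulled out of the `gmx -version` text. This is a minimal sketch that only assumes the `key: value` layout of the two lines quoted in this description; the function name is hypothetical and the sample string stands in for real command output.

```python
def parse_gmx_version_fields(text):
    """Return a dict of 'key: value' fields from `gmx -version` style output."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(':')
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample copied from the two lines quoted in the PR description
sample = """\
Multi-GPU FFT:       none
Hwloc support:       disabled
"""

fields = parse_gmx_version_fields(sample)
# Features this PR is about that are reported as switched off
missing = [key for key in ('Multi-GPU FFT', 'Hwloc support')
           if fields.get(key) in ('none', 'disabled')]
print(missing)
```

On a real installation the text would come from running `gmx -version` (for example via `subprocess.run`), which is left out here so the sketch runs anywhere.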

ocaisa left a comment

Member

This looks good, matches what I can find in the docs. My only concern is that there is no version-checking with the new options. I think the hwloc one is there since 2016, but the HEFFTE one is more recent (I can't quite figure it out but I think it is 2023, see https://gitlab.com/gromacs/gromacs/-/issues/4090). For our own use case we could make the check be more recent than when it was first supported.

EDIT: Indeed, heffte seems to first appear in 2023.1: https://manual.gromacs.org/2023.1/install-guide/index.html

EDIT: The option for hwloc is first documented in 2016.4: https://manual.gromacs.org/2016.4/install-guide/index.html

@bedroge

bedroge commented Dec 16, 2024

Contributor Author

> This looks good, matches what I can find in the docs. My only concern is that there is no version-checking with the new options. I think the hwloc one is there since 2016, but the HEFFTE one is more recent (I can't quite figure it out but I think it is 2023, see https://gitlab.com/gromacs/gromacs/-/issues/4090). For our own use case we could make the check be more recent than when it was first supported.
>
> EDIT: Indeed, heffte seems to first appear in 2023.1: https://manual.gromacs.org/2023.1/install-guide/index.html
>
> EDIT: The option for hwloc is first documented in 2016.4: https://manual.gromacs.org/2016.4/install-guide/index.html

Thanks! I added the version checks, hadn't seen your edits. But I also see hwloc being mentioned in the 2016.1 docs, and it's also in the code:
https://gitlab.com/gromacs/gromacs/-/blob/v2016.1/CMakeLists.txt?ref_type=tags#L506

HeFFTe is being mentioned in the 2023 docs (https://manual.gromacs.org/current/release-notes/2023/major/performance.html#pme-decomposition-support-with-cuda-and-sycl-backends), and also in the CMake file for 2023 (https://gitlab.com/gromacs/gromacs/-/blob/v2023/CMakeLists.txt?ref_type=tags#L741) and 2023.1 (https://gitlab.com/gromacs/gromacs/-/blob/v2023.1/CMakeLists.txt?ref_type=tags#L749).
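The version checks discussed here could look roughly like the sketch below. It is standalone rather than the easyblock's actual code: `version_tuple` and `fft_and_hwloc_opts` are hypothetical helpers, `deps` is a plain set standing in for EasyBuild's dependency lookup, and `-DGMX_USE_HEFFTE=ON` is the flag name as I understand it from the GROMACS install guide linked in the description (an assumption, since the thread itself only shows `-DGMX_HWLOC=ON`).

```python
def version_tuple(version):
    """Turn '2023.1' into (2023, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.') if part.isdigit())


def fft_and_hwloc_opts(gromacs_version, deps):
    """Return CMake options to add, given a version string and available deps.

    deps is a set of dependency names (hypothetical stand-in for
    checking which software is loaded).
    """
    ver = version_tuple(gromacs_version)
    opts = []
    # hwloc support exists since GROMACS 2016 (per the thread)
    if ver >= (2016,) and 'hwloc' in deps:
        opts.append('-DGMX_HWLOC=ON')
    # HeFFTe support exists since GROMACS 2023 (per the thread);
    # the flag name is an assumption taken from the install guide
    if ver >= (2023,) and 'HeFFTe' in deps:
        opts.append('-DGMX_USE_HEFFTE=ON')
    return opts


print(fft_and_hwloc_opts('2023.1', {'hwloc', 'HeFFTe'}))
print(fft_and_hwloc_opts('2022.5', {'hwloc', 'HeFFTe'}))
```

Comparing tuples rather than raw strings avoids surprises with old release numbers such as `5.1.4`, which would sort after `2016` as a string.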

@bedroge

bedroge commented Dec 16, 2024

Contributor Author

One thing I was a little bit worried about is that the HeFFTe installation requires a GPU (for the tests), hence simply installing GROMACS and its dependencies will also require a GPU if we enable this by default. Should we make it optional in some way (commenting out the HeFFTe dependency or disabling the tests)? For EESSI it would currently already cause an issue, as we build on nodes without GPUs.

@al42and

al42and commented Jan 3, 2025


Hi!

Don't want to derail the discussion here, but, while I don't have any recent numbers, the situation has not changed much from what NVIDIA reports in their blog:

> We find cuFFTMp to be up to 2x faster [than HeFFTe]

HeFFTe has the benefit of supporting AMD and Intel GPUs, but it's not the best choice for CUDA installation. cuFFTMp has its own share of issues, as @bedroge outlined in the PR description, but I think a performance difference is relevant for evaluating which effort is more worthwhile.

Regarding versioning, can confirm that heffte (and cufftmp) were added in 2023, and hwloc was added in 2016.

@bedroge

bedroge commented Jan 3, 2025

Contributor Author

> Hi!
>
> Don't want to derail the discussion here, but, while I don't have any recent numbers, the situation has not changed much from what NVIDIA reports in their blog:
>
> > We find cuFFTMp to be up to 2x faster [than HeFFTe]
>
> HeFFTe has the benefit of supporting AMD and Intel GPUs, but it's not the best choice for CUDA installation. cuFFTMp has its own share of issues, as @bedroge outlined in the PR description, but I think a performance difference is relevant for evaluating which effort is more worthwhile.
>
> Regarding versioning, can confirm that heffte (and cufftmp) were added in 2023, and hwloc was added in 2016.

Thanks for your input, it's definitely a fair point. I initially added only HeFFTe support in this PR, as it seemed like a more logical default option (e.g. no additional requirements on the hardware like with cuFFTMp), and a first attempt at adding cuFFTMp support failed miserably 😅 But I can have another look at it, ultimately it would be nice if the easyblock supports both, and people can choose between the two of them.

@bedroge

bedroge commented Apr 28, 2026

Contributor Author

Now that we have an easyconfig for cuFFTMp (https://github.com/easybuilders/easybuild-easyconfigs/blob/develop/easybuild/easyconfigs/c/cuFFTMp/cuFFTMp-11.4.0-gompi-2025b-CUDA-12.9.1.eb), it's trivial to add support for it. I've done that in 23eda47.

Initially it didn't compile because it was picking up the cufft.h from CUDA, while it should pick up the one from cuFFTMp, causing errors like:

/home/bob/eessi/versions/2025.06/software/linux/x86_64/intel/cascadelake/software/cuFFTMp/11.4.0-gompi-2025b-CUDA-12.9.1/include/cufftMp.h:74:4: error: #error cuFFT and cuFFTMp version mismatch. .../math_libs/X.Y/include/cufftmp/ should be included before .../math_libs/X.Y/include/
   74 |   #error cuFFT and cuFFTMp version mismatch. .../math_libs/X.Y/include/cufftmp/ should be included before .../math_libs/X.Y/include/
      |    ^~~~~

I couldn't really force it to use the header provided by cuFFTMp first (moving it up or down in the deps list didn't work), but adding -Xcompiler -v to the nvcc command showed that it first searches in the source dir. So, I solved things by adding a symlink to the correct cufft.h in the source dir, and that allowed me to complete the build. The tests are still failing for me, as I don't have a system that meets the requirements, see https://docs.nvidia.com/cuda/cufftmp/usage/requirements.html and https://docs.nvidia.com/nvshmem/release-notes-install-guide/install-guide/abstract.html#nvshmem-installation-guide.
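The symlink workaround can be sketched as follows. This is not the code from the commit: `link_cufft_header` is a hypothetical helper, the assumption that `cufft.h` sits directly in cuFFTMp's `include` directory is taken from the path in the error message, and which directory in the source tree nvcc searches first is left to the caller. The demo uses temporary directories in place of real paths.

```python
import os
import tempfile


def link_cufft_header(cufftmp_include_dir, source_dir):
    """Symlink cuFFTMp's cufft.h into source_dir and return the link path."""
    target = os.path.join(cufftmp_include_dir, 'cufft.h')
    if not os.path.isfile(target):
        raise FileNotFoundError(target)
    link = os.path.join(source_dir, 'cufft.h')
    # Replace a stale link or file from an earlier build attempt
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(target, link)
    return link


# Demo with throwaway directories standing in for the real locations
with tempfile.TemporaryDirectory() as tmp:
    inc = os.path.join(tmp, 'cuFFTMp', 'include')
    src = os.path.join(tmp, 'gromacs-src')
    os.makedirs(inc)
    os.makedirs(src)
    with open(os.path.join(inc, 'cufft.h'), 'w') as handle:
        handle.write('/* placeholder header */\n')
    link = link_cufft_header(inc, src)
    resolved = os.path.realpath(link)
    expected = os.path.realpath(os.path.join(inc, 'cufft.h'))
    print(resolved == expected)
```

The idea is only that a header found in the first-searched directory shadows the `cufft.h` shipped with CUDA, which is what the `-Xcompiler -v` output suggested.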

bedroge requested a review from ocaisa April 28, 2026 14:50
@bedroge

bedroge commented Apr 28, 2026

Contributor Author

Assuming the multi-GPU FFT library cannot be selected at runtime, we would need to select it at build time. We could do that with an easyconfig parameter that controls the choice, or with separate easyconfigs that have a corresponding version suffix?
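The first option, a parameter that picks the library at build time, might be sketched like this. Everything here is hypothetical: the parameter values, the `FFT_LIBS` table, and the function name are made up for illustration, and the two CMake flag names are assumptions based on the GROMACS install guide rather than anything confirmed in this thread.

```python
# Hypothetical mapping from a parameter value to (dependency name, CMake flag).
# The flag names are assumptions taken from the GROMACS install guide.
FFT_LIBS = {
    'cufftmp': ('cuFFTMp', '-DGMX_USE_CUFFTMP=ON'),
    'heffte': ('HeFFTe', '-DGMX_USE_HEFFTE=ON'),
}


def select_multi_gpu_fft(choice, deps):
    """Return the CMake flag for the chosen library, or None if disabled.

    choice: value of a hypothetical easyconfig parameter (None disables)
    deps:   set of dependency names available in the build environment
    """
    if choice is None:
        return None
    if choice not in FFT_LIBS:
        raise ValueError("unknown multi-GPU FFT library: %s" % choice)
    dep_name, flag = FFT_LIBS[choice]
    if dep_name not in deps:
        raise ValueError("%s selected but not listed as a dependency" % dep_name)
    return flag


print(select_multi_gpu_fft('heffte', {'HeFFTe', 'CUDA'}))
```

Failing loudly when the selected library is missing keeps a build from silently falling back to "Multi-GPU FFT: none". The second option (separate easyconfigs with a version suffix) would need no such parameter, at the cost of more easyconfig files.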

@bedroge

bedroge commented Apr 28, 2026

Contributor Author

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb"

@boegelbot


@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3531 EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3531 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10308

Test results coming soon (I hope)...

Details

- notification for comment with ID 4336494167 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@bedroge

bedroge commented Apr 28, 2026

Contributor Author

@boegelbot please test @ jsc-zen3
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2023.1-foss-2022a.eb GROMACS-2025.4-foss-2025b.eb"

@boegelbot


@bedroge: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3531 EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2023.1-foss-2022a.eb GROMACS-2025.4-foss-2025b.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3531 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10310

Test results coming soon (I hope)...

Details

- notification for comment with ID 4336609134 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot


Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb

Build succeeded for 1 out of 1 (total: 28 mins 39 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.48.01, Python 3.9.25
See https://gist.github.com/boegelbot/ab97d5bb6bfe0c6b8b22e861b87b1b52 for a full test report.

@boegelbot


Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2023.1-foss-2022a.eb

  • SUCCESS GROMACS-2025.4-foss-2025b.eb

Build succeeded for 2 out of 2 (total: 1 hour 11 mins 15 secs) (2 easyconfigs in total)
jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.25
See https://gist.github.com/boegelbot/225f832fb2fec3d1ee57ebe047f9a51c for a full test report.

@bedroge

bedroge commented Apr 28, 2026

Contributor Author

> So, I solved things by adding a symlink to the correct cufft.h in the source dir, and that allowed me to complete the build.

Hmm, forgot to actually include that fix, but I just pushed it (cfb6f93).

@bedroge

bedroge commented Apr 28, 2026

Contributor Author

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb"

Comment thread easybuild/easyblocks/g/gromacs.py Outdated

ocaisa left a comment

Member

Looks good to me

@ocaisa

ocaisa commented Apr 28, 2026

Member

@boegelbot please test @ jsc-zen3-a100
CORE_CNT=16
EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb"

bedroge changed the title from "enable hwloc and HeFFTe support in GROMACS easyblock" to "enable hwloc, cuFFTMp, and HeFFTe support in GROMACS easyblock" Apr 28, 2026
@boegelbot


@ocaisa: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=3531 EB_ARGS="--installpath /tmp/$USER/pr3531 GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb" EB_CONTAINER= EB_REPO=easybuild-easyblocks EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_3531 --ntasks="16" --partition=jsczen3g --gres=gpu:1 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 10314

Test results coming soon (I hope)...

Details

- notification for comment with ID 4339145797 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot


Test report by @boegelbot

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2025.3-foss-2025b-CUDA-12.9.1.eb

Build succeeded for 1 out of 1 (total: 28 mins 4 secs) (1 easyconfigs in total)
jsczen3g1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.7, x86_64, AMD EPYC-Milan Processor (zen3), 1 x NVIDIA NVIDIA A100 80GB PCIe, 590.48.01, Python 3.9.25
See https://gist.github.com/boegelbot/9f6d82d29030b0597d7d12240a590bca for a full test report.


# Enable hwloc support (added in v2016) if it's listed as dependency
if gromacs_version >= '2016' and get_software_root('hwloc'):
    self.cfg.update('configopts', '-DGMX_HWLOC=ON')

Member

I don't mind this, but I do wonder what the impact is.

hwloc is essentially always there as an indirect dependency, via OpenMPI.

Maybe we should change this check to only enable this when hwloc is listed as a direct dependency of GROMACS, so checking via self.cfg.dependencies()?
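The difference between the two checks can be illustrated with a small standalone sketch. It assumes that the dependency list is a list of dicts with a `name` key, which is how I understand `self.cfg.dependencies()` to behave; the helper name and the example lists are hypothetical.

```python
def is_direct_dependency(name, dependencies):
    """True if `name` appears in a list of dependency dicts with a 'name' key."""
    return any(dep.get('name') == name for dep in dependencies)


# Illustrative dependency lists (not taken from a real easyconfig)
without_hwloc = [{'name': 'CUDA'}, {'name': 'HeFFTe'}]
with_hwloc = [{'name': 'CUDA'}, {'name': 'hwloc'}]

# hwloc may still be loaded indirectly (e.g. via OpenMPI) in both cases,
# but only the second list names it as a direct dependency.
print(is_direct_dependency('hwloc', without_hwloc))
print(is_direct_dependency('hwloc', with_hwloc))
```

A root-based check such as `get_software_root('hwloc')` answers "is hwloc available at all", whereas the list-based check answers "did this easyconfig ask for it", which is the stricter condition proposed here.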

Contributor Author

The documentation seems to suggest that it's useful to always enable this if hwloc is available:

> Added CMake support to detect and build GROMACS with hwloc, which will improve GROMACS ability to recognize and take advantage of all the available hardware. If hwloc is unavailable, GROMACS will fall back on other detection routines.

and

> Run-time detection of hardware capabilities can be improved by linking with hwloc. By default this is turned off since it might not be supported everywhere, but if you have hwloc installed it should work by just setting -DGMX_HWLOC=ON

But maybe @al42and can give some advice here: does that indeed make sense, or are there possible drawbacks?

@al42and

There are no downsides to using hwloc, except for the issues you encountered on Grace Hopper. However, I would not expect these failed tests to translate into any actual issues for users, because the upsides of using hwloc in GROMACS are, for now, mostly aspirational. It can help somewhat with CPU affinity, particularly SMT, but NUMA information is not used at all except for logging; and, in the HPC setting, the scheduler should take care of CPU affinity.

So, dropping hwloc does not have any major drawbacks as far as I know.

Member

> So, dropping hwloc does not have any major drawbacks as far as I know.

@al42and Did you mean to state that "enabling hwloc support does not have any major drawbacks" ?
Just so we're clear 😅

With that in mind, I'm fine with merging this as is, if I understood it correctly.

@al42and

> Did you mean to state that "enabling hwloc support does not have any major drawbacks" ?

Neither enabling nor disabling has major drawbacks 🙃 But enabling has minor benefits.

Contributor Author

Thanks @al42and, also for the other suggestions (which I can fix in a follow-up PR). @boegel could you merge this then, so we can ingest the GROMACS builds from EESSI/software-layer#1497?

# The list of GMX_SIMD options can be found
# http://manual.gromacs.org/documentation/2018/install-guide/index.html#simd-support
if 'MIC-AVX512' in optarch and LooseVersion(self.version) >= LooseVersion('2016'):
    res = 'AVX_512_KNL'


@@ -211,6 +211,23 @@
cuda_cc_semicolon_sep = self.cfg.get_cuda_cc_template_value(
    "cuda_cc_semicolon_sep").replace('.', '')
self.cfg.update('configopts', '-DGMX_CUDA_TARGET_SM="%s"' % cuda_cc_semicolon_sep)

@al42and

Nit: in GROMACS 2026, we finally switched to traditional CMAKE_CUDA_ARCHITECTURES, which, if I understand correctly, is set automatically by the EB cmake module?

Member

@al42and Any advantage of using -DCMAKE_CUDA_ARCHITECTURES over -DGMX_CUDA_TARGET_SM ?

We'll tackle that in a follow-up PR...

@al42and

Should be identical in practice. The only advantage is that CMAKE_CUDA_ARCHITECTURES is standardized.
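Since both options take the same list of architectures, a follow-up could switch on the GROMACS version along these lines. This is a sketch only: the function names are hypothetical, the 2026 cut-off is taken from the comment above rather than verified against the GROMACS sources, and the input mimics the semicolon-separated compute capabilities used in the diff context.

```python
def cuda_arch_list(cuda_cc_semicolon_sep):
    """Convert '8.0;9.0' to '80;90', as the .replace('.', '') in the diff does."""
    return cuda_cc_semicolon_sep.replace('.', '')


def cuda_arch_configopt(gromacs_version, cuda_cc_semicolon_sep):
    """Pick the CMake option name based on the GROMACS major version."""
    archs = cuda_arch_list(cuda_cc_semicolon_sep)
    major = int(gromacs_version.split('.')[0])
    # Assumption from the review comment: GROMACS 2026 switched to the
    # standard CMake variable; older versions use the GROMACS-specific one.
    if major >= 2026:
        return '-DCMAKE_CUDA_ARCHITECTURES="%s"' % archs
    return '-DGMX_CUDA_TARGET_SM="%s"' % archs


print(cuda_arch_configopt('2025.3', '8.0;9.0'))
print(cuda_arch_configopt('2026.2', '8.0;9.0'))
```

If the EasyBuild CMake support already sets `CMAKE_CUDA_ARCHITECTURES`, as suggested in the question above, the second branch may turn out to be unnecessary.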

boegel left a comment

Member

lgtm

@boegel

boegel commented Jun 7, 2026

Member

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2024.4-foss-2023b-CUDA-12.4.0.eb

Build succeeded for 1 out of 1 (total: 35 mins 46 secs) (1 easyconfigs in total)
node3300.joltik.os - Linux RHEL 9.6, x86_64, Intel(R) Xeon(R) Gold 6242 CPU @ 2.80GHz (cascadelake), 1 x NVIDIA Tesla V100-SXM2-32GB, 580.159.04, Python 3.9.21
See https://gist.github.com/boegel/eda70c9c75139768aaba845f9459da26 for a full test report.

@boegel

boegel commented Jun 7, 2026

Member

Test report by @boegel

Overview of tested easyconfigs (in order)

  • SUCCESS GROMACS-2026.2-foss-2025b.eb

Build succeeded for 1 out of 1 (total: 1 hour 2 mins 52 secs) (1 easyconfigs in total)
node4235.shinx.os - Linux RHEL 9.6, x86_64, AMD EPYC 9654 96-Core Processor (zen4), Python 3.9.21
See https://gist.github.com/boegel/4bede08c4e977380549ea277bb61ffd0 for a full test report.

boegel merged commit f4b4629 into easybuilders:develop Jun 7, 2026
22 checks passed
bedroge deleted the gromacs_heffte branch June 7, 2026 20:39

5 participants