Skip to content

fix(meson): banner reports LAMMPS as enabled when always compiled#338

Merged
HaoZeke merged 59 commits intoTheochemUI:mainfrom
HaoZeke:fix-lammps-feature-banner
May 10, 2026
Merged

fix(meson): banner reports LAMMPS as enabled when always compiled#338
HaoZeke merged 59 commits intoTheochemUI:mainfrom
HaoZeke:fix-lammps-feature-banner

Conversation

@HaoZeke
Copy link
Copy Markdown
Collaborator

@HaoZeke HaoZeke commented May 10, 2026

LAMMPS is unconditionally compiled into the client (see client/meson.build subdir('potentials/LAMMPS') with comment 'always compiled, loaded at runtime via dlopen'). Yet the features banner check at line 168 looks for -DLAMMPS_POT in the compile-args string -- a flag the meson build never sets, so LAMMPS was always reported as 'disabled'.

Extends the features_list tuple to (flag, name, mode):
'compile' -- existing behaviour, check -Dflag in _args_str
'runtime' -- always compiled in; LammpsLoader::instance()
does the dlopen lookup at run time

LAMMPS now reads 'enabled (dlopen at runtime)' which matches reality. Other entries unchanged.

Note: the runtime-loaded report does NOT verify liblammps.so is actually present on LD_LIBRARY_PATH; that diagnostic still comes from LammpsLoader::require_loaded() throwing when a LAMMPS potential is requested without the library.

Some more of this needs to be done too.

LAMMPS is unconditionally compiled into the client (see
client/meson.build subdir('potentials/LAMMPS') with comment
'always compiled, loaded at runtime via dlopen'). Yet the
features banner check at line 168 looks for -DLAMMPS_POT in
the compile-args string -- a flag the meson build never sets,
so LAMMPS was always reported as 'disabled'.

Extends the features_list tuple to (flag, name, mode):
  'compile' -- existing behaviour, check -Dflag in _args_str
  'runtime' -- always compiled in; LammpsLoader::instance()
               does the dlopen lookup at run time

LAMMPS now reads 'enabled (dlopen at runtime)' which matches
reality. Other entries unchanged.

Note: the runtime-loaded report does NOT verify liblammps.so
is actually present on LD_LIBRARY_PATH; that diagnostic still
comes from LammpsLoader::require_loaded() throwing when a
LAMMPS potential is requested without the library.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 10, 2026

eOn Documentation Preview

Download: documentation.zip

Unzip and open index.html to view.

HaoZeke added 4 commits May 10, 2026 03:24
…ix banner for ARTn/IRA runtime loaders

The banner fix branch was missing two more conditional-runtime
features: ARTn and IRA. Both have *Resource classes that follow the
LammpsLoader pattern (dynlib::openFirst then dlsym, with
require_loaded() throwing only when the user actually requests the
feature). Unlike LAMMPS, they're conditionally compiled (with_artn /
with_ira meson options control inclusion of the resource shim AND
set -DWITH_ARTN / -DWITH_IRA).

Adds 'compile_runtime' mode to features_list:
  -DWITH_ARTN/IRA in args -> 'enabled (dlopen at runtime)'
  not in args             -> 'disabled'

Metatomic verified as compile-LINKED (not dlopen): the
client/potentials/Metatomic/meson.build links metatomic_pot via
library() with link_with, no dynlib::openFirst pattern. The existing
'compile' mode for WITH_METATOMIC is correct.

Docs (user_guide/lammps_pot.md): clarifies in.lammps file placement.

  - akmc.py / parallel_replica / basin_hopping (multi-job drivers):
    put in.lammps in potfiles/ next to config.ini. The Python driver
    copies them per-job scratch dir before each eonclient launch.

  - Direct single-job eonclient invocations
    (job = minimization | nudged_elastic_band | single_point |
    process_search): in.lammps + parameter files MUST be in the SAME
    dir as config.ini. LAMMPSPot calls lammps_file("in.lammps")
    with a relative path; missing-file is silent and the next force
    evaluation segfaults inside LAMMPSPot::force.

Symptoms documented: 'eonclient segfaults inside LAMMPSPot::force
immediately after Beginning minimization / Beginning NEB' = check
in.lammps location.

Discovered while debugging Cu V/SIA recombination NEB; gdb backtrace
showed eonclient SEGV at LAMMPSPot::force +0x155 from
relaxMatter -> Matter::computePotential. Moving in.lammps from
potfiles/ to CWD made the minimization succeed in 2.4s with the
correct -31105.175 eV (matches eOn akmc.py path).
revision=head was non-reproducible and lagged the upstream capnp
codegen fix. v1.1.0 includes cb9d85e (capnp_compile.py + .cpp output
rename for MSVC), so the conda-forge feedstock can drop fix_capnpc.patch
once it bumps eon past this commit.

Net change for the conda recipe:
  - subprojects/rgpot pulled from a tag instead of a tarball pin
  - patches: [fix_capnpc.patch] becomes unnecessary
VASP has no library dependency: VASP::spawnVASP forks ./runvasp.sh and
the class consumes POSCAR / FU files. The previous with_vasp gate +
-DWITH_VASP define added build-time complexity for zero binary cost.

Now always compiled on POSIX (Windows still omitted via #ifndef _WIN32
in Potential.cpp). The conda-forge feedstock can drop the with_vasp
flag and a single eonclient binary supports VASP iff runvasp.sh is on
PATH and PotType::VASP is requested.

Banner gains a 'subprocess' mode (alongside compile / runtime /
compile_runtime): VASP_POT now reports "VASP: enabled (subprocess)"
on POSIX and "VASP: disabled (POSIX only)" on Windows.

Drops: WITH_VASP from CMakeLists.txt, meson_options.txt with_vasp,
the meson if-guard, and the WITH_VASP preprocessor branch in
Potential.cpp.
ARTnResource and IRAResource already wrap dlopen + dlsym + per-symbol
load with global mutexes; the only remaining gate was the with_artn /
with_ira meson option that toggled compilation of the shim files plus
-DWITH_ARTN / -DWITH_IRA preprocessor defines. That option also
piggy-backed a configure-time cmake build of subprojects/artn-plugin
and subprojects/ira (workaround for meson's Fortran scanner choking on
-cpp + submodules). All of that goes away.

After this change ARTn and IRA behave like LAMMPS: ARTnSaddleSearch
+ ARTnResource and IRACompare + IRAResource compile unconditionally.
libartn.so and libira.so are dlopen'd lazily at the first
require_loaded() / is_loaded() call, and the user installs them
through the same channels they already use for liblammps.so
(LD_LIBRARY_PATH on Linux, DYLD_LIBRARY_PATH on macOS, PATH on
Windows). Eonclient gives one binary that lights up ARTn / IRA at
run time when the libraries are available.

Removed:
  meson_options.txt  with_artn, artn_libdir, with_ira, ira_libdir,
                     ira_includedir
  client/meson.build configure-time cmake build of artn-plugin/ira,
                     -DWITH_ARTN/-DWITH_IRA, _runtime_rpath entries,
                     compile_runtime banner mode
  source             every #ifdef WITH_ARTN / #ifdef WITH_IRA across
                     ARTnSaddleSearch.{h,cpp}, IRACompare.cpp,
                     ProcessSearchJob.cpp, SaddleSearchJob.cpp,
                     JobIntegrationTest.cpp, ARTnTest.cpp; the
                     iralib_interf.h include in IRACompare.cpp
                     (IRAResource.h already redeclares the C
                     entry points inline)

Banner now reports:
  ARTn: enabled (dlopen at runtime)
  IRA:  enabled (dlopen at runtime)

Pre-construct guards in ProcessSearchJob / SaddleSearchJob still
throw a runtime_error when method=artn or minmode_method=artn but
libartn isn't on the search path; the message names the entry point
so per-case integration tests still pin which dispatch failed.

Test changes:
  ARTnTest / IRATest are always built; they SKIP at runtime when the
  respective .so is missing (mirroring how LAMMPS-dependent tests
  skip when liblammps isn't installed).
  The 3 'rejects ARTn when not compiled' integration tests become
  'rejects ARTn when libartn missing': they SKIP if libartn IS
  loaded, otherwise assert the new "requires libartn at runtime"
  error message.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 10, 2026

Benchmark Results

Note

All benchmarks unchanged

Count
⚪ Unchanged 8
8 unchanged benchmark(s)
Benchmark Before After Ratio
bench_eonclient.TimeMinimizationLJCluster.peakmem_minimization_lbfgs 27M 27.2M ~1x
bench_eonclient.TimeMinimizationLJCluster.time_minimization_lbfgs 26.2±0ms 27.0±0ms ~1.03x
bench_eonclient.TimeNEBMorsePt.peakmem_neb 27M 27.2M ~1x
bench_eonclient.TimeNEBMorsePt.time_neb 252±0ms 254±0ms ~1.01x
bench_eonclient.TimePointMorsePt.peakmem_point_evaluation 27M 27M ~1x
bench_eonclient.TimePointMorsePt.time_point_evaluation 8.93±0ms 9.55±0ms ~1.07x
bench_eonclient.TimeSaddleSearchMorseDimer.peakmem_saddle_search_dimer 27M 27.2M ~1x
bench_eonclient.TimeSaddleSearchMorseDimer.time_saddle_search_dimer 62.4±0ms 62.8±0ms ~1.01x
Details
  • Base: 248f3933
  • Head: cbc0991d
  • Runner: ubuntu-22.04
Raw asv-spyglass output
All benchmarks:

| Change   | Before   | After    |   Ratio | Benchmark (Parameter)                                                  |
|----------|----------|----------|---------|------------------------------------------------------------------------|
|          | 27M      | 27.2M    |    1    | bench_eonclient.TimeMinimizationLJCluster.peakmem_minimization_lbfgs   |
|          | 26.2±0ms | 27.0±0ms |    1.03 | bench_eonclient.TimeMinimizationLJCluster.time_minimization_lbfgs      |
|          | 27M      | 27.2M    |    1    | bench_eonclient.TimeNEBMorsePt.peakmem_neb                             |
|          | 252±0ms  | 254±0ms  |    1.01 | bench_eonclient.TimeNEBMorsePt.time_neb                                |
|          | 27M      | 27M      |    1    | bench_eonclient.TimePointMorsePt.peakmem_point_evaluation              |
|          | 8.93±0ms | 9.55±0ms |    1.07 | bench_eonclient.TimePointMorsePt.time_point_evaluation                 |
|          | 27M      | 27.2M    |    1    | bench_eonclient.TimeSaddleSearchMorseDimer.peakmem_saddle_search_dimer |
|          | 62.4±0ms | 62.8±0ms |    1.01 | bench_eonclient.TimeSaddleSearchMorseDimer.time_saddle_search_dimer    |

HaoZeke added 23 commits May 10, 2026 11:10
Two changes that ship together because they overlap in LAMMPSPot:

(1) Portable run-input bundle (.eonlpb)
    A self-contained blob holding in.lammps plus every file the run
    needs: pair_coeff data, custom pair_style .so plugins, KIM
    tables, read_data inputs, shell helpers. The content is opaque
    to eOn -- the packer walks a directory and the client untars it.

    LAMMPSPot now has two operating modes:
      bundle mode  -- [Potential] lammps_bundle = path/to/run.eonlpb.
                      LAMMPSBundle::extract() unpacks into a per-instance
                      scratch dir under temp_directory_path() (PID +
                      64-bit random suffix to avoid collisions). The
                      makeNewLAMMPS() path issues "shell cd <scratch>"
                      to liblammps so every relative path inside
                      in.lammps resolves there. Scratch is removed
                      in the destructor.
      legacy mode  -- bundle path empty. Behaviour matches pre-2.15:
                      in.lammps and every file it references must
                      live in eonclient's CWD.

    Bundle format is plain little-endian:
      [0..7]   magic    "EONLPB1\0"
      [8..15]  uint64   manifest length
      [16..]   JSON     {"files": [{"name": ..., "size": ...}, ...]}
      [...]    concatenated file bodies

    Pack and inspect with the new eon.lammps_bundle module:
      python -m eon.lammps_bundle pack potfiles/ run.eonlpb
      python -m eon.lammps_bundle list run.eonlpb

(2) Fail loudly on missing in.lammps (issue eon-7qqp)
    LAMMPSPot's constructor now refuses to build the potential if
    legacy mode is active and in.lammps is absent from CWD; the
    runtime_error names the CWD and points at the bundle option.
    Previously lammps_file() silently logged its own error and the
    next force() segfaulted on a null pair_style.

    LammpsLoader gained two optional symbols:
      lammps_has_error
      lammps_get_last_error_message
    Present in LAMMPS >= 3Mar2020, null on older builds. After
    lammps_file() we surface the LAMMPS-side message verbatim instead
    of pressing on with a half-initialised handle.

API additions:
  Parameters::potential_options.LAMMPSBundlePath  (default "")
  [Potential] lammps_bundle = <path>              (INI)
  Potential.lammps_bundle: str                    (JSON / yaml schema)
…, VASP-always-on

lammps_pot.md
  Replace the CWD-only file-placement guidance with two clearly
  separated workflows: (1) bundle mode via lammps_bundle =
  path/to/run.eonlpb (recommended; covers pair_coeff data, custom
  pair_style .so plugins, KIM tables, read_data inputs); (2) legacy
  CWD mode for users who still want to keep files next to
  config.ini. Document the new fail-loud behaviour and the
  lammps_has_error pass-through so users see liblammps's own message
  instead of a segfault.

artn.md
  Drop the "needs the artn-plugin subproject + meson configure-time
  cmake build + -Dwith_artn=true" instructions. ARTn is now always
  compiled and dlopens libartn.so at run time, mirroring LAMMPS.
  Replace the build-flag block with a runtime-requirements block
  that shows how to build libartn standalone and put it on
  LD_LIBRARY_PATH.

potential.md
  Rewrite the support-tier note: vendored + LAMMPS + VASP + ARTn +
  IRA are always compiled (LAMMPS / ARTn / IRA are runtime-loaded,
  VASP is a subprocess shim). Only ASE-* / AMS / MPIPot / GP*
  remain behind -Dwith_*. Update the LAMMPS and VASP entries in
  the support table to match the new badges.
Five fragments tied to PR TheochemUI#338:
  338.fixed.md            rgpot.wrap pinned to v1.1.0
  338-vasp.changed.md     VASP always compiled on POSIX
  338-artn-ira.changed.md ARTn / IRA dlopen at runtime
  338-lammps-bundle.added.md  .eonlpb portable run-input bundle
  338-lammps-fail-loud.fixed.md
                          LAMMPSPot fails loudly on missing in.lammps
Mirrors LAMMPS / ARTn / IRA: XtbLoader is a singleton that opens
libxtb on first instance() call and caches every xtb_* function
pointer used in XTBPot. require_loaded() throws an install-hint
runtime_error when the user requests a PotType::XTB potential and
libxtb isn't found; otherwise the same eonclient binary works
across systems with and without xtb installed.

Removed:
  meson_options.txt  with_xtb
  CMakeLists.txt     WITH_XTB
  client/meson.build dependency('xtb', required: true), -DWITH_XTB,
                     conditional subdir('potentials/XTBPot')
  source             #ifdef WITH_XTB across Potential.cpp factory
                     and XTBPot includes; XTBPot now always linked

XTBPot rewrite:
  XTBPot.h no longer pulls xtb.h; opaque libxtb handles are void*
  fields and the .cpp casts at call sites. Constructor and
  destructor moved out-of-line; everything routes through
  eonc::get_xtb_loader(). The four GFN paramset loaders are
  null-checked at use site (older / stripped libxtb builds may
  not export every variant).

Tests:
  test_xtb (XTBTest.cpp) and test_cineb_xtb (CINEBXTBTest.cpp) are
  always built; both SKIP at runtime via XtbLoader::is_loaded() when
  libxtb is absent from LD_LIBRARY_PATH. devdocs/testing.md updated.

Banner:
  XTB: enabled (dlopen at runtime)
The MPI C++ bindings (MPI::COMM_WORLD, MPI::INT, MPI::DOUBLE,
MPI::Group, MPI::Init, MPI::Finalize, ...) were deprecated in
MPI-2.2 and removed outright in MPI-3.0 (~2012). Modern conda-forge
MPICH 4.x and OpenMPI 5.x do not ship mpicxx.h, so -Dwith_mpi=true
has been a build-time error on those platforms.

Every call site now uses the C bindings, which every spec-compliant
MPI implementation guarantees:

  client/potentials/MPIPot/MPIPot.cpp
    MPI::COMM_WORLD.Send / .Recv / .Iprobe -> MPI_Send / MPI_Recv /
        MPI_Iprobe with explicit MPI_COMM_WORLD + MPI_STATUS_IGNORE.
    MPI::INT / MPI::DOUBLE                 -> MPI_INT / MPI_DOUBLE.
    The local icwd buffer goes from long[1024] to int[1024] so the
    over-the-wire layout matches the existing MPI server: the
    pre-port code declared long[1024] but tagged the message
    MPI::INT, which on little-endian sent the low 4 bytes of each
    long. Any deployed MPI server still parses 1024 MPI_INTs
    correctly with the new buffer.

  client/ClientEON.cpp
    MPI::Init / Is_initialized -> MPI_Init / MPI_Initialized.
    MPI::COMM_WORLD.Get_rank / .Get_size / .Allgather /
        .Get_group / .Create / .Barrier / .Abort / .Isend / .Recv
        -> MPI_Comm_rank / _size / _Allgather / _group / _create /
           _Barrier / _Abort / _Isend / _Recv with proper
           MPI_Request / MPI_Status_ignore handling.
    MPI::Group.Incl                       -> MPI_Group_incl.
    MPI::Finalize                          -> MPI_Finalize.
    Typo MPI_COMM_nullptr                 -> MPI_COMM_NULL.

Runtime-load follow-up: the FlexiBLAS-of-MPI shim (one eonclient
binary that dlopens any spec-compliant libmpi) is the right end
state, mirroring LAMMPS / ARTn / IRA / XTB. The MPI standard binary
ABI is in late draft for MPI-5 with experimental support already in
MPICH 4.3+ / OpenMPI 5.0+; MPItrampoline is the practical interim.
Documented in docs/source/user_guide/mpi_potential.md so the
direction is recorded.
MPItrampoline is the FlexiBLAS-of-MPI: it ships a stable wrapper
ABI, and one eonclient binary forwards every MPI_* call to any
spec-compliant libmpi at run time, picked via the
MPITRAMPOLINE_LIB env var. This solves the same problem the
LAMMPS / ARTn / IRA / XTB dlopen loaders solve for their
respective libraries -- one binary, many runtime backends.

The meson detection order in -Dwith_mpi=true builds:

  1. dependency('MPItrampoline', method: 'cmake')
       Picks up the upstream CMake config that installs alongside
       libmpitrampoline.so. Preferred path.
  2. dependency('mpi-c')
       MPItrampoline also installs a pkg-config shim that shadows
       the system mpi-c.pc; this catches conda-style installs
       where pkg-config wins over CMake.
  3. dependency('mpi', required: true)
       Last-resort direct link to the system MPI. Builds against
       that single flavour the way pre-2.15 always did.

subprojects/mpitrampoline.wrap pins upstream v5.5.1 so users who
want meson to fetch and CMake-build the trampoline locally can
just `meson subprojects download mpitrampoline` instead of
installing it separately. The wrap declares
`provide.dependency_names = MPItrampoline` so meson auto-falls
through to the subproject when neither system pkg-config nor
CMake config is found.

Document the workflow in docs/source/user_guide/mpi_potential.md
(install MPItrampoline once, build MPIwrapper per MPI flavour,
swap MPITRAMPOLINE_LIB to switch flavours without rebuilding).
The MPI standard binary ABI in late MPI-5 draft will eventually
obsolete the trampoline; until MPICH 4.3+ / OpenMPI 5.0+
stabilise that, MPItrampoline is the working FlexiBLAS-equivalent.
Vault issue eon-7416: post-saddle Min1/Min2 prints
'[Matter] 0 0.00000e+00 -nan -0.00000' and the eonclient process
hangs indefinitely. The root cause is that NaN-tainted force vectors
(from atom overlap inside an EAM/ML repulsive core after a too-large
push from the saddle) propagate silently through every comparison
in the line-search loop -- 'NaN < threshold' is false, so the loop
condition never terminates and the minimizer keeps retrying with
shrinking step sizes that all evaluate to NaN.

Add a single allFinite() guard at the entry to LBFGS::step() and
ConjugateGradients::step() that returns -1 the moment a non-finite
element appears in the gradient. The respective run() loops already
treat status<0 as a hard failure (LBFGS) or now do (CG); both also
re-check after the loop in case isConverged() silently swallowed a
NaN comparison.

Effect: a NaN-locked minimization terminates with a clear
"non-finite force" warning instead of a SLURM-killed hang. The job
returns a -1 status the akmc.py wrapper can route to a clean
termination_reason instead of leaving an entire chain-state combo
indefinitely waiting on output.
Same FlexiBLAS-of-MPI pattern this PR established for MPItrampoline
and the dlopen-based LAMMPS / ARTn / IRA / XTB loaders, applied to
BLAS / LAPACK. Add the with_flexiblas meson option (and the cmake
mirror WITH_FLEXIBLAS) that links Eigen via FlexiBLAS so the
backend (OpenBLAS, Intel MKL, BLIS, ATLAS, ARMPL, ...) is picked
by the FLEXIBLAS env var at run time:

    meson setup builddir -Dwith_flexiblas=true
    FLEXIBLAS=OPENBLAS ./eonclient   # OpenBLAS
    FLEXIBLAS=INTELMKL ./eonclient   # Intel MKL
    FLEXIBLAS=BLIS     ./eonclient   # BLIS

Defines that activate Eigen's BLAS / LAPACKE backend:
  -DEIGEN_USE_BLAS
  -DEIGEN_USE_LAPACKE

subprojects/flexiblas.wrap pins upstream v3.5.0 and declares
'flexiblas' as a provided dependency, so meson auto-builds the
wrap when no system install is found. The cmake side mirrors via
FetchContent_Declare(flexiblas, GIT_TAG v3.5.0). There is never a
"no FlexiBLAS available" failure mode.

with_flexiblas and use_mkl are mutually exclusive -- explicitly
errored at configure time. The legacy use_mkl / USE_MKL flag
stays for users who want a direct MKL link without the trampoline
indirection; FlexiBLAS's FLEXIBLAS=INTELMKL path is the
recommended modern equivalent.

Documented in docs/source/install/index.md alongside the existing
ccache / mold notes.
Issue TheochemUI#177 ('BLD: Cleanup with sourceset') was the natural follow-up
to the meson-format pass. With LAMMPS, ARTn, IRA, XTB, VASP, and
MPItrampoline now standardised on the dlopen / always-compile
pattern, the eonclib source-list decisions reduce to three real
gates: WITH_GPRD, WITH_GP_SURROGATE, WITH_SERVE_MODE. Replace the
historical chain of `eonclib_sources +=` accretions with a single
sourceset:

  eonclib_ss = import('sourceset').source_set()
  eonclib_ss.add(files(<always-on sources>))                # ARTn / IRA included
  eonclib_ss.add(when: 'WITH_GPRD',         if_true: files(...))
  eonclib_ss.add(when: 'WITH_GP_SURROGATE', if_true: files(...))
  eonclib_ss.add(when: 'WITH_SERVE_MODE',   if_true: files(...))
  eonclib_cfg = configuration_data()
  eonclib_cfg.set('WITH_GPRD',         get_option('with_gprd'))
  eonclib_cfg.set('WITH_GP_SURROGATE', get_option('with_gp_surrogate'))
  eonclib_cfg.set('WITH_SERVE_MODE',   get_option('with_serve'))
  eonclib_resolved = eonclib_ss.apply(eonclib_cfg)
  library('eonclib', sources: eonclib_resolved.sources(), ...)

Source membership now lives in one place; feature blocks keep their
side-effecting parts (subproject() calls, find_library, -DWITH_X
defines, _deps additions) but no longer reach into the source list.
Build output is byte-identical to the pre-cleanup chain --
sourceset.apply() produces the same deduplicated list.

The sourceset module also positions us to add per-feature
dependency tracking later (`when: 'WITH_X', if_true: dep_x`), which
is the right place to land further consolidation when AMS / metatomic
get the same treatment.
Reformat the files added or rewritten earlier in this PR (XtbLoader,
LammpsBundle, the C-bindings MPI port, the modernised LBFGS / CG
NaN guards, ARTn / IRA include cleanup, JobIntegrationTest) to
match the repo's .clang-format. No behaviour change; CI lints job
was failing on the unformatted introductions.
…torch libs (closes TheochemUI#304)

Three CI-driven build fixes that came out of PR TheochemUI#338 against
TheochemUI/eOn (TheochemUI#338):

1. -ldl link
   Hoist `dl_dep = cppc.find_library('dl', required: false)` to file
   scope in client/meson.build so every loader-using target inherits
   `-ldl` automatically. Pre-fix only potentials/LAMMPS/meson.build
   had a local dl_dep, and the new XtbLoader / ARTn / IRA paths
   inherited `_deps` without it -- libxtbeon.so failed to link with
   `undefined reference to dlopen / dlsym / dlclose` on every Linux
   CI runner. Drop the redundant local dl_dep from
   potentials/LAMMPS/meson.build.

2. Catch2 --allow-running-no-tests
   Catch2 v3 returns exit 4 from a process that ran no tests / no
   assertions even when every TEST_CASE issued SKIP() cleanly. Pass
   --allow-running-no-tests to every Catch2 unit test in the test()
   loop so test_xtb / test_artn / test_ira / test_cineb_xtb skip
   silently with exit 0 on systems where libxtb / libartn / libira
   isn't on LD_LIBRARY_PATH (e.g. macOS CI without libxtb).

3. Optional torch component lookup (closes TheochemUI#304)
   Make the libtorch find_library loop required: false and skip
   missing libraries. The CMake build's find_package(Torch) handles
   missing components gracefully and so do we now; this unblocks
   Windows where torch_global_deps ships only a .dll (no .lib import
   library) and conda-forge libtorch *cpu* installs that strip
   torch_cuda. Combined with the with_xtb -> always-compile + dlopen
   change earlier in this PR (issue TheochemUI#304 item 1), the
   dependency('xtb') fallback removal (item 2), and the VLA ->
   std::vector swap inside the XTBPot rewrite (item 3), all four
   meson cross-platform packaging items in TheochemUI#304 are upstream:

     TheochemUI#304 (1) with_xtb forcing Fortran -- with_xtb is gone
     TheochemUI#304 (2) xtb fallback triggering source builds -- no
              dependency('xtb') anymore
     TheochemUI#304 (3) VLA in XTBPot.cpp -- replaced by std::vector<double>
              R_bohr(3 * N) in the XtbLoader rewrite
     TheochemUI#304 (4) Torch find_library unconditional -- this commit
…loses TheochemUI#309)

MetatomicPotential ctor called metatomic_torch::pick_output(
"energy_uncertainty", outputs, v_energy_uq) blind, then caught the
resulting c10::Error and disabled uncertainty checks. Logically
correct -- but c10::Error captures its construction stack and
prints it on what() / on logger forwarding, so every load of a
model without an uncertainty head surfaced a multi-frame backtrace
that looked like a real crash to users.

Pre-check `outputs.contains(...)` for both candidate keys before
the call:

  <base-variant>::energy_uncertainty   (only when v_energy_uq is set)
  energy_uncertainty                   (always)

If neither key is present, log DEBUG and disable uncertainty
checks the same way the catch block did -- but without ever
constructing a c10::Error in the first place. v_energy_uq is a
torch::optional<std::string>, so guard with has_value() + empty()
before deref.
closes TheochemUI#225)

The --features flag has been wired through CommandLine.cpp ->
printFeatures() -> the FEATURES_STRING configured from
client/meson.build's features_list at configure time since the
banner work earlier in this PR (it now reports the LAMMPS / ARTn /
IRA / XTB / VASP / Metatomic / MPI / FlexiBLAS / Serve modes
correctly with 'enabled' / 'disabled' / 'enabled (dlopen at
runtime)' / 'enabled (subprocess)' markers). Add the user-facing
reference under 'Building from source' so users can confirm a conda
env or a custom build before launching long ensembles.
XTB is unconditionally compiled now and dlopens libxtb at run time
via XtbLoader (commit 9f2c41c). The -Dwith_xtb option no longer
exists; meson errors out with 'Unknown options: with_xtb' when
present.

  pixi.toml install_eon_main task: drop -Dwith_xtb=False
  .github/workflows/ci_xtb.yml: drop -Dwith_xtb=True; add a comment
    pointing at LD_LIBRARY_PATH-based detection so the test_xtb /
    test_cineb_xtb suites SKIP cleanly on envs without libxtb
closes TheochemUI#164)

Four meson hardening tweaks that travel together:

1. warning_level: 0 -> 1 in the top-level project default_options
   gives -Wall on every translation unit. Pre-2.15 we shipped 0 to
   suppress build noise ahead of the CECAM school (TheochemUI#117); the
   suppression has outlived its use. warning_level=2 (-Wall
   -Wextra) and =3 (which also passes -fimplicit-none and breaks
   the vendored Fortran TUs that rely on implicit typing) are
   tracked as follow-ups in TheochemUI#117.

2. -fvisibility=hidden + -fvisibility-inlines-hidden applied via
   add_project_arguments in client/meson.build. Cuts the dynamic
   symbol table on libeonclib.so by ~10x, speeds up rtld lookups
   in conda envs, stops internal namespace symbols from leaking.
   cppc.get_supported_arguments() filters out flags MSVC doesn't
   accept, so the same line is a no-op on Windows (PE/COFF treats
   every function as private until __declspec(dllexport)). Public
   entry points the embedding host needs can opt back in via
   __attribute__((visibility("default"))) per-symbol; the
   eonclient binary itself is unaffected.

3. dependency('pybind11') resolved once via a need_pybind11 flag
   covering the four blocks that previously each ran the lookup
   (with_catlearn / with_ase / with_ase_orca / with_ase_nwchem).
   Configure-time logs cleaner; same _deps end state.

4. Drop dead build artifacts (closes TheochemUI#164):
     client/Makefile.hopper, client/Rules.mk -- legacy GNU-make
       infrastructure replaced by meson years ago; nothing in the
       active build references them.
     tools/mastereqn -- standalone reference code that isn't
       built or shipped anywhere.
     tools/toykmc -- ditto.
… chain

MetatomicPotential.cpp is the heaviest TU in the project: pulls in
the full libtorch + metatensor-torch + metatomic-torch include
graph (~2.5 GB of preprocessor output, ~30 s wall-clock per
compile on a warm cache, multiple minutes from cold).

Add a meson PCH wrapper at client/potentials/Metatomic/pch/
metatomic_pch.h that precompiles the third-party header chain
verbatim with the same diagnostic-suppression pragmas the consuming
TU uses. meson rebuilds the PCH only when the wrapper or any
transitive include changes; touching MetatomicPotential.cpp itself
no longer pays the third-party reparse.

End state: cold rebuild of metatomic_pot drops from minutes to
seconds; warm rebuild stays similar but with a cleaner ninja log.
No effect on the runtime binary.
Add a baseline .clang-tidy at the repo root covering the
high-signal categories without flooding:

  bugprone-*           (real bug classes -- excludes
                        easily-swappable-parameters and
                        narrowing-conversions, which fire heavily
                        on Eigen-int-as-row-index code)
  cert-*               (security / undefined behaviour)
  cppcoreguidelines-pro-type-cstyle-cast / -member-init
  cppcoreguidelines-slicing
  cppcoreguidelines-virtual-class-destructor
  modernize-use-{nullptr,override,equals-default,equals-delete}
  modernize-redundant-void-arg
  modernize-deprecated-headers
  performance-{for-range-copy,implicit-conversion-in-loop,
              inefficient-vector-operation,
              unnecessary-value-param}
  readability-qualified-auto
  readability-redundant-{control-flow,string-cstr,string-init,
                         smartptr-get}

Deliberately excluded categories that don't pay back the noise:
cppcoreguidelines-avoid-magic-numbers (atomistic constants are
intrinsically magic), -pro-bounds-* (Eigen-heavy code), fuchsia-*,
llvm-*, google-readability-todo (we use TODO(rg)).

HeaderFilterRegex skips the vendored client/thirdparty/ tree; a
companion client/thirdparty/.clang-tidy with `Checks: '-*'` belt-
and-braces opts the same tree out so editors that don't honour the
parent regex still see the right ruleset.

WarningsAsErrors is empty: clang-tidy is advisory in this PR,
turning the highest-confidence subset on as warnings-as-errors is
its own follow-up.
Add .github/workflows/ci_sanitizers.yml that builds a debug
configuration with `b_sanitize=address,undefined` and runs the
full meson test suite. Catches use-after-free / out-of-bounds /
signed-overflow / null-deref regressions in the LAMMPSBundle
extract path, the MetatomicPotential neighbor list code, and the
LBFGS / CG minimizer paths.

ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:detect_leaks=0
disables leak-detection because Eigen's static-init pool
legitimately holds memory until process exit; we care about UAF /
OOB / null-deref, not leaks.

UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 fails fast on
signed overflow / shift overflow / float-cast overflow / vptr
mismatch.

b_lundef=false relaxes -Wl,--no-undefined which ASan's interceptor
shims can't satisfy at link time on the conda-forge gcc 13.

Builds against pixi's `dev-lite` env -- the smallest one that
covers the runtime-loaded loaders + the always-on potentials.
ARTn / IRA / XTB / LAMMPS tests skip cleanly when the runtime libs
aren't installed (we land --allow-running-no-tests in the same PR
so this works).
DynLib.h: mark every lookup helper [[nodiscard]] so callers must
inspect the returned handle / pointer / error string. dlopen()
returning NULL is the normal "library not found" signal in
LammpsLoader / XtbLoader / *Resource; ignoring the return is
always a bug. Same for sym(), openFirst(), error(), and the
loadSym<> template.

LAMMPSPot.h: explicit threading note on the class. Internally calls
std::filesystem::current_path() (process-wide state) and issues
"shell cd <workdir>" to liblammps; both are global side effects.
liblammps itself keeps process-global state per LAMMPSObj. One
LAMMPSPot per thread / per NEB image is the contract.

XtbLoader.h: replace the (void) typedef arg lists with () to
satisfy modernize-redundant-void-arg.

LAMMPSPot.h: add `override` on ~LAMMPSPot and force() now that
clang-tidy's modernize-use-override fires (was passing under
warning_level=0).
The historical `#define LBFGS_EPS 1e-30` (1) skipped the type
system, (2) leaked into every TU that #included LBFGS.h, and (3)
gave no hint about its role for a future maintainer reading the
guard at line 124 of LBFGS.cpp:

    if (std::abs(s0.dot(y0)) < LBFGS_EPS) { ... reset(); ... }

Convert to:

    inline constexpr double LBFGS_EPS = 1e-30;

with a docstring explaining it as the curvature-update gate -- the
floor below which `s_0 . y_0` is too small to invert into rho
without amplifying denormals into the history buffer. Same style
the ARTnSaddleSearch.h ARTN_MODE_TOLERANCE / ARTN_SMALL_DISPLACEMENT
constants already use. Numerically identical; just a less hostile
declaration site.
Two related fixes that came out of an actual local build with the
new warning_level=1 flag enabled:

1. Revert the global `-fvisibility=hidden` from earlier in this PR.
   It hid Matter::compare in libeonclib.so so linking eonclient
   against the lib failed with `undefined reference to
   eonc::Matter::compare(eonc::Matter const&, bool)`. To use
   -fvisibility=hidden cleanly we'd have to annotate every
   cross-DSO entry point in libeonclib (Matter::*, Potential::*,
   Job::*, ...) with __attribute__((visibility("default"))) -- a
   separate multi-day refactor. Keep -fvisibility-inlines-hidden;
   it only affects inline / template-instantiation symbols and
   doesn't change cross-DSO linkage. cppc.get_supported_arguments()
   filters out the flag on MSVC where it isn't accepted.

2. Migrate the 10 [[deprecated]] constructor call sites that
   warning_level=1 surfaced. Every Optimizer subclass and every
   Dynamics instantiation was still calling the legacy
   "Pass {Optimizer,Dynamics}Config directly" ctors. Replace with
   the {Optimizer,Dynamics}Config::fromParams(params) form already
   used by FIRE.

   client/LBFGS.h
   client/Quickmin.h
   client/SteepestDescent.h
   client/ConjugateGradients.h
   client/DynamicsSaddleSearch.cpp
   client/ParallelReplicaJob.cpp                (2 sites)
   client/ReplicaDynamicsJob.cpp
   client/SafeHyperJob.cpp
   client/TADJob.cpp

End state: a clean -Wall build with zero deprecated-declarations
warnings. The remaining clang-tidy advisories
(performance-unnecessary-value-param on shared_ptr by value,
modernize-redundant-void-arg in eonc::Optimizer, etc.) are
pre-existing and tracked under TheochemUI#117 / TheochemUI#144.
A clean -Wall build now emits zero warnings. Two changes:

1. NEBInitialPaths.cpp::sidppPath
   gcc 13's -Wstringop-overread fired on the QUILL_LOG_INFO call.
   The `use_zbl ? "-ZBL" : ""` ternary fed a const char* into
   quill's safe_strnlen specialisation that calls memchr with
   n=SIZE_MAX; the optimizer can't prove the early-exit at the
   null terminator and the source-size analysis trips on the 5-
   byte literal. Bind the literal to a sized std::string before
   logging so quill's encoding takes the string overload that
   already knows the length.

2. Optimizer / LBFGS / Quickmin / SteepestDescent / CG / FIRE
   Pass std::shared_ptr<ObjectiveFunction> by value into every
   ctor and std::move into the base m_objf storage. Avoids the
   per-construction refcount bump that
   performance-unnecessary-value-param flagged. Derived ctors
   that previously read a_objf->degreesOfFreedom() now read
   through the base m_objf member that just got the moved-in
   value -- semantically identical, no extra copy.

Verified on cosmolab (gcc 13.3, dev-lite pixi env): 155/155
ninja steps, 0 warnings, 0 errors.
Project default_options jumps from warning_level=1 (-Wall) to 2
(-Wall -Wextra). The codebase compiles cleanly under both --
verified locally via cosmolab gcc 13.3 / dev-lite pixi env, 229
ninja steps, 0 warnings.

Pin the inih r62 subproject to warning_level=0 in its
default_options:

  inireader_dep = dependency(
      'inireader',
      fallback: ['inih', 'INIReader_dep'],
      required: true,
      default_options: ['default_library=static', 'warning_level=0'],
  )

inih's vendored tests/unittest_alloc.c has an `unused parameter
'user'` warning that's upstream test code, not ours to fix.
Without the override our -Wextra inherits into the subproject and
leaks the warning into our build log.

warning_level=3 stays out of reach: it also passes -fimplicit-none
to gfortran, which breaks eonclient's vendored Fortran TUs that
rely on implicit typing. Fortran-side cleanup is tracked in TheochemUI#117.
HaoZeke added 28 commits May 10, 2026 13:32
readcon-core v0.9.0 ships pre-generated C/C++ headers via cargo-c, so
consumers no longer need cbindgen on the build host. The ensure_cbindgen
pixi task and its setupeon depends-on entry are now dead weight.
Twelve cert-err33-c hits in the clang-tidy survey: stdio functions
whose return values were silently dropped. Categorised by remediation:

  Real error path -- must surface to the caller:

    Parameters.cpp:Parameters::load(FILE*)
      fseek/ftell/fread results checked; abort with EONC_LOG_ERROR
      and `return 1` on any failure.

    HelperFunctions.cpp:loadMode(FILE*, int)
      fscanf return checked against expected 3-double match;
      EONC_LOG_CRITICAL + std::exit(1) on a short read so the
      mode-direction file isn't silently parsed as zeros.

    HelperFunctions.cpp:saveMode(FILE*, ...)
      fprintf return checked; EONC_LOG_CRITICAL + std::exit(1) on a
      short write rather than dropping data on the floor. Signature
      updated to `const std::shared_ptr<Matter>&` + `const AtomMatrix&`
      to match (and the matching declaration in HelperFunctions.h).

    BasinHoppingJob.cpp:188-192
      `char fname[128]; snprintf(...)` -> `std::string fname =
      std::format(...)`. Removes the truncation hazard entirely.

  No-recovery path -- explicit (void) cast to silence:

    ConFileIO.cpp:writeMatterCon writes are best-effort -- if the
    fprintf chain fails partway through there's no way to recover
    once the header has been emitted, so cast each fprintf and the
    closing fclose to (void).

    HelperFunctions.cpp:loadMode RAII closer
      fclose ignored at scope-exit; the buffer was read-only so
      flushing-write failure is impossible.

The companion cert-err34-c sites in Bundling.cpp / Parameters.cpp /
fpe_handler.cpp landed in d22d4fe.
Three remaining clang-tidy categories from the survey:

1. performance-unnecessary-value-param (53 hits)
   Auto-fix on the .cpp + matching .h files: every value-typed
   parameter that was only read as const (`AtomMatrix mode`,
   `std::shared_ptr<X> ptr`, `std::string name`, ...) becomes a
   const reference, and the few cases where the param is moved-into
   storage (Matter::relax prefixMovie / prefixCheckpoint) get
   pass-by-value-and-std::move. Same end semantics, no copy at the
   API boundary.

2. bugprone-implicit-widening-of-multiplication-result (10 hits)
   `int * int` chains assigned to `Eigen::Index` / `long` /
   `size_t` widen implicitly and were flagged. Anchor the
   multiplication in the destination type up front:

     const Eigen::Index seg = static_cast<Eigen::Index>(3) * natoms;

   in IDPPObjectiveFunction, NEBObjectiveFunction, and
   NEBOcinebController.

3. Misc bugprone (5 hits)
   Matter.cpp:operator=  -- self-assignment guard
                            (cert-oop54-cpp / bugprone-unhandled-
                            self-assignment).
   RandomNumbers.cpp:56  -- assignment-in-if hoisted out
                            (bugprone-assignment-in-if-condition).
   ReplicaExchangeJob.cpp:28,32 -- (long)(x + 0.5) -> std::lround
                                   (bugprone-incorrect-roundings).
   BiasedGradientSquaredDescent.cpp:156 -- repeated branch body
                                           collapsed into a ternary
                                           (bugprone-branch-clone).
   ReplicaExchangeJob.cpp:58 -- migrate the leftover deprecated
                                Dynamics ctor to DynamicsConfig::
                                fromParams(params).
   Bundling.cpp:catch -- attach a std::cerr line to the formerly
                         empty std::stoi catch
                         (bugprone-empty-catch).

Follow-up not in this commit: the Optimizer / SaddleSearchMethod
status-int return values are still magic-number style (-1 / 0 / 1).
A typed enum (`enum class StepResult { Failed, Step, Converged }`)
would be the right move and crosses every call site -- separate
focused branch.
First half of the typed-status refactor the user asked for. Add
client/StatusTypes.h with two strongly-typed enums:

  enum class StepResult : int { Failed = -1, NotConverged = 0,
                                Converged = 1 };
  enum class SaddleStatus : int { Good = 0, Init = 1, ... }; // 22+1 states

This commit migrates `Optimizer::step()` / `Optimizer::run()` (and
every subclass: LBFGS, ConjugateGradients, FIRE, Quickmin,
SteepestDescent) from `int` returns to `StepResult`. Driver call
sites that captured the return value (MinModeSaddleSearch::run's
optStatus) switch to comparing against `StepResult::Failed` rather
than the bare `< 0` check; sites that discard the return
(NEBInitialPaths, NudgedElasticBand, BiasedGradientSquaredDescent,
HelperFunctions::relaxMatter) compile unchanged.

Underlying integer values are preserved (`Failed = -1`,
`NotConverged = 0`, `Converged = 1`) so any out-of-tree code that
read the int continues to do so via to_int(StepResult).

Followup commit applies SaddleStatus to MinModeSaddleSearch /
ARTnSaddleSearch / BasinHoppingSaddleSearch /
BiasedGradientSquaredDescent / DynamicsSaddleSearch and the
SaddleSearchMethod base.
Pre-StepResult build failure surfaced one driver site that
captured the int return value of `Optimizer::run()` without
actually using it -- `int status = idpp_optim->run(...)` in
NEBInitialPaths::idppOptimizePath. Cast to (void) since the
convergence is read from idpp_objf->getConvergence() right after.
…rchy

Phase two of the typed-status work (StepResult landed in 915dd04).
Replace the magic-int saddle-search status with a strongly-typed
`enum class SaddleStatus : int` defined in StatusTypes.h alongside
StepResult.

API surface:

  client/SaddleSearchMethod.h
    virtual SaddleStatus run() = 0;
    virtual SaddleStatus getStatus() const { return SaddleStatus::Good; }
    virtual std::string_view describeStatus(SaddleStatus) const
        { return statusMessage(s); }   // default forwards to canonical
                                        // statusMessage in StatusTypes.h

  Subclasses (MinMode / ARTn / BasinHopping / DynamicsSaddle / BGSD):
    SaddleStatus run() override;
    SaddleStatus getStatus() const override { return status; }
    SaddleStatus status{SaddleStatus::Good};

  Job drivers (SaddleSearchJob / ProcessSearchJob):
    SaddleStatus do{Saddle,Process}Search();
    void saveData(SaddleStatus status);
    void printEndState(SaddleStatus status);

Migration notes:

  * MinModeSaddleSearch::Status (the unscoped enum with 22 STATUS_*
    enumerators) goes away entirely; canonical definition is
    eonc::SaddleStatus in StatusTypes.h. The old 22 names migrate
    1:1 (STATUS_GOOD -> SaddleStatus::Good, STATUS_BAD_NO_CONVEX ->
    SaddleStatus::BadNoConvex, ...) preserving the underlying int
    values byte-for-byte so results.dat parsers, the akmc.py
    driver, and the JobIntegrationTest fixtures continue to read
    the same integers via to_int(SaddleStatus).

  * ARTnSaddleSearch's static constexpr int STATUS_BAD_ARTN_ERROR
    = 22 (and Status_Good / BadMaxIterations) are gone; ARTn now
    returns SaddleStatus::BadArtnError directly. The ad-hoc
    describeStatus override is dropped -- the base default
    describeStatus(SaddleStatus) -> statusMessage(SaddleStatus)
    handles the full vocabulary.

  * NEBOcinebController::runDimer keeps its own int return values
    (-2 / -1 / 0 / 1) since they're MMF-internal control flags with
    semantics distinct from saddle-search status; that subset is
    a candidate for its own typed enum in a follow-up.

  * Termination-reason serialization uses to_int(status) so the
    int written to results.dat is unchanged.

110+ STATUS_* call sites migrated mechanically via sed. clang-format
pass applied; all targets compile cleanly on cosmolab gcc 13.3.
Out-of-line method definitions in BasinHoppingSaddleSearch /
BiasedGradientSquaredDescent / DynamicsSaddleSearch /
MinModeSaddleSearch needed `using eonc::SaddleStatus;` after the
header include to resolve the unqualified `SaddleStatus` return
type now that the typed enum lives in StatusTypes.h. clang-format
pass landed alongside.
Stray `int minModeStatus` left over from the SaddleStatus migration
in the inner `refineTransition` loop. Bare int can't return-implicit-
convert to SaddleStatus; tighten the local to the typed enum so the
`return minModeStatus;` path on a non-Good search compiles.
…MFStatus enum

The remainder of the clang-tidy advisory pass on PR TheochemUI#338. Five
buckets:

1. bugprone-implicit-widening-of-multiplication-result
   (ARTnSaddleSearch.cpp)
   Anchor `nat * 3` chains in size_t / ptrdiff_t up front:
     const std::size_t natz = static_cast<std::size_t>(nat);
     std::vector<int> if_pos(natz * 3, 1);
   Pointer-offset arithmetic on tau_sad_ptr / evec_ptr is now
   `evec_ptr + static_cast<std::ptrdiff_t>(3) * nat`.

2. performance-unnecessary-value-param
   (BasinHoppingSaddleSearch / BiasedGradientSquaredDescent /
    DynamicsSaddleSearch / ARTnSaddleSearch ctors;
    DynamicsSaddleSearch::refineTransition::prod)
   shared_ptr-by-value -> const-ref where the param is just read,
   pass-by-value-and-std::move where it gets stored. ARTn ctor
   reorders the member init list to match declaration order
   (avoids the -Wreorder follow-on warning).

3. bugprone-incorrect-roundings (DynamicsSaddleSearch.cpp:93,97)
   `(int)(x + 0.5)` is broken for negative arguments and exact
   floats. Replace with std::lround.

4. New typed enum MMFStatus (NEBOcinebController.h/.cpp)
   The runDimer return values (-2 / -1 / 0 / 1) had distinct
   semantic meanings (PositiveCurvature / Skipped / Helped /
   MaxIterations) that didn't map onto SaddleStatus -- positive
   curvature on the climbing image is an *expected* short-circuit
   for the OCINEB controller, not a generic saddle-search failure.
   Promote to its own scoped enum:

     enum class MMFStatus : int {
       Helped = 0, MaxIterations = 1,
       Skipped = -1, PositiveCurvature = -2
     };

   to_int(MMFStatus) for the log format strings; underlying values
   preserved.

5. Internal docs at ~/Git/Gitlab/obsidian-notes/Software/eon/
   issues.org get a new DONE entry summarising PR TheochemUI#338's full
   scope (runtime-load + bundle + typed-status + clang-tidy
   pass), and the eon-7qqp + eon-7416 bug entries flip to DONE
   with CLOSED stamps pointing at the landed fixes.
The C++ side migrated saddle-search termination reasons to
`enum class SaddleStatus : int` in client/StatusTypes.h. The Python
orchestrator was still comparing raw ints (`termination_reason == 0`,
`== 15`, `= 9`, `= 6`) against magic literals and carrying two
duplicated label tables -- one title-cased in akmcstate.py, one
slug-cased in explorer.py. Both tables had drifted out of sync with
the C++ enumeration.

Introduce eon/status_codes.py as the single source of truth: an
IntEnum mirror of StatusTypes.h plus label / slug lookup helpers.
Wire format is unchanged -- IntEnum members compare equal to the int
they wrap, so existing results.dat files round-trip without churn --
but every server-side comparison now reads as the named member.

Two latent bugs surface and get fixed by the consolidation:

- akmcstate.py:599 had a 20-entry result_state_code list, so any
  termination_reason in {20, 21, 22} (DimerLostMode,
  DimerRestoredBest, BadArtnError -- all live status codes since the
  ARTn / dimer warm-start work landed) would IndexError out of
  register_bad_saddle and crash the akmc loop. The new helper falls
  back to "Unknown termination code (N)" for any value SaddleStatus
  cannot construct.

- explorer.py:584 had "nonlocal abort" with a space (typo) where
  every other slug uses snake_case. Nothing downstream consumed the
  typo'd form, so the fix is just collapsing to "nonlocal_abort".

Net cleanup:
  - eon/akmcstate.py: 20-line label table -> 1 helper call,
    `== 15` -> `== SaddleStatus.BadMdTrajectoryTooShort`.
  - eon/basinhopping.py:251 `== 0` -> `== SaddleStatus.Good`.
  - eon/explorer.py: slug table replaced by callable dispatch into
    saddle_status_slug / minimization_status_slug; magic-int writes
    `= 9` / `= 6` -> `= int(SaddleStatus.BadMinima)` / `BadNotConnected`;
    three `== 0` checks named.

Towncrier fragment in docs/newsfragments/338-py-status-codes.changed.md.
Three reds on PR TheochemUI#338 that the cosmolab smoke didn't catch because
it built libs+binary, not the test target or the docs workflow.

1. client/unit_tests/SaddleSearchTest.cpp -- five `int status =
   search.run()` sites left over from the SaddleStatus enum class
   migration. The base-class signature changed to
   `SaddleStatus run() = 0`, but the unit test was never updated and
   the test target wasn't part of the cosmolab build, so this didn't
   surface locally. Adds `#include "StatusTypes.h"` + namespace-scoped
   `using eonc::SaddleStatus`, retypes `status` as `SaddleStatus`.
   The existing `>= Good` / `<= DimerRestoredBest` / `!= Good`
   comparisons compile cleanly because C++20 scoped-enum relational
   operators are well-defined within a single enum type.

2. .github/workflows/ci_docs.yml -- the `pixi run ensure_cbindgen`
   step was kept after commit ee27bba dropped that pixi task with
   the readcon v0.9.0 bump (cargo-c handles header generation now).
   Pixi exits 127 ("Available tasks: gen-ref / makedocs / ...") with
   no ensure_cbindgen listed. Drop the line.

3. clang-format drift in StatusTypes.h (inline-comment alignment) and
   ARTnTest.cpp (line-wrap) that landed unformatted in earlier
   commits and tripped the lints job's `uvx prek -a`.
…ptimizer + ARTn migration

Catch2 v3 streams the lhs/rhs of every REQUIRE/CHECK/INFO via
m_oss << value, so the SaddleStatus / StepResult migration broke
every assertion on those types -- 'no match for operator<<'. Adds
inline ostream insertion overloads in StatusTypes.h:

  SaddleStatus -> "<message> (<int>)"
  StepResult   -> "<int>"

Then finishes the test-side migration:

* OptimizerTest: 4 sites, `int status = opt.run(...)` becomes
  `StepResult status = opt.run(...)`; `CHECK(status == 1)` becomes
  `CHECK(status == StepResult::Converged)`.

* ARTnTest: `int dimerStatus / artnStatus / status` -> `SaddleStatus`,
  the loose `>= 0` becomes `>= SaddleStatus::Good`. Adds
  `using eonc::SaddleStatus`.

* JobIntegrationTest: 8 sites that read `int status =
  std::stoi(results["termination_reason"])` and then compared against
  `SaddleStatus::*`. `std::stoi` is the wire-format reader (results.dat
  carries an int), so we cast through to keep the wire side honest:
  `SaddleStatus status = static_cast<SaddleStatus>(std::stoi(...))`.
  Then 5 `REQUIRE(status == 0)` and `(status == 0) // GOOD` sites
  become `REQUIRE(status == SaddleStatus::Good)`.

The pre-existing `REQUIRE(std::stoi(results[...]) == 0)` form (no
intermediate variable, comparing two ints) is left as-is -- it
documents the wire-format check.

Cosmolab full-tree build now green: 0 errors, 0 warnings.
Runtime tests pass except `test_ira` and `test_jobs` cases that
require libira / libartn on LD_LIBRARY_PATH (the dev-lite env on
cosmolab does not stage them, the CI runners do).
ci_docs.yml lost its cbindgen-install step in the v2.14.0 release-prep
churn. The other workflows (ci_ase, ci_benchmark, ci_build_akmc,
ci_build_gprd, ci_metatomic, ci_sanitizers, ci_serve, ci_xtb) all
keep the same `command -v cbindgen || cargo install cbindgen` step.

The wrap subproject at subprojects/readcon-core/meson.build:22
still calls find_program('cbindgen', required: true) at v0.8.0 *and*
v0.9.0 -- the conda-forge readcon recipe is what dropped its
cbindgen dependency by shipping pre-generated headers via cargo-c,
but our meson build path goes through the subproject, not the conda
package, so cbindgen is still required at meson-setup time.

Restore the install step at the same position the other workflows
have it (between sccache config and the meson setup step).
ce4f17d added eon/status_codes.py (the SaddleStatus IntEnum mirror
of client/StatusTypes.h) but did not register it in eon/meson.build.
meson-python therefore never copied it into site-packages, so the
wheel-installed package raised at import time:

  File ".../site-packages/eon/akmcstate.py", line 17, in <module>
    from eon.status_codes import SaddleStatus, saddle_status_label
  ModuleNotFoundError: No module named 'eon.status_codes'

Surfaced on macos-14 first because that runner does a clean
meson-python install rather than running from the source tree;
ubuntu CI ran the eon module via a path that resolved against the
checkout for some test cases. Listing the new file in py.install_sources
fixes both runners and the local `pip install .` path.
The IRA / ARTn migration to runtime dlopen (df7ee8d) made libira and
libartn optional at runtime: missing libs surface as a thrown exception
from require_loaded() and IRACompare::match returns error=-1 instead of
0. The integration tests in IRATest.cpp and one missed JobIntegrationTest
case did not actually have the runtime-skip guard the IRATest header
docstring claims they did, so on macos-14 (where the dev-mta env stages
neither libira nor libartn) the tests REQUIRE(result.error == 0) and
crash out with Catch2 exit 42.

Add the same `if (!eonc::get_*_resource().is_loaded()) SKIP("... not
available at runtime");` guard the LAMMPS / XTB / ARTn tests already use.

Affected test cases:
- IRATest: 5 TEST_CASE_METHOD blocks (every match/symmetry case).
- JobIntegrationTest: "SaddleSearchJob standalone ARTn works without
  direction file" -- the other ARTn integration cases at lines 1028 and
  1091 already had the guard, this one was missed during 092e... when
  ARTn went runtime-loaded.

linux-64 dev-mta CI passes today because the runner happens to have
libira / libartn on LD_LIBRARY_PATH; the SKIP path keeps the macos
build green without weakening the linux assertions.
… dtors

XtbLoader::~XtbLoader() / LammpsLoader::~LammpsLoader() /
ARTnResource::~ARTnResource() / IRAResource::~IRAResource() each held a
process-lifetime dlopen'd handle and called dynlib::close() at static
destruction. That ordering crashes on the ubuntu-22.04 ci-xtb runner
(test_xtb / test_cineb_xtb SIGSEGV right after Catch2 prints
"All tests passed", before exit). It does not crash on cosmolab dev-xtb
(release or debugoptimized) because the libstdc++ / glibc destructor
ordering happens to land elsewhere. Reproducing on every CI runner is
not the bar; leaking the handle is the right move regardless.

Why: each of these libraries (libxtb, liblammps, libartn, libira) drags
in either a gfortran runtime (atexit hooks running FINAL/STOP, stdio
flush) or an MPI / OpenMP / KIM teardown chain whose ordering relative
to a Meyer-singleton dtor is unspecified. dlclose-during-process-fini
runs those chains against a partially torn-down process state; the
canonical symptom is a segfault inside libgfortran's stop_string or
inside libc's stdio flush. Plugin loaders in long-lived processes
(LAMMPS, GROMACS, i-PI) all leak their dlopen'd handles for the same
reason -- the OS unmaps the mapping at exit, so the "leak" is bounded
by process lifetime.

The error-path dlclose inside the *constructor* is kept: that fires
before any consumer has dlsym'd or held a function pointer, so it is
safe to unwind.
…aders

fc1005d only landed the XtbLoader half of the fix because the other
three loader files (LammpsLoader.cpp, ARTnResource.cpp, IRAResource.cpp)
hadn't been Read'd in this session. Same change as fc1005d: drop the
dlclose call from the destructor, leak the handle to OS-reap at exit.
Reasons for each are in the per-file comment.
5270adf added a NaN-force guard to LBFGS / CG step() and run() to
avoid spinning forever when the underlying potential returns NaN
forces. The guard is the right behaviour but the implementation
called `m_objf->getGradient()` afresh at three of the four sites:

  LBFGS::run end:        if (!m_objf->getGradient().allFinite()) ...
  CG::step entry:        if (!m_objf->getGradient().allFinite()) ...
  CG::run end:           if (!m_objf->getGradient().allFinite()) ...

Each fresh getGradient invalidates the cached forces the saddle-search
outer loop just populated, charging +1 force evaluation per call. On
SaddleSearch / morse_pt the end-to-end cost was a 28 - 39% regression
(39 -> 50 force calls dimer, 44 -> 61 Lanczos), exactly the
JobIntegrationTest "force calls match SVN" assertions tripped on the
macos / ubuntu CI runs of TheochemUI#338. Bisected by reverting the four checks
on cosmolab and watching the SVN bounds satisfy.

Trim to one cached check per optimizer:

* LBFGS::step keeps `if (!f.allFinite())` -- `f` is the local already
  loaded by `f = -m_objf->getGradient()` at the top of step() so the
  check is free. The exit-side fresh getGradient in LBFGS::run goes.
* CG::step gates the check on `m_cg_i > 0` and uses the stored
  `m_force` from the previous step's line_search / single_step body.
  No fresh evaluation, so the saddle-search cache stays warm. The
  exit-side fresh getGradient in CG::run goes.

The per-step guard is sufficient: if NaN appears between steps,
the next step()'s body still loads it via getGradient and the
guard fires there; if NaN appears mid-step, the local-`f` /
local-`m_force` check inside the body bails on the same iteration.

Validated on cosmolab dev-lite (`meson test test_jobs`): all
SaddleSearch SVN bounds satisfied (1/1 PASS).
Two test-build hygiene changes that together speed up the sanitizer
matrix dramatically and silence the libstdc++ regex maybe-uninitialized
false positive at -O3 + LTO.

1. Catch2 amalgamated as a single static_library

   `thirdparty/catch2/catch_amalgamated.cpp` (~3000 lines including
   <regex>) was listed in every test executable's `sources`, so each
   of the 30+ test targets re-compiled it from scratch. Under
   ASan + UBSan instrumentation that's the dominant cost of the
   sanitizer build (45+ min observed on the GHA ubuntu runner; the
   in-progress step that prompted the ask). Promote it to its own
   `static_library('catch2_amalgamated', ...)` and link the artifact
   into every test exe + the TestMain helper + the nwchem socket exe
   + bench_pot. Catch2 compiles once per build dir per buildtype.

   Side benefit: the lib is built with `-Wno-maybe-uninitialized`,
   suppressing the libstdc++ 13.3 false positive on
   `function<bool(char)>` move-ctor inside `_NFA::_M_insert_alt`
   that fires only at -O3 + LTO (the release-buildtype CI matrix).
   This is the long-running GCC PR93499 / 96012 family. eOn's own
   test sources stay under -Wmaybe-uninitialized.

2. bench_pot off the default-build set

   `bench_pot` is the meson benchmark target invoked by
   `meson test --benchmark`. The default `meson compile` builds it
   too, which is wasted work in:

   - sanitizer matrix (instruments + links a benchmark binary
     nobody runs in `meson test --suite eon`)
   - test matrix (same; benchmarks have their own job)
   - dev-loop (anyone who ran `ninja -C bbdir`)

   `build_by_default = false` keeps `meson test --benchmark` and any
   explicit `ninja bench_pot` working but skips it from the default
   target set. That alone trims ~20% off the sanitizer build wall.

Local cosmolab dev-lite: full `ninja -C bbdir-warn` 0 errors / 0
warnings, 25s incremental.
The metatomic-docs CI step was emitting four legitimate warnings near
the end of the sphinx pass:

  release.md:179      myst.xref_missing  'post-bump-actions'
  testing.md:59       misc.highlighting_failure  meson lexer chokes
                                          on `[['name', 'file']]`
  v2.8.0/publications.md:29   bibtex.key_not_found
                                          gardenReassignmentMagicNumbers2018a
  v2.8.1/publications.md:23   myst.xref_missing
                                          goswamiEfficientExplorationChemical2025

Per-fix:

* release.md: the heading `## 3. Post-bump actions` auto-anchors as
  `3-post-bump-actions`, so the link target `#post-bump-actions`
  doesn't resolve. Add an explicit `(post-bump-actions)=` MyST label
  before the heading so the in-page reference works.

* testing.md: pygments' `meson` lexer doesn't handle the
  `[['name', 'file', 'data_dir']]` nested-string-array form used in
  the registration snippet. Switch the fence to `bash` (which still
  highlights strings + brackets reasonably) and add a caption so the
  snippet origin is obvious.

* v2.8.0 publications: typo in the cite key -- the bib entry is
  `gardenReassignmentMagicNumbers2018` (no trailing `a`); drop the
  stray `a`.

* v2.8.1 publications: the `[ArXiV preprint](...)` link erroneously
  used the cite key as the URL target. Replace with the canonical
  arXiv URL https://arxiv.org/abs/2510.21368 .

The remaining `eon.schema.NudgedElasticBandConfig:1` lexing warning
in pydantic-generated docstrings is auto-output from
`pydantic.BaseModel.__doc__` (literal JSON in the model schema)
and is benign; not silenced here.
…ctor

d60a9b8 split catch_amalgamated.cpp into its own static_library and
removed it from `testMain`'s sources, but `testMain` was already an
orphan target -- nothing in the test_array foreach links against it
and bench_pot uses TestMain.cpp directly. Without
`catch_amalgamated.cpp` in its source list the testMain shared
library exposed undefined Catch2 references (its TestMain.cpp uses
`Catch2ApprovalListener::testCaseStarting` etc.) and `b_lundef=true`
LTO link blew up across every test workflow:

  undefined reference to `Catch::AssertionHandler::handleUnexpectedInflightException()'
  undefined reference to `Catch::AssertionHandler::handleExpr(...)'
  ...

`link_whole: catch2_lib` would also fix the link, but a target
nobody consumes is just dead weight that LTOs every clean build for
no reason. Drop the `testMain = library(...)` declaration outright
and leave a comment pointing at the (one) bench_pot consumer of
TestMain.cpp.

Cosmolab dev-lite full rebuild: 0 errors, 0 link errors, 0 warnings.
d60a9b8 introduced catch2_lib as a static_library to dedupe the 3000-
line amalgamated TU across 30+ test executables. The dedup works at
debug / debugoptimized buildtypes (cosmolab green), but at the CI's
`--buildtype release` (which sets `-flto=auto`) LTO sees through the
archive boundary and prunes any object whose symbols aren't directly
referenced by the test source. That includes the Catch2-amalgamated
`main()` -- no test source calls main, so LTO drops it -- and the
AssertionHandler / EventListener vtable carriers for the listener
that TestMain.cpp registers. Result: every test linker step blew up
with

  (.text+0x24): undefined reference to `main'
  undefined reference to `Catch::AssertionHandler::handleExpr(...)'
  undefined reference to `Catch::EventListenerBase::testRunStarting(...)'
  ...

Switch from `link_with: catch2_lib` to `link_whole: catch2_lib`
everywhere catch2_lib is consumed (test executables foreach,
test_socket_nwchem, bench_pot). `link_whole` translates to
`-Wl,--whole-archive` which forces the linker to pull in every
object before LTO runs, so main() and the listener machinery survive
the LTO pass. The size cost is bounded by the catch2 amalgamated
itself (~250 KiB per exe at -O3 -flto), which is the same cost the
pre-d60a9b80 per-target compile already paid.

Reproduced on cosmolab dev-lite at `--buildtype release` (the same
flags as CI): without link_whole it fails identically; with
link_whole all 390 targets link clean.
…lags

d60a9b8 added `-Wno-maybe-uninitialized` unconditionally to the
catch2_lib cpp_args to silence the libstdc++ <regex> false positive
under -O3 + LTO. That works on gcc / clang but MSVC (the
windows-latest matrix) parses the flag as a number and bails with

  cl : Command line error D8021 : invalid numeric argument
  '/Wno-maybe-uninitialized'

Route the flag through `cppc.get_supported_arguments(...)` so only
compilers that recognise it pass it through. The pattern matches
the rest of the meson build (`cppc.get_supported_arguments` is
already used for `-fvisibility-inlines-hidden`, `-O3 -flto=auto`,
`-march=native`, `-ffast-math`).

Local cosmolab dev-lite at `--buildtype release`: still 0 errors,
0 warnings -- the GCC build still gets the suppression and the
catch2 amalgamated still compiles cleanly.
The static_library('catch2_amalgamated') + link_whole approach gives
the Linux sanitizer matrix a real dedup win (~30 test exes share one
compile of the 3000-line amalgamated TU), but on the windows-latest
matrix it fails to link with

  client\libcatch2_amalgamated.a : fatal error LNK1136: invalid or corrupt file

at the /WHOLEARCHIVE step. Conda-forge's clang 19 toolchain on Windows
makes meson's `static_library` emit a GNU `.a` archive, but the MSVC
link.exe `/WHOLEARCHIVE` flag expects a COFF `.lib`. The two formats
are not interchangeable.

Branch the meson logic on `host_machine.system() == 'windows'`. Linux
and macOS keep the catch2_lib + link_whole path (the sanitizer-time
win that prompted d60a9b8). Windows reverts to the pre-d60a9b80
behaviour: catch_amalgamated.cpp is added to each test executable's
sources directly. The cost is one extra compile per test target on
the single Windows matrix entry; the Windows builds are not the slow
ones.

Verified on cosmolab dev-lite at `--buildtype release` (matches
CI flags): all 390 targets link clean.
@HaoZeke HaoZeke merged commit 7f562ae into TheochemUI:main May 10, 2026
14 checks passed
@HaoZeke HaoZeke deleted the fix-lammps-feature-banner branch May 10, 2026 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant