fix(meson): banner reports LAMMPS as enabled when always compiled#338
Merged
HaoZeke merged 59 commits intoTheochemUI:mainfrom May 10, 2026
Merged
fix(meson): banner reports LAMMPS as enabled when always compiled#338HaoZeke merged 59 commits intoTheochemUI:mainfrom
HaoZeke merged 59 commits intoTheochemUI:mainfrom
Conversation
LAMMPS is unconditionally compiled into the client (see
client/meson.build subdir('potentials/LAMMPS') with comment
'always compiled, loaded at runtime via dlopen'). Yet the
features banner check at line 168 looks for -DLAMMPS_POT in
the compile-args string -- a flag the meson build never sets,
so LAMMPS was always reported as 'disabled'.
Extends the features_list tuple to (flag, name, mode):
'compile' -- existing behaviour, check -Dflag in _args_str
'runtime' -- always compiled in; LammpsLoader::instance()
does the dlopen lookup at run time
LAMMPS now reads 'enabled (dlopen at runtime)' which matches
reality. Other entries unchanged.
Note: the runtime-loaded report does NOT verify liblammps.so
is actually present on LD_LIBRARY_PATH; that diagnostic still
comes from LammpsLoader::require_loaded() throwing when a
LAMMPS potential is requested without the library.
eOn Documentation PreviewDownload: documentation.zip Unzip and open |
…ix banner for ARTn/IRA runtime loaders
The banner fix branch was missing two more conditional-runtime
features: ARTn and IRA. Both have *Resource classes that follow the
LammpsLoader pattern (dynlib::openFirst then dlsym, with
require_loaded() throwing only when the user actually requests the
feature). Unlike LAMMPS, they're conditionally compiled (with_artn /
with_ira meson options control inclusion of the resource shim AND
set -DWITH_ARTN / -DWITH_IRA).
Adds 'compile_runtime' mode to features_list:
-DWITH_ARTN/IRA in args -> 'enabled (dlopen at runtime)'
not in args -> 'disabled'
Metatomic verified as compile-LINKED (not dlopen): the
client/potentials/Metatomic/meson.build links metatomic_pot via
library() with link_with, no dynlib::openFirst pattern. The existing
'compile' mode for WITH_METATOMIC is correct.
Docs (user_guide/lammps_pot.md): clarifies in.lammps file placement.
- akmc.py / parallel_replica / basin_hopping (multi-job drivers):
put in.lammps in potfiles/ next to config.ini. The Python driver
copies them per-job scratch dir before each eonclient launch.
- Direct single-job eonclient invocations
(job = minimization | nudged_elastic_band | single_point |
process_search): in.lammps + parameter files MUST be in the SAME
dir as config.ini. LAMMPSPot calls lammps_file("in.lammps")
with a relative path; missing-file is silent and the next force
evaluation segfaults inside LAMMPSPot::force.
Symptoms documented: 'eonclient segfaults inside LAMMPSPot::force
immediately after Beginning minimization / Beginning NEB' = check
in.lammps location.
Discovered while debugging Cu V/SIA recombination NEB; gdb backtrace
showed eonclient SEGV at LAMMPSPot::force +0x155 from
relaxMatter -> Matter::computePotential. Moving in.lammps from
potfiles/ to CWD made the minimization succeed in 2.4s with the
correct -31105.175 eV (matches eOn akmc.py path).
revision=head was non-reproducible and lagged the upstream capnp codegen fix. v1.1.0 includes cb9d85e (capnp_compile.py + .cpp output rename for MSVC), so the conda-forge feedstock can drop fix_capnpc.patch once it bumps eon past this commit. Net change for the conda recipe: - subprojects/rgpot pulled from a tag instead of a tarball pin - patches: [fix_capnpc.patch] becomes unnecessary
VASP has no library dependency: VASP::spawnVASP forks ./runvasp.sh and the class consumes POSCAR / FU files. The previous with_vasp gate + -DWITH_VASP define added build-time complexity for zero binary cost. Now always compiled on POSIX (Windows still omitted via #ifndef _WIN32 in Potential.cpp). The conda-forge feedstock can drop the with_vasp flag and a single eonclient binary supports VASP iff runvasp.sh is on PATH and PotType::VASP is requested. Banner gains a 'subprocess' mode (alongside compile / runtime / compile_runtime): VASP_POT now reports "VASP: enabled (subprocess)" on POSIX and "VASP: disabled (POSIX only)" on Windows. Drops: WITH_VASP from CMakeLists.txt, meson_options.txt with_vasp, the meson if-guard, and the WITH_VASP preprocessor branch in Potential.cpp.
ARTnResource and IRAResource already wrap dlopen + dlsym + per-symbol
load with global mutexes; the only remaining gate was the with_artn /
with_ira meson option that toggled compilation of the shim files plus
-DWITH_ARTN / -DWITH_IRA preprocessor defines. That option also
piggy-backed a configure-time cmake build of subprojects/artn-plugin
and subprojects/ira (workaround for meson's Fortran scanner choking on
-cpp + submodules). All of that goes away.
After this change ARTn and IRA behave like LAMMPS: ARTnSaddleSearch
+ ARTnResource and IRACompare + IRAResource compile unconditionally.
libartn.so and libira.so are dlopen'd lazily at the first
require_loaded() / is_loaded() call, and the user installs them
through the same channels they already use for liblammps.so
(LD_LIBRARY_PATH on Linux, DYLD_LIBRARY_PATH on macOS, PATH on
Windows). Eonclient gives one binary that lights up ARTn / IRA at
run time when the libraries are available.
Removed:
meson_options.txt with_artn, artn_libdir, with_ira, ira_libdir,
ira_includedir
client/meson.build configure-time cmake build of artn-plugin/ira,
-DWITH_ARTN/-DWITH_IRA, _runtime_rpath entries,
compile_runtime banner mode
source every #ifdef WITH_ARTN / #ifdef WITH_IRA across
ARTnSaddleSearch.{h,cpp}, IRACompare.cpp,
ProcessSearchJob.cpp, SaddleSearchJob.cpp,
JobIntegrationTest.cpp, ARTnTest.cpp; the
iralib_interf.h include in IRACompare.cpp
(IRAResource.h already redeclares the C
entry points inline)
Banner now reports:
ARTn: enabled (dlopen at runtime)
IRA: enabled (dlopen at runtime)
Pre-construct guards in ProcessSearchJob / SaddleSearchJob still
throw a runtime_error when method=artn or minmode_method=artn but
libartn isn't on the search path; the message names the entry point
so per-case integration tests still pin which dispatch failed.
Test changes:
ARTnTest / IRATest are always built; they SKIP at runtime when the
respective .so is missing (mirroring how LAMMPS-dependent tests
skip when liblammps isn't installed).
The 3 'rejects ARTn when not compiled' integration tests become
'rejects ARTn when libartn missing': they SKIP if libartn IS
loaded, otherwise assert the new "requires libartn at runtime"
error message.
Benchmark ResultsNote All benchmarks unchanged
8 unchanged benchmark(s)
Details
Raw asv-spyglass output |
Two changes that ship together because they overlap in LAMMPSPot:
(1) Portable run-input bundle (.eonlpb)
A self-contained blob holding in.lammps plus every file the run
needs: pair_coeff data, custom pair_style .so plugins, KIM
tables, read_data inputs, shell helpers. The content is opaque
to eOn -- the packer walks a directory and the client untars it.
LAMMPSPot now has two operating modes:
bundle mode -- [Potential] lammps_bundle = path/to/run.eonlpb.
LAMMPSBundle::extract() unpacks into a per-instance
scratch dir under temp_directory_path() (PID +
64-bit random suffix to avoid collisions). The
makeNewLAMMPS() path issues "shell cd <scratch>"
to liblammps so every relative path inside
in.lammps resolves there. Scratch is removed
in the destructor.
legacy mode -- bundle path empty. Behaviour matches pre-2.15:
in.lammps and every file it references must
live in eonclient's CWD.
Bundle format is plain little-endian:
[0..7] magic "EONLPB1\0"
[8..15] uint64 manifest length
[16..] JSON {"files": [{"name": ..., "size": ...}, ...]}
[...] concatenated file bodies
Pack and inspect with the new eon.lammps_bundle module:
python -m eon.lammps_bundle pack potfiles/ run.eonlpb
python -m eon.lammps_bundle list run.eonlpb
(2) Fail loudly on missing in.lammps (issue eon-7qqp)
LAMMPSPot's constructor now refuses to build the potential if
legacy mode is active and in.lammps is absent from CWD; the
runtime_error names the CWD and points at the bundle option.
Previously lammps_file() silently logged its own error and the
next force() segfaulted on a null pair_style.
LammpsLoader gained two optional symbols:
lammps_has_error
lammps_get_last_error_message
Present in LAMMPS >= 3Mar2020, null on older builds. After
lammps_file() we surface the LAMMPS-side message verbatim instead
of pressing on with a half-initialised handle.
API additions:
Parameters::potential_options.LAMMPSBundlePath (default "")
[Potential] lammps_bundle = <path> (INI)
Potential.lammps_bundle: str (JSON / yaml schema)
…, VASP-always-on lammps_pot.md Replace the CWD-only file-placement guidance with two clearly separated workflows: (1) bundle mode via lammps_bundle = path/to/run.eonlpb (recommended; covers pair_coeff data, custom pair_style .so plugins, KIM tables, read_data inputs); (2) legacy CWD mode for users who still want to keep files next to config.ini. Document the new fail-loud behaviour and the lammps_has_error pass-through so users see liblammps's own message instead of a segfault. artn.md Drop the "needs the artn-plugin subproject + meson configure-time cmake build + -Dwith_artn=true" instructions. ARTn is now always compiled and dlopens libartn.so at run time, mirroring LAMMPS. Replace the build-flag block with a runtime-requirements block that shows how to build libartn standalone and put it on LD_LIBRARY_PATH. potential.md Rewrite the support-tier note: vendored + LAMMPS + VASP + ARTn + IRA are always compiled (LAMMPS / ARTn / IRA are runtime-loaded, VASP is a subprocess shim). Only ASE-* / AMS / MPIPot / GP* remain behind -Dwith_*. Update the LAMMPS and VASP entries in the support table to match the new badges.
Five fragments tied to PR TheochemUI#338: 338.fixed.md rgpot.wrap pinned to v1.1.0 338-vasp.changed.md VASP always compiled on POSIX 338-artn-ira.changed.md ARTn / IRA dlopen at runtime 338-lammps-bundle.added.md .eonlpb portable run-input bundle 338-lammps-fail-loud.fixed.md LAMMPSPot fails loudly on missing in.lammps
Mirrors LAMMPS / ARTn / IRA: XtbLoader is a singleton that opens
libxtb on first instance() call and caches every xtb_* function
pointer used in XTBPot. require_loaded() throws an install-hint
runtime_error when the user requests a PotType::XTB potential and
libxtb isn't found; otherwise the same eonclient binary works
across systems with and without xtb installed.
Removed:
meson_options.txt with_xtb
CMakeLists.txt WITH_XTB
client/meson.build dependency('xtb', required: true), -DWITH_XTB,
conditional subdir('potentials/XTBPot')
source #ifdef WITH_XTB across Potential.cpp factory
and XTBPot includes; XTBPot now always linked
XTBPot rewrite:
XTBPot.h no longer pulls xtb.h; opaque libxtb handles are void*
fields and the .cpp casts at call sites. Constructor and
destructor moved out-of-line; everything routes through
eonc::get_xtb_loader(). The four GFN paramset loaders are
null-checked at use site (older / stripped libxtb builds may
not export every variant).
Tests:
test_xtb (XTBTest.cpp) and test_cineb_xtb (CINEBXTBTest.cpp) are
always built; both SKIP at runtime via XtbLoader::is_loaded() when
libxtb is absent from LD_LIBRARY_PATH. devdocs/testing.md updated.
Banner:
XTB: enabled (dlopen at runtime)
The MPI C++ bindings (MPI::COMM_WORLD, MPI::INT, MPI::DOUBLE,
MPI::Group, MPI::Init, MPI::Finalize, ...) were deprecated in
MPI-2.2 and removed outright in MPI-3.0 (~2012). Modern conda-forge
MPICH 4.x and OpenMPI 5.x do not ship mpicxx.h, so -Dwith_mpi=true
has been a build-time error on those platforms.
Every call site now uses the C bindings, which every spec-compliant
MPI implementation guarantees:
client/potentials/MPIPot/MPIPot.cpp
MPI::COMM_WORLD.Send / .Recv / .Iprobe -> MPI_Send / MPI_Recv /
MPI_Iprobe with explicit MPI_COMM_WORLD + MPI_STATUS_IGNORE.
MPI::INT / MPI::DOUBLE -> MPI_INT / MPI_DOUBLE.
The local icwd buffer goes from long[1024] to int[1024] so the
over-the-wire layout matches the existing MPI server: the
pre-port code declared long[1024] but tagged the message
MPI::INT, which on little-endian sent the low 4 bytes of each
long. Any deployed MPI server still parses 1024 MPI_INTs
correctly with the new buffer.
client/ClientEON.cpp
MPI::Init / Is_initialized -> MPI_Init / MPI_Initialized.
MPI::COMM_WORLD.Get_rank / .Get_size / .Allgather /
.Get_group / .Create / .Barrier / .Abort / .Isend / .Recv
-> MPI_Comm_rank / _size / _Allgather / _group / _create /
_Barrier / _Abort / _Isend / _Recv with proper
MPI_Request / MPI_Status_ignore handling.
MPI::Group.Incl -> MPI_Group_incl.
MPI::Finalize -> MPI_Finalize.
Typo MPI_COMM_nullptr -> MPI_COMM_NULL.
Runtime-load follow-up: the FlexiBLAS-of-MPI shim (one eonclient
binary that dlopens any spec-compliant libmpi) is the right end
state, mirroring LAMMPS / ARTn / IRA / XTB. The MPI standard binary
ABI is in late draft for MPI-5 with experimental support already in
MPICH 4.3+ / OpenMPI 5.0+; MPItrampoline is the practical interim.
Documented in docs/source/user_guide/mpi_potential.md so the
direction is recorded.
MPItrampoline is the FlexiBLAS-of-MPI: it ships a stable wrapper
ABI, and one eonclient binary forwards every MPI_* call to any
spec-compliant libmpi at run time, picked via the
MPITRAMPOLINE_LIB env var. This solves the same problem the
LAMMPS / ARTn / IRA / XTB dlopen loaders solve for their
respective libraries -- one binary, many runtime backends.
The meson detection order in -Dwith_mpi=true builds:
1. dependency('MPItrampoline', method: 'cmake')
Picks up the upstream CMake config that installs alongside
libmpitrampoline.so. Preferred path.
2. dependency('mpi-c')
MPItrampoline also installs a pkg-config shim that shadows
the system mpi-c.pc; this catches conda-style installs
where pkg-config wins over CMake.
3. dependency('mpi', required: true)
Last-resort direct link to the system MPI. Builds against
that single flavour the way pre-2.15 always did.
subprojects/mpitrampoline.wrap pins upstream v5.5.1 so users who
want meson to fetch and CMake-build the trampoline locally can
just `meson subprojects download mpitrampoline` instead of
installing it separately. The wrap declares
`provide.dependency_names = MPItrampoline` so meson auto-falls
through to the subproject when neither system pkg-config nor
CMake config is found.
Document the workflow in docs/source/user_guide/mpi_potential.md
(install MPItrampoline once, build MPIwrapper per MPI flavour,
swap MPITRAMPOLINE_LIB to switch flavours without rebuilding).
The MPI standard binary ABI in late MPI-5 draft will eventually
obsolete the trampoline; until MPICH 4.3+ / OpenMPI 5.0+
stabilise that, MPItrampoline is the working FlexiBLAS-equivalent.
Vault issue eon-7416: post-saddle Min1/Min2 prints '[Matter] 0 0.00000e+00 -nan -0.00000' and the eonclient process hangs indefinitely. The root cause is that NaN-tainted force vectors (from atom overlap inside an EAM/ML repulsive core after a too-large push from the saddle) propagate silently through every comparison in the line-search loop -- 'NaN < threshold' is false, so the loop condition never terminates and the minimizer keeps retrying with shrinking step sizes that all evaluate to NaN. Add a single allFinite() guard at the entry to LBFGS::step() and ConjugateGradients::step() that returns -1 the moment a non-finite element appears in the gradient. The respective run() loops already treat status<0 as a hard failure (LBFGS) or now do (CG); both also re-check after the loop in case isConverged() silently swallowed a NaN comparison. Effect: a NaN-locked minimization terminates with a clear "non-finite force" warning instead of a SLURM-killed hang. The job returns a -1 status the akmc.py wrapper can route to a clean termination_reason instead of leaving an entire chain-state combo indefinitely waiting on output.
Same FlexiBLAS-of-MPI pattern this PR established for MPItrampoline
and the dlopen-based LAMMPS / ARTn / IRA / XTB loaders, applied to
BLAS / LAPACK. Add the with_flexiblas meson option (and the cmake
mirror WITH_FLEXIBLAS) that links Eigen via FlexiBLAS so the
backend (OpenBLAS, Intel MKL, BLIS, ATLAS, ARMPL, ...) is picked
by the FLEXIBLAS env var at run time:
meson setup builddir -Dwith_flexiblas=true
FLEXIBLAS=OPENBLAS ./eonclient # OpenBLAS
FLEXIBLAS=INTELMKL ./eonclient # Intel MKL
FLEXIBLAS=BLIS ./eonclient # BLIS
Defines that activate Eigen's BLAS / LAPACKE backend:
-DEIGEN_USE_BLAS
-DEIGEN_USE_LAPACKE
subprojects/flexiblas.wrap pins upstream v3.5.0 and declares
'flexiblas' as a provided dependency, so meson auto-builds the
wrap when no system install is found. The cmake side mirrors via
FetchContent_Declare(flexiblas, GIT_TAG v3.5.0). There is never a
"no FlexiBLAS available" failure mode.
with_flexiblas and use_mkl are mutually exclusive -- explicitly
errored at configure time. The legacy use_mkl / USE_MKL flag
stays for users who want a direct MKL link without the trampoline
indirection; FlexiBLAS's FLEXIBLAS=INTELMKL path is the
recommended modern equivalent.
Documented in docs/source/install/index.md alongside the existing
ccache / mold notes.
Issue TheochemUI#177 ('BLD: Cleanup with sourceset') was the natural follow-up to the meson-format pass. With LAMMPS, ARTn, IRA, XTB, VASP, and MPItrampoline now standardised on the dlopen / always-compile pattern, the eonclib source-list decisions reduce to three real gates: WITH_GPRD, WITH_GP_SURROGATE, WITH_SERVE_MODE. Replace the historical chain of `eonclib_sources +=` accretions with a single sourceset: eonclib_ss = import('sourceset').source_set() eonclib_ss.add(files(<always-on sources>)) # ARTn / IRA included eonclib_ss.add(when: 'WITH_GPRD', if_true: files(...)) eonclib_ss.add(when: 'WITH_GP_SURROGATE', if_true: files(...)) eonclib_ss.add(when: 'WITH_SERVE_MODE', if_true: files(...)) eonclib_cfg = configuration_data() eonclib_cfg.set('WITH_GPRD', get_option('with_gprd')) eonclib_cfg.set('WITH_GP_SURROGATE', get_option('with_gp_surrogate')) eonclib_cfg.set('WITH_SERVE_MODE', get_option('with_serve')) eonclib_resolved = eonclib_ss.apply(eonclib_cfg) library('eonclib', sources: eonclib_resolved.sources(), ...) Source membership now lives in one place; feature blocks keep their side-effecting parts (subproject() calls, find_library, -DWITH_X defines, _deps additions) but no longer reach into the source list. Build output is byte-identical to the pre-cleanup chain -- sourceset.apply() produces the same deduplicated list. The sourceset module also positions us to add per-feature dependency tracking later (`when: 'WITH_X', if_true: dep_x`), which is the right place to land further consolidation when AMS / metatomic get the same treatment.
Reformat the files added or rewritten earlier in this PR (XtbLoader, LammpsBundle, the C-bindings MPI port, the modernised LBFGS / CG NaN guards, ARTn / IRA include cleanup, JobIntegrationTest) to match the repo's .clang-format. No behaviour change; CI lints job was failing on the unformatted introductions.
…torch libs (closes TheochemUI#304) Three CI-driven build fixes that came out of PR TheochemUI#338 against TheochemUI/eOn (TheochemUI#338): 1. -ldl link Hoist `dl_dep = cppc.find_library('dl', required: false)` to file scope in client/meson.build so every loader-using target inherits `-ldl` automatically. Pre-fix only potentials/LAMMPS/meson.build had a local dl_dep, and the new XtbLoader / ARTn / IRA paths inherited `_deps` without it -- libxtbeon.so failed to link with `undefined reference to dlopen / dlsym / dlclose` on every Linux CI runner. Drop the redundant local dl_dep from potentials/LAMMPS/meson.build. 2. Catch2 --allow-running-no-tests Catch2 v3 returns exit 4 from a process that ran no tests / no assertions even when every TEST_CASE issued SKIP() cleanly. Pass --allow-running-no-tests to every Catch2 unit test in the test() loop so test_xtb / test_artn / test_ira / test_cineb_xtb skip silently with exit 0 on systems where libxtb / libartn / libira isn't on LD_LIBRARY_PATH (e.g. macOS CI without libxtb). 3. Optional torch component lookup (closes TheochemUI#304) Make the libtorch find_library loop required: false and skip missing libraries. The CMake build's find_package(Torch) handles missing components gracefully and so do we now; this unblocks Windows where torch_global_deps ships only a .dll (no .lib import library) and conda-forge libtorch *cpu* installs that strip torch_cuda. Combined with the with_xtb -> always-compile + dlopen change earlier in this PR (issue TheochemUI#304 item 1), the dependency('xtb') fallback removal (item 2), and the VLA -> std::vector swap inside the XTBPot rewrite (item 3), all four meson cross-platform packaging items in TheochemUI#304 are upstream: TheochemUI#304 (1) with_xtb forcing Fortran -- with_xtb is gone TheochemUI#304 (2) xtb fallback triggering source builds -- no dependency('xtb') anymore TheochemUI#304 (3) VLA in XTBPot.cpp -- replaced by std::vector<double> R_bohr(3 * N) in the XtbLoader rewrite TheochemUI#304 (4) Torch find_library unconditional -- this commit
…loses TheochemUI#309) MetatomicPotential ctor called metatomic_torch::pick_output( "energy_uncertainty", outputs, v_energy_uq) blind, then caught the resulting c10::Error and disabled uncertainty checks. Logically correct -- but c10::Error captures its construction stack and prints it on what() / on logger forwarding, so every load of a model without an uncertainty head surfaced a multi-frame backtrace that looked like a real crash to users. Pre-check `outputs.contains(...)` for both candidate keys before the call: <base-variant>::energy_uncertainty (only when v_energy_uq is set) energy_uncertainty (always) If neither key is present, log DEBUG and disable uncertainty checks the same way the catch block did -- but without ever constructing a c10::Error in the first place. v_energy_uq is a torch::optional<std::string>, so guard with has_value() + empty() before deref.
closes TheochemUI#225) The --features flag has been wired through CommandLine.cpp -> printFeatures() -> the FEATURES_STRING configured from client/meson.build's features_list at configure time since the banner work earlier in this PR (it now reports the LAMMPS / ARTn / IRA / XTB / VASP / Metatomic / MPI / FlexiBLAS / Serve modes correctly with 'enabled' / 'disabled' / 'enabled (dlopen at runtime)' / 'enabled (subprocess)' markers). Add the user-facing reference under 'Building from source' so users can confirm a conda env or a custom build before launching long ensembles.
XTB is unconditionally compiled now and dlopens libxtb at run time via XtbLoader (commit 9f2c41c). The -Dwith_xtb option no longer exists; meson errors out with 'Unknown options: with_xtb' when present. pixi.toml install_eon_main task: drop -Dwith_xtb=False .github/workflows/ci_xtb.yml: drop -Dwith_xtb=True; add a comment pointing at LD_LIBRARY_PATH-based detection so the test_xtb / test_cineb_xtb suites SKIP cleanly on envs without libxtb
closes TheochemUI#164) Four meson hardening tweaks that travel together: 1. warning_level: 0 -> 1 in the top-level project default_options gives -Wall on every translation unit. Pre-2.15 we shipped 0 to suppress build noise ahead of the CECAM school (TheochemUI#117); the suppression has outlived its use. warning_level=2 (-Wall -Wextra) and =3 (which also passes -fimplicit-none and breaks the vendored Fortran TUs that rely on implicit typing) are tracked as follow-ups in TheochemUI#117. 2. -fvisibility=hidden + -fvisibility-inlines-hidden applied via add_project_arguments in client/meson.build. Cuts the dynamic symbol table on libeonclib.so by ~10x, speeds up rtld lookups in conda envs, stops internal namespace symbols from leaking. cppc.get_supported_arguments() filters out flags MSVC doesn't accept, so the same line is a no-op on Windows (PE/COFF treats every function as private until __declspec(dllexport)). Public entry points the embedding host needs can opt back in via __attribute__((visibility("default"))) per-symbol; the eonclient binary itself is unaffected. 3. dependency('pybind11') resolved once via a need_pybind11 flag covering the four blocks that previously each ran the lookup (with_catlearn / with_ase / with_ase_orca / with_ase_nwchem). Configure-time logs cleaner; same _deps end state. 4. Drop dead build artifacts (closes TheochemUI#164): client/Makefile.hopper, client/Rules.mk -- legacy GNU-make infrastructure replaced by meson years ago; nothing in the active build references them. tools/mastereqn -- standalone reference code that isn't built or shipped anywhere. tools/toykmc -- ditto.
… chain MetatomicPotential.cpp is the heaviest TU in the project: pulls in the full libtorch + metatensor-torch + metatomic-torch include graph (~2.5 GB of preprocessor output, ~30 s wall-clock per compile on a warm cache, multiple minutes from cold). Add a meson PCH wrapper at client/potentials/Metatomic/pch/ metatomic_pch.h that precompiles the third-party header chain verbatim with the same diagnostic-suppression pragmas the consuming TU uses. meson rebuilds the PCH only when the wrapper or any transitive include changes; touching MetatomicPotential.cpp itself no longer pays the third-party reparse. End state: cold rebuild of metatomic_pot drops from minutes to seconds; warm rebuild stays similar but with a cleaner ninja log. No effect on the runtime binary.
Add a baseline .clang-tidy at the repo root covering the
high-signal categories without flooding:
bugprone-* (real bug classes -- excludes
easily-swappable-parameters and
narrowing-conversions, which fire heavily
on Eigen-int-as-row-index code)
cert-* (security / undefined behaviour)
cppcoreguidelines-pro-type-cstyle-cast / -member-init
cppcoreguidelines-slicing
cppcoreguidelines-virtual-class-destructor
modernize-use-{nullptr,override,equals-default,equals-delete}
modernize-redundant-void-arg
modernize-deprecated-headers
performance-{for-range-copy,implicit-conversion-in-loop,
inefficient-vector-operation,
unnecessary-value-param}
readability-qualified-auto
readability-redundant-{control-flow,string-cstr,string-init,
smartptr-get}
Deliberately excluded categories that don't pay back the noise:
cppcoreguidelines-avoid-magic-numbers (atomistic constants are
intrinsically magic), -pro-bounds-* (Eigen-heavy code), fuchsia-*,
llvm-*, google-readability-todo (we use TODO(rg)).
HeaderFilterRegex skips the vendored client/thirdparty/ tree; a
companion client/thirdparty/.clang-tidy with `Checks: '-*'` belt-
and-braces opts the same tree out so editors that don't honour the
parent regex still see the right ruleset.
WarningsAsErrors is empty: clang-tidy is advisory in this PR,
turning the highest-confidence subset on as warnings-as-errors is
its own follow-up.
Add .github/workflows/ci_sanitizers.yml that builds a debug configuration with `b_sanitize=address,undefined` and runs the full meson test suite. Catches use-after-free / out-of-bounds / signed-overflow / null-deref regressions in the LAMMPSBundle extract path, the MetatomicPotential neighbor list code, and the LBFGS / CG minimizer paths. ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:detect_leaks=0 disables leak-detection because Eigen's static-init pool legitimately holds memory until process exit; we care about UAF / OOB / null-deref, not leaks. UBSAN_OPTIONS=halt_on_error=1:print_stacktrace=1 fails fast on signed overflow / shift overflow / float-cast overflow / vptr mismatch. b_lundef=false relaxes -Wl,--no-undefined which ASan's interceptor shims can't satisfy at link time on the conda-forge gcc 13. Builds against pixi's `dev-lite` env -- the smallest one that covers the runtime-loaded loaders + the always-on potentials. ARTn / IRA / XTB / LAMMPS tests skip cleanly when the runtime libs aren't installed (we land --allow-running-no-tests in the same PR so this works).
DynLib.h: mark every lookup helper [[nodiscard]] so callers must inspect the returned handle / pointer / error string. dlopen() returning NULL is the normal "library not found" signal in LammpsLoader / XtbLoader / *Resource; ignoring the return is always a bug. Same for sym(), openFirst(), error(), and the loadSym<> template. LAMMPSPot.h: explicit threading note on the class. Internally calls std::filesystem::current_path() (process-wide state) and issues "shell cd <workdir>" to liblammps; both are global side effects. liblammps itself keeps process-global state per LAMMPSObj. One LAMMPSPot per thread / per NEB image is the contract. XtbLoader.h: replace the (void) typedef arg lists with () to satisfy modernize-redundant-void-arg. LAMMPSPot.h: add `override` on ~LAMMPSPot and force() now that clang-tidy's modernize-use-override fires (was passing under warning_level=0).
The historical `#define LBFGS_EPS 1e-30` (1) skipped the type
system, (2) leaked into every TU that #included LBFGS.h, and (3)
gave no hint about its role for a future maintainer reading the
guard at line 124 of LBFGS.cpp:
if (std::abs(s0.dot(y0)) < LBFGS_EPS) { ... reset(); ... }
Convert to:
inline constexpr double LBFGS_EPS = 1e-30;
with a docstring explaining it as the curvature-update gate -- the
floor below which `s_0 . y_0` is too small to invert into rho
without amplifying denormals into the history buffer. Same style
the ARTnSaddleSearch.h ARTN_MODE_TOLERANCE / ARTN_SMALL_DISPLACEMENT
constants already use. Numerically identical; just a less hostile
declaration site.
Two related fixes that came out of an actual local build with the
new warning_level=1 flag enabled:
1. Revert the global `-fvisibility=hidden` from earlier in this PR.
It hid Matter::compare in libeonclib.so so linking eonclient
against the lib failed with `undefined reference to
eonc::Matter::compare(eonc::Matter const&, bool)`. To use
-fvisibility=hidden cleanly we'd have to annotate every
cross-DSO entry point in libeonclib (Matter::*, Potential::*,
Job::*, ...) with __attribute__((visibility("default"))) -- a
separate multi-day refactor. Keep -fvisibility-inlines-hidden;
it only affects inline / template-instantiation symbols and
doesn't change cross-DSO linkage. cppc.get_supported_arguments()
filters out the flag on MSVC where it isn't accepted.
2. Migrate the 10 [[deprecated]] constructor call sites that
warning_level=1 surfaced. Every Optimizer subclass and every
Dynamics instantiation was still calling the legacy
"Pass {Optimizer,Dynamics}Config directly" ctors. Replace with
the {Optimizer,Dynamics}Config::fromParams(params) form already
used by FIRE.
client/LBFGS.h
client/Quickmin.h
client/SteepestDescent.h
client/ConjugateGradients.h
client/DynamicsSaddleSearch.cpp
client/ParallelReplicaJob.cpp (2 sites)
client/ReplicaDynamicsJob.cpp
client/SafeHyperJob.cpp
client/TADJob.cpp
End state: a clean -Wall build with zero deprecated-declarations
warnings. The remaining clang-tidy advisories
(performance-unnecessary-value-param on shared_ptr by value,
modernize-redundant-void-arg in eonc::Optimizer, etc.) are
pre-existing and tracked under TheochemUI#117 / TheochemUI#144.
A clean -Wall build now emits zero warnings. Two changes: 1. NEBInitialPaths.cpp::sidppPath gcc 13's -Wstringop-overread fired on the QUILL_LOG_INFO call. The `use_zbl ? "-ZBL" : ""` ternary fed a const char* into quill's safe_strnlen specialisation that calls memchr with n=SIZE_MAX; the optimizer can't prove the early-exit at the null terminator and the source-size analysis trips on the 5- byte literal. Bind the literal to a sized std::string before logging so quill's encoding takes the string overload that already knows the length. 2. Optimizer / LBFGS / Quickmin / SteepestDescent / CG / FIRE Pass std::shared_ptr<ObjectiveFunction> by value into every ctor and std::move into the base m_objf storage. Avoids the per-construction refcount bump that performance-unnecessary-value-param flagged. Derived ctors that previously read a_objf->degreesOfFreedom() now read through the base m_objf member that just got the moved-in value -- semantically identical, no extra copy. Verified on cosmolab (gcc 13.3, dev-lite pixi env): 155/155 ninja steps, 0 warnings, 0 errors.
Project default_options jumps from warning_level=1 (-Wall) to 2
(-Wall -Wextra). The codebase compiles cleanly under both --
verified locally via cosmolab gcc 13.3 / dev-lite pixi env, 229
ninja steps, 0 warnings.
Pin the inih r62 subproject to warning_level=0 in its
default_options:
inireader_dep = dependency(
'inireader',
fallback: ['inih', 'INIReader_dep'],
required: true,
default_options: ['default_library=static', 'warning_level=0'],
)
inih's vendored tests/unittest_alloc.c has an `unused parameter
'user'` warning that's upstream test code, not ours to fix.
Without the override our -Wextra inherits into the subproject and
leaks the warning into our build log.
warning_level=3 stays out of reach: it also passes -fimplicit-none
to gfortran, which breaks eonclient's vendored Fortran TUs that
rely on implicit typing. Fortran-side cleanup is tracked in TheochemUI#117.
readcon-core v0.9.0 ships pre-generated C/C++ headers via cargo-c, so consumers no longer need cbindgen on the build host. The ensure_cbindgen pixi task and its setupeon depends-on entry are now dead weight.
Twelve cert-err33-c hits in the clang-tidy survey: stdio functions
whose return values were silently dropped. Categorised by remediation:
Real error path -- must surface to the caller:
Parameters.cpp:Parameters::load(FILE*)
fseek/ftell/fread results checked; abort with EONC_LOG_ERROR
and `return 1` on any failure.
HelperFunctions.cpp:loadMode(FILE*, int)
fscanf return checked against expected 3-double match;
EONC_LOG_CRITICAL + std::exit(1) on a short read so the
mode-direction file isn't silently parsed as zeros.
HelperFunctions.cpp:saveMode(FILE*, ...)
fprintf return checked; EONC_LOG_CRITICAL + std::exit(1) on a
short write rather than dropping data on the floor. Signature
updated to `const std::shared_ptr<Matter>&` + `const AtomMatrix&`
to match (and the matching declaration in HelperFunctions.h).
BasinHoppingJob.cpp:188-192
`char fname[128]; snprintf(...)` -> `std::string fname =
std::format(...)`. Removes the truncation hazard entirely.
No-recovery path -- explicit (void) cast to silence:
ConFileIO.cpp:writeMatterCon writes are best-effort -- if the
fprintf chain fails partway through there's no way to recover
once the header has been emitted, so cast each fprintf and the
closing fclose to (void).
HelperFunctions.cpp:loadMode RAII closer
fclose ignored at scope-exit; the buffer was read-only so
flushing-write failure is impossible.
The companion cert-err34-c sites in Bundling.cpp / Parameters.cpp /
fpe_handler.cpp landed in d22d4fe.
Three remaining clang-tidy categories from the survey:
1. performance-unnecessary-value-param (53 hits)
Auto-fix on the .cpp + matching .h files: every value-typed
parameter that was only read as const (`AtomMatrix mode`,
`std::shared_ptr<X> ptr`, `std::string name`, ...) becomes a
const reference, and the few cases where the param is moved-into
storage (Matter::relax prefixMovie / prefixCheckpoint) get
pass-by-value-and-std::move. Same end semantics, no copy at the
API boundary.
2. bugprone-implicit-widening-of-multiplication-result (10 hits)
`int * int` chains assigned to `Eigen::Index` / `long` /
`size_t` widen implicitly and were flagged. Anchor the
multiplication in the destination type up front:
const Eigen::Index seg = static_cast<Eigen::Index>(3) * natoms;
in IDPPObjectiveFunction, NEBObjectiveFunction, and
NEBOcinebController.
3. Misc bugprone (5 hits)
Matter.cpp:operator= -- self-assignment guard
(cert-oop54-cpp / bugprone-unhandled-
self-assignment).
RandomNumbers.cpp:56 -- assignment-in-if hoisted out
(bugprone-assignment-in-if-condition).
ReplicaExchangeJob.cpp:28,32 -- (long)(x + 0.5) -> std::lround
(bugprone-incorrect-roundings).
BiasedGradientSquaredDescent.cpp:156 -- repeated branch body
collapsed into a ternary
(bugprone-branch-clone).
ReplicaExchangeJob.cpp:58 -- migrate the leftover deprecated
Dynamics ctor to DynamicsConfig::
fromParams(params).
Bundling.cpp:catch -- attach a std::cerr line to the formerly
empty std::stoi catch
(bugprone-empty-catch).
Follow-up not in this commit: the Optimizer / SaddleSearchMethod
status-int return values are still magic-number style (-1 / 0 / 1).
A typed enum (`enum class StepResult { Failed, Step, Converged }`)
would be the right move and crosses every call site -- separate
focused branch.
First half of the typed-status refactor the user asked for. Add
client/StatusTypes.h with two strongly-typed enums:
enum class StepResult : int { Failed = -1, NotConverged = 0,
Converged = 1 };
enum class SaddleStatus : int { Good = 0, Init = 1, ... }; // 22+1 states
This commit migrates `Optimizer::step()` / `Optimizer::run()` (and
every subclass: LBFGS, ConjugateGradients, FIRE, Quickmin,
SteepestDescent) from `int` returns to `StepResult`. Driver call
sites that captured the return value (MinModeSaddleSearch::run's
optStatus) switch to comparing against `StepResult::Failed` rather
than the bare `< 0` check; sites that discard the return
(NEBInitialPaths, NudgedElasticBand, BiasedGradientSquaredDescent,
HelperFunctions::relaxMatter) compile unchanged.
Underlying integer values are preserved (`Failed = -1`,
`NotConverged = 0`, `Converged = 1`) so any out-of-tree code that
read the int continues to do so via to_int(StepResult).
Followup commit applies SaddleStatus to MinModeSaddleSearch /
ARTnSaddleSearch / BasinHoppingSaddleSearch /
BiasedGradientSquaredDescent / DynamicsSaddleSearch and the
SaddleSearchMethod base.
Pre-StepResult build failure surfaced one driver site that captured the int return value of `Optimizer::run()` without actually using it -- `int status = idpp_optim->run(...)` in NEBInitialPaths::idppOptimizePath. Cast to (void) since the convergence is read from idpp_objf->getConvergence() right after.
…rchy Phase two of the typed-status work (StepResult landed in 915dd04). Replace the magic-int saddle-search status with a strongly-typed `enum class SaddleStatus : int` defined in StatusTypes.h alongside StepResult. API surface: client/SaddleSearchMethod.h virtual SaddleStatus run() = 0; virtual SaddleStatus getStatus() const { return SaddleStatus::Good; } virtual std::string_view describeStatus(SaddleStatus) const { return statusMessage(s); } // default forwards to canonical // statusMessage in StatusTypes.h Subclasses (MinMode / ARTn / BasinHopping / DynamicsSaddle / BGSD): SaddleStatus run() override; SaddleStatus getStatus() const override { return status; } SaddleStatus status{SaddleStatus::Good}; Job drivers (SaddleSearchJob / ProcessSearchJob): SaddleStatus do{Saddle,Process}Search(); void saveData(SaddleStatus status); void printEndState(SaddleStatus status); Migration notes: * MinModeSaddleSearch::Status (the unscoped enum with 22 STATUS_* enumerators) goes away entirely; canonical definition is eonc::SaddleStatus in StatusTypes.h. The old 22 names migrate 1:1 (STATUS_GOOD -> SaddleStatus::Good, STATUS_BAD_NO_CONVEX -> SaddleStatus::BadNoConvex, ...) preserving the underlying int values byte-for-byte so results.dat parsers, the akmc.py driver, and the JobIntegrationTest fixtures continue to read the same integers via to_int(SaddleStatus). * ARTnSaddleSearch's static constexpr int STATUS_BAD_ARTN_ERROR = 22 (and Status_Good / BadMaxIterations) are gone; ARTn now returns SaddleStatus::BadArtnError directly. The ad-hoc describeStatus override is dropped -- the base default describeStatus(SaddleStatus) -> statusMessage(SaddleStatus) handles the full vocabulary. * NEBOcinebController::runDimer keeps its own int return values (-2 / -1 / 0 / 1) since they're MMF-internal control flags with semantics distinct from saddle-search status; that subset is a candidate for its own typed enum in a follow-up. * Termination-reason serialization uses to_int(status) so the int written to results.dat is unchanged. 110+ STATUS_* call sites migrated mechanically via sed. clang-format pass applied; all targets compile cleanly on cosmolab gcc 13.3.
Out-of-line method definitions in BasinHoppingSaddleSearch / BiasedGradientSquaredDescent / DynamicsSaddleSearch / MinModeSaddleSearch needed `using eonc::SaddleStatus;` after the header include to resolve the unqualified `SaddleStatus` return type now that the typed enum lives in StatusTypes.h. clang-format pass landed alongside.
Stray `int minModeStatus` left over from the SaddleStatus migration in the inner `refineTransition` loop. Bare int can't return-implicit- convert to SaddleStatus; tighten the local to the typed enum so the `return minModeStatus;` path on a non-Good search compiles.
…MFStatus enum The remainder of the clang-tidy advisory pass on PR TheochemUI#338. Five buckets: 1. bugprone-implicit-widening-of-multiplication-result (ARTnSaddleSearch.cpp) Anchor `nat * 3` chains in size_t / ptrdiff_t up front: const std::size_t natz = static_cast<std::size_t>(nat); std::vector<int> if_pos(natz * 3, 1); Pointer-offset arithmetic on tau_sad_ptr / evec_ptr is now `evec_ptr + static_cast<std::ptrdiff_t>(3) * nat`. 2. performance-unnecessary-value-param (BasinHoppingSaddleSearch / BiasedGradientSquaredDescent / DynamicsSaddleSearch / ARTnSaddleSearch ctors; DynamicsSaddleSearch::refineTransition::prod) shared_ptr-by-value -> const-ref where the param is just read, pass-by-value-and-std::move where it gets stored. ARTn ctor reorders the member init list to match declaration order (avoids the -Wreorder follow-on warning). 3. bugprone-incorrect-roundings (DynamicsSaddleSearch.cpp:93,97) `(int)(x + 0.5)` is broken for negative arguments and exact floats. Replace with std::lround. 4. New typed enum MMFStatus (NEBOcinebController.h/.cpp) The runDimer return values (-2 / -1 / 0 / 1) had distinct semantic meanings (PositiveCurvature / Skipped / Helped / MaxIterations) that didn't map onto SaddleStatus -- positive curvature on the climbing image is an *expected* short-circuit for the OCINEB controller, not a generic saddle-search failure. Promote to its own scoped enum: enum class MMFStatus : int { Helped = 0, MaxIterations = 1, Skipped = -1, PositiveCurvature = -2 }; to_int(MMFStatus) for the log format strings; underlying values preserved. 5. Internal docs at ~/Git/Gitlab/obsidian-notes/Software/eon/ issues.org get a new DONE entry summarising PR TheochemUI#338's full scope (runtime-load + bundle + typed-status + clang-tidy pass), and the eon-7qqp + eon-7416 bug entries flip to DONE with CLOSED stamps pointing at the landed fixes.
The C++ side migrated saddle-search termination reasons to
`enum class SaddleStatus : int` in client/StatusTypes.h. The Python
orchestrator was still comparing raw ints (`termination_reason == 0`,
`== 15`, `= 9`, `= 6`) against magic literals and carrying two
duplicated label tables -- one title-cased in akmcstate.py, one
slug-cased in explorer.py. Both tables had drifted out of sync with
the C++ enumeration.
Introduce eon/status_codes.py as the single source of truth: an
IntEnum mirror of StatusTypes.h plus label / slug lookup helpers.
Wire format is unchanged -- IntEnum members compare equal to the int
they wrap, so existing results.dat files round-trip without churn --
but every server-side comparison now reads as the named member.
Two latent bugs surface and get fixed by the consolidation:
- akmcstate.py:599 had a 20-entry result_state_code list, so any
termination_reason in {20, 21, 22} (DimerLostMode,
DimerRestoredBest, BadArtnError -- all live status codes since the
ARTn / dimer warm-start work landed) would IndexError out of
register_bad_saddle and crash the akmc loop. The new helper falls
back to "Unknown termination code (N)" for any value SaddleStatus
cannot construct.
- explorer.py:584 had "nonlocal abort" with a space (typo) where
every other slug uses snake_case. Nothing downstream consumed the
typo'd form, so the fix is just collapsing to "nonlocal_abort".
Net cleanup:
- eon/akmcstate.py: 20-line label table -> 1 helper call,
`== 15` -> `== SaddleStatus.BadMdTrajectoryTooShort`.
- eon/basinhopping.py:251 `== 0` -> `== SaddleStatus.Good`.
- eon/explorer.py: slug table replaced by callable dispatch into
saddle_status_slug / minimization_status_slug; magic-int writes
`= 9` / `= 6` -> `= int(SaddleStatus.BadMinima)` / `BadNotConnected`;
three `== 0` checks named.
Towncrier fragment in docs/newsfragments/338-py-status-codes.changed.md.
Three reds on PR TheochemUI#338 that the cosmolab smoke didn't catch because it built libs+binary, not the test target or the docs workflow. 1. client/unit_tests/SaddleSearchTest.cpp -- five `int status = search.run()` sites left over from the SaddleStatus enum class migration. The base-class signature changed to `SaddleStatus run() = 0`, but the unit test was never updated and the test target wasn't part of the cosmolab build, so this didn't surface locally. Adds `#include "StatusTypes.h"` + namespace-scoped `using eonc::SaddleStatus`, retypes `status` as `SaddleStatus`. The existing `>= Good` / `<= DimerRestoredBest` / `!= Good` comparisons compile cleanly because C++20 scoped-enum relational operators are well-defined within a single enum type. 2. .github/workflows/ci_docs.yml -- the `pixi run ensure_cbindgen` step was kept after commit ee27bba dropped that pixi task with the readcon v0.9.0 bump (cargo-c handles header generation now). Pixi exits 127 ("Available tasks: gen-ref / makedocs / ...") with no ensure_cbindgen listed. Drop the line. 3. clang-format drift in StatusTypes.h (inline-comment alignment) and ARTnTest.cpp (line-wrap) that landed unformatted in earlier commits and tripped the lints job's `uvx prek -a`.
…ptimizer + ARTn migration Catch2 v3 streams the lhs/rhs of every REQUIRE/CHECK/INFO via m_oss << value, so the SaddleStatus / StepResult migration broke every assertion on those types -- 'no match for operator<<'. Adds inline ostream insertion overloads in StatusTypes.h: SaddleStatus -> "<message> (<int>)" StepResult -> "<int>" Then finishes the test-side migration: * OptimizerTest: 4 sites, `int status = opt.run(...)` becomes `StepResult status = opt.run(...)`; `CHECK(status == 1)` becomes `CHECK(status == StepResult::Converged)`. * ARTnTest: `int dimerStatus / artnStatus / status` -> `SaddleStatus`, the loose `>= 0` becomes `>= SaddleStatus::Good`. Adds `using eonc::SaddleStatus`. * JobIntegrationTest: 8 sites that read `int status = std::stoi(results["termination_reason"])` and then compared against `SaddleStatus::*`. `std::stoi` is the wire-format reader (results.dat carries an int), so we cast through to keep the wire side honest: `SaddleStatus status = static_cast<SaddleStatus>(std::stoi(...))`. Then 5 `REQUIRE(status == 0)` and `(status == 0) // GOOD` sites become `REQUIRE(status == SaddleStatus::Good)`. The pre-existing `REQUIRE(std::stoi(results[...]) == 0)` form (no intermediate variable, comparing two ints) is left as-is -- it documents the wire-format check. Cosmolab full-tree build now green: 0 errors, 0 warnings. Runtime tests pass except `test_ira` and `test_jobs` cases that require libira / libartn on LD_LIBRARY_PATH (the dev-lite env on cosmolab does not stage them, the CI runners do).
ci_docs.yml lost its cbindgen-install step in the v2.14.0 release-prep
churn. The other workflows (ci_ase, ci_benchmark, ci_build_akmc,
ci_build_gprd, ci_metatomic, ci_sanitizers, ci_serve, ci_xtb) all
keep the same `command -v cbindgen || cargo install cbindgen` step.
The wrap subproject at subprojects/readcon-core/meson.build:22
still calls find_program('cbindgen', required: true) at v0.8.0 *and*
v0.9.0 -- the conda-forge readcon recipe is what dropped its
cbindgen dependency by shipping pre-generated headers via cargo-c,
but our meson build path goes through the subproject, not the conda
package, so cbindgen is still required at meson-setup time.
Restore the install step at the same position the other workflows
have it (between sccache config and the meson setup step).
ce4f17d added eon/status_codes.py (the SaddleStatus IntEnum mirror of client/StatusTypes.h) but did not register it in eon/meson.build. meson-python therefore never copied it into site-packages, so the wheel-installed package raised at import time: File ".../site-packages/eon/akmcstate.py", line 17, in <module> from eon.status_codes import SaddleStatus, saddle_status_label ModuleNotFoundError: No module named 'eon.status_codes' Surfaced on macos-14 first because that runner does a clean meson-python install rather than running from the source tree; ubuntu CI ran the eon module via a path that resolved against the checkout for some test cases. Listing the new file in py.install_sources fixes both runners and the local `pip install .` path.
The IRA / ARTn migration to runtime dlopen (df7ee8d) made libira and libartn optional at runtime: missing libs surface as a thrown exception from require_loaded() and IRACompare::match returns error=-1 instead of 0. The integration tests in IRATest.cpp and one missed JobIntegrationTest case did not actually have the runtime-skip guard the IRATest header docstring claims they did, so on macos-14 (where the dev-mta env stages neither libira nor libartn) the tests REQUIRE(result.error == 0) and crash out with Catch2 exit 42. Add the same `if (!eonc::get_*_resource().is_loaded()) SKIP("... not available at runtime");` guard the LAMMPS / XTB / ARTn tests already use. Affected test cases: - IRATest: 5 TEST_CASE_METHOD blocks (every match/symmetry case). - JobIntegrationTest: "SaddleSearchJob standalone ARTn works without direction file" -- the other ARTn integration cases at lines 1028 and 1091 already had the guard, this one was missed during 092e... when ARTn went runtime-loaded. linux-64 dev-mta CI passes today because the runner happens to have libira / libartn on LD_LIBRARY_PATH; the SKIP path keeps the macos build green without weakening the linux assertions.
…, NumPy ndarray views, zstd)
… dtors XtbLoader::~XtbLoader() / LammpsLoader::~LammpsLoader() / ARTnResource::~ARTnResource() / IRAResource::~IRAResource() each held a process-lifetime dlopen'd handle and called dynlib::close() at static destruction. That ordering crashes on the ubuntu-22.04 ci-xtb runner (test_xtb / test_cineb_xtb SIGSEGV right after Catch2 prints "All tests passed", before exit). It does not crash on cosmolab dev-xtb (release or debugoptimized) because the libstdc++ / glibc destructor ordering happens to land elsewhere. Reproducing on every CI runner is not the bar; leaking the handle is the right move regardless. Why: each of these libraries (libxtb, liblammps, libartn, libira) drags in either a gfortran runtime (atexit hooks running FINAL/STOP, stdio flush) or an MPI / OpenMP / KIM teardown chain whose ordering relative to a Meyer-singleton dtor is unspecified. dlclose-during-process-fini runs those chains against a partially torn-down process state; the canonical symptom is a segfault inside libgfortran's stop_string or inside libc's stdio flush. Plugin loaders in long-lived processes (LAMMPS, GROMACS, i-PI) all leak their dlopen'd handles for the same reason -- the OS unmaps the mapping at exit, so the "leak" is bounded by process lifetime. The error-path dlclose inside the *constructor* is kept: that fires before any consumer has dlsym'd or held a function pointer, so it is safe to unwind.
…aders fc1005d only landed the XtbLoader half of the fix because the other three loader files (LammpsLoader.cpp, ARTnResource.cpp, IRAResource.cpp) hadn't been Read'd in this session. Same change as fc1005d: drop the dlclose call from the destructor, leak the handle to OS-reap at exit. Reasons for each are in the per-file comment.
5270adf added a NaN-force guard to LBFGS / CG step() and run() to avoid spinning forever when the underlying potential returns NaN forces. The guard is the right behaviour but the implementation called `m_objf->getGradient()` afresh at three of the four sites: LBFGS::run end: if (!m_objf->getGradient().allFinite()) ... CG::step entry: if (!m_objf->getGradient().allFinite()) ... CG::run end: if (!m_objf->getGradient().allFinite()) ... Each fresh getGradient invalidates the cached forces the saddle-search outer loop just populated, charging +1 force evaluation per call. On SaddleSearch / morse_pt the end-to-end cost was a 28 - 39% regression (39 -> 50 force calls dimer, 44 -> 61 Lanczos), exactly the JobIntegrationTest "force calls match SVN" assertions tripped on the macos / ubuntu CI runs of TheochemUI#338. Bisected by reverting the four checks on cosmolab and watching the SVN bounds satisfy. Trim to one cached check per optimizer: * LBFGS::step keeps `if (!f.allFinite())` -- `f` is the local already loaded by `f = -m_objf->getGradient()` at the top of step() so the check is free. The exit-side fresh getGradient in LBFGS::run goes. * CG::step gates the check on `m_cg_i > 0` and uses the stored `m_force` from the previous step's line_search / single_step body. No fresh evaluation, so the saddle-search cache stays warm. The exit-side fresh getGradient in CG::run goes. The per-step guard is sufficient: if NaN appears between steps, the next step()'s body still loads it via getGradient and the guard fires there; if NaN appears mid-step, the local-`f` / local-`m_force` check inside the body bails on the same iteration. Validated on cosmolab dev-lite (`meson test test_jobs`): all SaddleSearch SVN bounds satisfied (1/1 PASS).
Two test-build hygiene changes that together speed up the sanitizer
matrix dramatically and silence the libstdc++ regex maybe-uninitialized
false positive at -O3 + LTO.
1. Catch2 amalgamated as a single static_library
`thirdparty/catch2/catch_amalgamated.cpp` (~3000 lines including
<regex>) was listed in every test executable's `sources`, so each
of the 30+ test targets re-compiled it from scratch. Under
ASan + UBSan instrumentation that's the dominant cost of the
sanitizer build (45+ min observed on the GHA ubuntu runner; the
in-progress step that prompted the ask). Promote it to its own
`static_library('catch2_amalgamated', ...)` and link the artifact
into every test exe + the TestMain helper + the nwchem socket exe
+ bench_pot. Catch2 compiles once per build dir per buildtype.
Side benefit: the lib is built with `-Wno-maybe-uninitialized`,
suppressing the libstdc++ 13.3 false positive on
`function<bool(char)>` move-ctor inside `_NFA::_M_insert_alt`
that fires only at -O3 + LTO (the release-buildtype CI matrix).
This is the long-running GCC PR93499 / 96012 family. eOn's own
test sources stay under -Wmaybe-uninitialized.
2. bench_pot off the default-build set
`bench_pot` is the meson benchmark target invoked by
`meson test --benchmark`. The default `meson compile` builds it
too, which is wasted work in:
- sanitizer matrix (instruments + links a benchmark binary
nobody runs in `meson test --suite eon`)
- test matrix (same; benchmarks have their own job)
- dev-loop (anyone who ran `ninja -C bbdir`)
`build_by_default = false` keeps `meson test --benchmark` and any
explicit `ninja bench_pot` working but skips it from the default
target set. That alone trims ~20% off the sanitizer build wall.
Local cosmolab dev-lite: full `ninja -C bbdir-warn` 0 errors / 0
warnings, 25s incremental.
The metatomic-docs CI step was emitting four legitimate warnings near
the end of the sphinx pass:
release.md:179 myst.xref_missing 'post-bump-actions'
testing.md:59 misc.highlighting_failure meson lexer chokes
on `[['name', 'file']]`
v2.8.0/publications.md:29 bibtex.key_not_found
gardenReassignmentMagicNumbers2018a
v2.8.1/publications.md:23 myst.xref_missing
goswamiEfficientExplorationChemical2025
Per-fix:
* release.md: the heading `## 3. Post-bump actions` auto-anchors as
`3-post-bump-actions`, so the link target `#post-bump-actions`
doesn't resolve. Add an explicit `(post-bump-actions)=` MyST label
before the heading so the in-page reference works.
* testing.md: pygments' `meson` lexer doesn't handle the
`[['name', 'file', 'data_dir']]` nested-string-array form used in
the registration snippet. Switch the fence to `bash` (which still
highlights strings + brackets reasonably) and add a caption so the
snippet origin is obvious.
* v2.8.0 publications: typo in the cite key -- the bib entry is
`gardenReassignmentMagicNumbers2018` (no trailing `a`); drop the
stray `a`.
* v2.8.1 publications: the `[ArXiV preprint](...)` link erroneously
used the cite key as the URL target. Replace with the canonical
arXiv URL https://arxiv.org/abs/2510.21368 .
The remaining `eon.schema.NudgedElasticBandConfig:1` lexing warning
in pydantic-generated docstrings is auto-output from
`pydantic.BaseModel.__doc__` (literal JSON in the model schema)
and is benign; not silenced here.
…ctor d60a9b8 split catch_amalgamated.cpp into its own static_library and removed it from `testMain`'s sources, but `testMain` was already an orphan target -- nothing in the test_array foreach links against it and bench_pot uses TestMain.cpp directly. Without `catch_amalgamated.cpp` in its source list the testMain shared library exposed undefined Catch2 references (its TestMain.cpp uses `Catch2ApprovalListener::testCaseStarting` etc.) and `b_lundef=true` LTO link blew up across every test workflow: undefined reference to `Catch::AssertionHandler::handleUnexpectedInflightException()' undefined reference to `Catch::AssertionHandler::handleExpr(...)' ... `link_whole: catch2_lib` would also fix the link, but a target nobody consumes is just dead weight that LTOs every clean build for no reason. Drop the `testMain = library(...)` declaration outright and leave a comment pointing at the (one) bench_pot consumer of TestMain.cpp. Cosmolab dev-lite full rebuild: 0 errors, 0 link errors, 0 warnings.
d60a9b8 introduced catch2_lib as a static_library to dedupe the 3000- line amalgamated TU across 30+ test executables. The dedup works at debug / debugoptimized buildtypes (cosmolab green), but at the CI's `--buildtype release` (which sets `-flto=auto`) LTO sees through the archive boundary and prunes any object whose symbols aren't directly referenced by the test source. That includes the Catch2-amalgamated `main()` -- no test source calls main, so LTO drops it -- and the AssertionHandler / EventListener vtable carriers for the listener that TestMain.cpp registers. Result: every test linker step blew up with (.text+0x24): undefined reference to `main' undefined reference to `Catch::AssertionHandler::handleExpr(...)' undefined reference to `Catch::EventListenerBase::testRunStarting(...)' ... Switch from `link_with: catch2_lib` to `link_whole: catch2_lib` everywhere catch2_lib is consumed (test executables foreach, test_socket_nwchem, bench_pot). `link_whole` translates to `-Wl,--whole-archive` which forces the linker to pull in every object before LTO runs, so main() and the listener machinery survive the LTO pass. The size cost is bounded by the catch2 amalgamated itself (~250 KiB per exe at -O3 -flto), which is the same cost the pre-d60a9b80 per-target compile already paid. Reproduced on cosmolab dev-lite at `--buildtype release` (the same flags as CI): without link_whole it fails identically; with link_whole all 390 targets link clean.
…lags d60a9b8 added `-Wno-maybe-uninitialized` unconditionally to the catch2_lib cpp_args to silence the libstdc++ <regex> false positive under -O3 + LTO. That works on gcc / clang but MSVC (the windows-latest matrix) parses the flag as a number and bails with cl : Command line error D8021 : invalid numeric argument '/Wno-maybe-uninitialized' Route the flag through `cppc.get_supported_arguments(...)` so only compilers that recognise it pass it through. The pattern matches the rest of the meson build (`cppc.get_supported_arguments` is already used for `-fvisibility-inlines-hidden`, `-O3 -flto=auto`, `-march=native`, `-ffast-math`). Local cosmolab dev-lite at `--buildtype release`: still 0 errors, 0 warnings -- the GCC build still gets the suppression and the catch2 amalgamated still compiles cleanly.
The static_library('catch2_amalgamated') + link_whole approach gives
the Linux sanitizer matrix a real dedup win (~30 test exes share one
compile of the 3000-line amalgamated TU), but on the windows-latest
matrix it fails to link with
client\libcatch2_amalgamated.a : fatal error LNK1136: invalid or corrupt file
at the /WHOLEARCHIVE step. Conda-forge's clang 19 toolchain on Windows
makes meson's `static_library` emit a GNU `.a` archive, but the MSVC
link.exe `/WHOLEARCHIVE` flag expects a COFF `.lib`. The two formats
are not interchangeable.
Branch the meson logic on `host_machine.system() == 'windows'`. Linux
and macOS keep the catch2_lib + link_whole path (the sanitizer-time
win that prompted d60a9b8). Windows reverts to the pre-d60a9b80
behaviour: catch_amalgamated.cpp is added to each test executable's
sources directly. The cost is one extra compile per test target on
the single Windows matrix entry; the Windows builds are not the slow
ones.
Verified on cosmolab dev-lite at `--buildtype release` (matches
CI flags): all 390 targets link clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
LAMMPS is unconditionally compiled into the client (see client/meson.build subdir('potentials/LAMMPS') with comment 'always compiled, loaded at runtime via dlopen'). Yet the features banner check at line 168 looks for -DLAMMPS_POT in the compile-args string -- a flag the meson build never sets, so LAMMPS was always reported as 'disabled'.
Extends the features_list tuple to (flag, name, mode):
'compile' -- existing behaviour, check -Dflag in _args_str
'runtime' -- always compiled in; LammpsLoader::instance()
does the dlopen lookup at run time
LAMMPS now reads 'enabled (dlopen at runtime)' which matches reality. Other entries unchanged.
Note: the runtime-loaded report does NOT verify liblammps.so is actually present on LD_LIBRARY_PATH; that diagnostic still comes from LammpsLoader::require_loaded() throwing when a LAMMPS potential is requested without the library.
Some more of this needs to be done too.