feat: implement ASAN and UBSAN#865
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds AddressSanitizer and UndefinedBehaviorSanitizer support to the ePIC DD4hep geometry build and enables sanitizer builds in the main Linux eic-shell workflow.
Changes:
- Adds CMake options and build/link/preload handling for ASAN and UBSAN.
- Enables sanitizer/debug builds in the CI build matrix.
- Adds sanitizer suppression files and workflow sanitizer environment variables.
Reviewed changes
Copilot reviewed 2 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
CMakeLists.txt |
Adds sanitizer options, runtime preloading for DD4hep component generation, sanitizer compile/link flags, and suppression-file installation. |
.github/workflows/linux-eic-shell.yml |
Adds sanitizer runtime options and enables ASAN/UBSAN Debug builds in CI. |
.github/asan.supp |
Adds ASAN suppression file placeholder. |
.github/lsan.supp |
Adds LSAN suppression file placeholder. |
.github/ubsan.supp |
Adds UBSAN suppression file placeholder. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Capybara summary for PR 865
|
veprbl
left a comment
There was a problem hiding this comment.
LGTM. Redefining dd4hep_generate_rootmap is not the best, I would gate it away for non-asan-ubsan uses.
Agreed, the redefining is annoying. Gating is a good idea. Wish it could be upstreamed somehow. |
|
Filed AIDASoft/DD4hep#1632 to allow for avoiding the redefinition in the future. |
|
This may appear to indicate that all is hunky dory, no leaks or UB, but https://github.com/eic/npsim/actions/runs/26313766460/job/77468707988#step:6:182 tells another story. We are loading asan/ubsan too late if we wait until the geometry plugin is loaded, and it makes the whole thing rather ineffective. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Transfer fixes learned from npsim Clang ASAN/UBSAN debugging:
1. Clang runtime library names: clang++ -print-file-name=libasan.so returns
the bare name unchanged (not a path), so LD_PRELOAD fails silently. Use
arch-specific Clang names: libclang_rt.asan-${CMAKE_SYSTEM_PROCESSOR}.so
and libclang_rt.ubsan_standalone-${CMAKE_SYSTEM_PROCESSOR}.so.
2. CMAKE_BUILD_RPATH: embed the sanitizer runtime directory in built libs so
the linker can resolve transitive dependencies without a warning.
3. Clang link flags: target_link_libraries(PRIVATE asan/ubsan) fails for
Clang (wrong library name). Use target_link_options(PRIVATE -shared-libasan
/ -shared-libsan) for Clang to force the shared runtime DSO as a dep.
4. detect_leaks=0 for listcomponents: without this, LSan scans the entire
DD4hep/ROOT/Geant4 heap at listcomponents exit and hangs the build
indefinitely. Switch to cmake -E env to set ASAN_OPTIONS=detect_leaks=0:
verify_asan_link_order=0 and LSAN_OPTIONS=detect_leaks=0 for this one
helper invocation only; full leak detection remains active for normal runs.
5. ASAN+UBSAN Clang preload conflict: Clang's ASAN runtime already integrates
UBSAN when both -fsanitize=address,undefined are active. Preloading
libclang_rt.ubsan_standalone alongside libclang_rt.asan causes overlapping
interceptors → CPU-spin hang. Skip the UBSAN preload entry when Clang+ASAN.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GitHub Actions top-level env: is not propagated to reusable workflows
(uses: syntax). Epic has 5 reusable workflows that all source
install/bin/thisepic.sh but would not inherit ASAN_OPTIONS/LSAN_OPTIONS/
UBSAN_OPTIONS from the calling workflow, causing ASAN false-positives from
DD4hep/ROOT to fail geometry-check and conversion jobs.
Solution: generate the sanitizer env var exports directly into thisepic.sh
at CMake configure time when USE_ASAN=ON / USE_UBSAN=ON. Suppression files
are already installed to ${DETECTOR_PATH}/ (which thisepic.sh exports before
the new lines run), so paths are correct regardless of where the install
artifact is unpacked. No changes to any workflow files are required — all
reusable workflows already call setup: install/bin/thisepic.sh.
Uses CMake bracket strings ([[...]]) so ${DETECTOR_PATH} is preserved
literally (no CMake expansion) and configure_file @only passes it through
for the shell to expand at runtime.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sanitizer options are now embedded directly in the generated thisepic.sh by CMake when built with -DUSE_ASAN=ON / -DUSE_UBSAN=ON. All jobs that run the sanitizer-instrumented geometry (inline and reusable workflows alike) source install/bin/thisepic.sh and inherit the correct options automatically, with suppression paths resolved via $DETECTOR_PATH. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactor ASAN and UBSAN setup lines in the script.
Without sanitizers, DD4hep's built-in implementation is used unchanged. The custom version is only needed to LD_PRELOAD the sanitizer runtime so it initialises before any allocation occurs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>
When built with USE_ASAN=ON or USE_UBSAN=ON, thisepic.sh now exports
LD_PRELOAD with the sanitizer runtime DSO path(s) baked in at cmake
configure time. This ensures the ASAN/UBSAN runtime is loaded before
the first allocation when a non-instrumented host binary (e.g. npsim
from a container) sources thisepic.sh and then runs a simulation.
The export uses the composable form:
export LD_PRELOAD=<lib1>:<lib2>${LD_PRELOAD:+:${LD_PRELOAD}}
so that a pre-existing LD_PRELOAD is preserved, and the line is absent
entirely when no sanitizer is enabled.
Implementation uses string(CONFIGURE @only) (CMake >=3.15) to expand
the cmake-computed PRELOAD_SANITIZER_LIB_JOINED path while leaving the
shell parameter expression ${LD_PRELOAD:+:...} untouched, mirroring
how ASAN_SETUP_LINES/UBSAN_SETUP_LINES use bracket strings to preserve
${DETECTOR_PATH} for shell-time expansion.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Suppress indirect leaks that all stem from DD4hep's global Detector singleton never calling destructors at process exit: - FieldMapB::Configure / create_field_map_b: field-map data vectors and the FieldMapB object registered in the Detector handle map - libDDCore.so / libDDCorePlugins.so: Constant, NamedObject, DetElement, LimitSet, BitFieldCoder, Segmentation etc. — structural process-lifetime registry entries in DD4hep - libCore.so: TStorage and TList allocations used by DD4hep at detector construction time npsim/OpenCASCADE leaks (npdet_to_step, libTKernel, libTKVCAF) belong in npsim's own suppression file and are not added here. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When thisepic.sh is sourced before a Python-based tool (ddsim, checkGeometry), LD_PRELOAD now injects the ASAN runtime before the Python interpreter starts. This exposes CPyCppyy and CPython allocations that LSAN reports as leaked at process exit: - libCPyCppyy.so: proxy type objects (CPPOverload, CPPType, CPPScope), scope dictionaries, and converter factory entries built at import time and cached in Python's global type registry for the interpreter lifetime - _PyObject_Malloc: CPython dicts/tuples/type objects created during CPyCppyy module initialisation - _M_mutate: std::string growth (truncated stacks, same import-time context) None of these involve epic geometry code. Python does not release module-level type objects at interpreter shutdown, so all reports are structural and expected. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
This needs more suppressions (DDG4 and others) |
Two issues from CI run 26365490190:
1. OpenCASCADE LSAN leaks in the geometry-to-STEP export step
(job 77608626180, epic_dirc_only.stp):
- TCollection_AsciiString::reallocate in libTKernel.so.7.9 (20 bytes):
OpenCASCADE's ASCII string grows its char buffer in a context that
outlives the STEP writer; singleton/static storage is never freed.
- TPrsStd_DriverTable::Get() in libTKVCAF.so.7.9 (8 bytes):
lazy-initialised XCAF driver-table singleton never destroyed at exit.
Fix: add leak:libTKernel.so and leak:libTKVCAF.so to lsan.supp.
2. alloc-dealloc-mismatch from gprofng in the profiling step
(job 77608626118):
/usr/bin/gprofng-collect-app calls realpath() (malloc) then frees the
result with operator delete — a system package bug. This became
visible because LD_PRELOAD (set by thisepic.sh) is inherited by every
child process, including the gprofng profiler wrapper.
The profiling step collects performance data, not sanitizer reports;
ASAN should not be active there.
Fix: prefix the gprofng invocation with 'env -u LD_PRELOAD' so the
profiling tool and its children run without the ASAN preload.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The checkOverlaps.py step runs a full Geant4 geometry initialisation which creates objects registered in Geant4's global stores: - libDDG4.so: Geant4Converter allocates G4VSolid, G4Material, G4LogicalVolume, G4OpticalSurface/SkinSurface, G4VisAttributes, and Geant4UIMessenger objects; all stored in Geant4 global registries and never destroyed at process exit. - libG4*.so (all Geant4 libraries): G4MaterialPropertiesTable, G4PhysicsFreeVector, G4UIcommand, G4UIcmdWithAString, G4TransportationLogger, G4LogicalSurface — global-store entries created during G4RunManager initialisation, never freed at exit. Also add leak:_M_assign alongside the existing leak:_M_mutate to suppress truncated-stack std::string allocation reports that appear in the Geant4/DDG4 and CPyCppyy initialisation contexts. No libepic.so frames appear in any of the remaining leaks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous fix only covered 'gprofng collect app'; the 'gprofng display text' call in the same step also inherited LD_PRELOAD, causing ASAN to intercept libgprofng.so.0 and libbfd allocations and report them as leaks. Apply env -u LD_PRELOAD to both gprofng invocations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two new leak categories from CI run 26411732242: 1. libDDG4Plugins.so (job 77747790162): Geant4FieldTrackingSetup::execute registers a Geant4 field object in the global Geant4 field manager. libDDG4Plugins is not matched by the existing 'leak:libDDG4.so' pattern; widen to 'leak:libDDG4' to cover both libDDG4.so and libDDG4Plugins.so. 2. cc1plus (job 77747790236): ROOT's ACLiC compiles test_ACTS.cxx by spawning /usr/libexec/gcc/.../ cc1plus, which inherits LD_PRELOAD. GCC intentionally does not free compiler-internal memory at exit; suppress with 'leak:cc1plus'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two new leak categories from CI run 26412422920: 1. pybind11 Python extension modules (job 77749837495): scipy/_highspy uses pybind11::cpp_function::make_function_record to register C++ function bindings in Python's global internals at import time; these are never freed at interpreter shutdown. Same structural pattern as CPyCppyy proxy objects. Add 'leak:pybind11' to cover all pybind11-based extension modules. 2. x86_64-linux-gnu-g++-14 compiler driver (job 77749837548): ROOT's ACLiC spawns both the g++ driver and cc1plus backend; the existing 'leak:cc1plus' suppresses the backend but not the driver. Add 'leak:x86_64-linux-gnu-g++' to cover the driver binary. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Job 77753658220 (run 26413319067) exposed several more Python extension module global-state leaks with the same structural pattern as the already-suppressed CPyCppyy and pybind11 leaks: - boost_histogram/_core.cpython-313-*.so (pybind11-based) - numpy/_multiarray_umath.cpython-313-*.so (PyUFunc global registration) - cryptography/hazmat/bindings/_rust.abi3.so (PyO3/Rust) - Python interpreter itself: resize_compact (unicode interning) and _PyObject_Calloc (Python object allocator) Rather than suppress each extension individually, add broad patterns: leak:.cpython- covers all CPython extension .so files leak:.abi3.so covers all stable-ABI extension .so files leak:pyo3 covers PyO3 Rust extensions by symbol name leak:PyUFunc_FromFuncAndData covers NumPy ufunc registration leak:libpython covers Python interpreter itself leak:resize_compact / leak:_PyObject_Calloc Python allocators Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TGeoToOCC::OCC_SimpleShape in npsim's libGeoCad.so leaks the Double_t[] mesh vertex buffer allocated for TGeoPgon shapes. This is a real bug fixed upstream in eic/npsim#63. Add a temporary suppression so the epic CI does not fail until the fix is released and picked up by the container image. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two new external-library leak categories from CI run 26416997459: 1. OpenSSL libcrypto.so.3 (job 77764477323): global cryptographic state (PRNG pool, algorithm tables, thread error state) allocated via CRYPTO_malloc / CRYPTO_aligned_alloc / CRYPTO_realloc at first use. OpenSSL does not release these unless OPENSSL_cleanup() is called explicitly; triggered here by the Python cryptography module loading libcrypto as a side-effect at import time. 2. libbfd-2.44 (job 77764477429): BFD string table data allocated by _bfd_elf_strtab_save during ELF object inspection; called by gprofng when processing profiled binaries; not freed at exit. Both are external-library structural leaks with no libepic.so frames. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test_ACTS.cxx is compiled by ROOT's ACLiC into test_ACTS_cxx.so, loaded with dlopen, executed, then dlclose'd before process exit. When LSAN scans the heap at exit the library is already unmapped, so any heap addresses inside it cannot be resolved to a module name and appear as <unknown module>. GCC's libsanitizer stores this literal string in the module_name field, so 'leak:<unknown module>' is a valid suppression. The leaks total ~152 bytes (ACTS/ROOT global state registered inside the compiled .so) and contain no libepic.so frames. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
collect2 is GCC's linker driver, invoked by g++ to run ld as part of the ACLiC compilation pipeline. Like cc1plus and the g++ frontend, collect2 intentionally does not free its internal data structures at exit. Add 'leak:collect2' alongside the existing GCC tool suppressions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Briefly, what does this PR introduce?
This PR adds ASAN and UBSAN support to the ePIC geometry. It took some tweaking similar to https://gitlab.cern.ch/lhcb/Detector/-/merge_requests/336 because DD4hep's
listcomponentsrequires preload of the required ASAN and UBSAN libraries, but these cannot be preloaded all over the place since it affects compiler feature detection. The approach here follows the approach taken in npsim as well (the other place wherelistcomponentswreaks havoc).What kind of change does this PR introduce?
Please check if this PR fulfills the following:
Does this PR introduce breaking changes? What changes might users need to make to their code?
No.
Does this PR change default behavior?
No.