Skip to content

feat: implement ASAN and UBSAN#865

Open
wdconinc wants to merge 26 commits into
mainfrom
asan-ubsan
Open

feat: implement ASAN and UBSAN#865
wdconinc wants to merge 26 commits into
mainfrom
asan-ubsan

Conversation

@wdconinc
Copy link
Copy Markdown
Contributor

@wdconinc wdconinc commented May 12, 2025

Briefly, what does this PR introduce?

This PR adds ASAN and UBSAN support to the ePIC geometry. It took some tweaking similar to https://gitlab.cern.ch/lhcb/Detector/-/merge_requests/336 because DD4hep's listcomponents requires preload of the required ASAN and UBSAN libraries, but these cannot be preloaded all over the place since it affects compiler feature detection. The approach here follows the approach taken in npsim as well (the other place where listcomponents wreaks havoc).

What kind of change does this PR introduce?

  • Bug fix (issue #__)
  • New feature (issue: memory leaks in ePIC geometry)
  • Documentation update
  • Other: __

Please check if this PR fulfills the following:

  • Tests for the changes have been added
  • Documentation has been added / updated
  • Changes have been communicated to collaborators

Does this PR introduce breaking changes? What changes might users need to make to their code?

No.

Does this PR change default behavior?

No.

@github-actions github-actions Bot added the topic: infrastructure Regarding build system, CI, CD label May 12, 2025
Comment thread .github/workflows/linux-eic-shell.yml Outdated
Copilot AI review requested due to automatic review settings May 17, 2026 22:55
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds AddressSanitizer and UndefinedBehaviorSanitizer support to the ePIC DD4hep geometry build and enables sanitizer builds in the main Linux eic-shell workflow.

Changes:

  • Adds CMake options and build/link/preload handling for ASAN and UBSAN.
  • Enables sanitizer/debug builds in the CI build matrix.
  • Adds sanitizer suppression files and workflow sanitizer environment variables.

Reviewed changes

Copilot reviewed 2 out of 5 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
CMakeLists.txt Adds sanitizer options, runtime preloading for DD4hep component generation, sanitizer compile/link flags, and suppression-file installation.
.github/workflows/linux-eic-shell.yml Adds sanitizer runtime options and enables ASAN/UBSAN Debug builds in CI.
.github/asan.supp Adds ASAN suppression file placeholder.
.github/lsan.supp Adds LSAN suppression file placeholder.
.github/ubsan.supp Adds UBSAN suppression file placeholder.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread .github/workflows/linux-eic-shell.yml Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
@wdconinc wdconinc enabled auto-merge (squash) May 18, 2026 00:27
@wdconinc wdconinc requested a review from veprbl May 18, 2026 00:33
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 18, 2026

veprbl
veprbl previously approved these changes May 18, 2026
Copy link
Copy Markdown
Member

@veprbl veprbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Redefining dd4hep_generate_rootmap is not the best, I would gate it away for non-asan-ubsan uses.

@wdconinc
Copy link
Copy Markdown
Contributor Author

LGTM. Redefining dd4hep_generate_rootmap is not the best, I would gate it away for non-asan-ubsan uses.

Agreed, the redefining is annoying. Gating is a good idea. Wish it could be upstreamed somehow.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread .github/workflows/linux-eic-shell.yml
@wdconinc
Copy link
Copy Markdown
Contributor Author

Filed AIDASoft/DD4hep#1632 to allow for avoiding the redefinition in the future.

veprbl
veprbl previously approved these changes May 18, 2026
@wdconinc
Copy link
Copy Markdown
Contributor Author

This may appear to indicate that all is hunky dory, no leaks or UB, but https://github.com/eic/npsim/actions/runs/26313766460/job/77468707988#step:6:182 tells another story. We are loading asan/ubsan too late if we wait until the geometry plugin is loaded, and it makes the whole thing rather ineffective.

wdconinc and others added 9 commits May 23, 2026 19:07
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Transfer fixes learned from npsim Clang ASAN/UBSAN debugging:

1. Clang runtime library names: clang++ -print-file-name=libasan.so returns
   the bare name unchanged (not a path), so LD_PRELOAD fails silently. Use
   arch-specific Clang names: libclang_rt.asan-${CMAKE_SYSTEM_PROCESSOR}.so
   and libclang_rt.ubsan_standalone-${CMAKE_SYSTEM_PROCESSOR}.so.

2. CMAKE_BUILD_RPATH: embed the sanitizer runtime directory in built libs so
   the linker can resolve transitive dependencies without a warning.

3. Clang link flags: target_link_libraries(PRIVATE asan/ubsan) fails for
   Clang (wrong library name). Use target_link_options(PRIVATE -shared-libasan
   / -shared-libsan) for Clang to force the shared runtime DSO as a dep.

4. detect_leaks=0 for listcomponents: without this, LSan scans the entire
   DD4hep/ROOT/Geant4 heap at listcomponents exit and hangs the build
   indefinitely. Switch to cmake -E env to set ASAN_OPTIONS=detect_leaks=0:
   verify_asan_link_order=0 and LSAN_OPTIONS=detect_leaks=0 for this one
   helper invocation only; full leak detection remains active for normal runs.

5. ASAN+UBSAN Clang preload conflict: Clang's ASAN runtime already integrates
   UBSAN when both -fsanitize=address,undefined are active. Preloading
   libclang_rt.ubsan_standalone alongside libclang_rt.asan causes overlapping
   interceptors → CPU-spin hang. Skip the UBSAN preload entry when Clang+ASAN.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GitHub Actions top-level env: is not propagated to reusable workflows
(uses: syntax). Epic has 5 reusable workflows that all source
install/bin/thisepic.sh but would not inherit ASAN_OPTIONS/LSAN_OPTIONS/
UBSAN_OPTIONS from the calling workflow, causing ASAN false-positives from
DD4hep/ROOT to fail geometry-check and conversion jobs.

Solution: generate the sanitizer env var exports directly into thisepic.sh
at CMake configure time when USE_ASAN=ON / USE_UBSAN=ON. Suppression files
are already installed to ${DETECTOR_PATH}/ (which thisepic.sh exports before
the new lines run), so paths are correct regardless of where the install
artifact is unpacked. No changes to any workflow files are required — all
reusable workflows already call setup: install/bin/thisepic.sh.

Uses CMake bracket strings ([[...]]) so ${DETECTOR_PATH} is preserved
literally (no CMake expansion) and configure_file @only passes it through
for the shell to expand at runtime.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
wdconinc and others added 5 commits May 23, 2026 19:07
Sanitizer options are now embedded directly in the generated thisepic.sh
by CMake when built with -DUSE_ASAN=ON / -DUSE_UBSAN=ON. All jobs that
run the sanitizer-instrumented geometry (inline and reusable workflows
alike) source install/bin/thisepic.sh and inherit the correct options
automatically, with suppression paths resolved via $DETECTOR_PATH.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactor ASAN and UBSAN setup lines in the script.
Without sanitizers, DD4hep's built-in implementation is used unchanged.
The custom version is only needed to LD_PRELOAD the sanitizer runtime
so it initialises before any allocation occurs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Dmitry Kalinkin <dmitry.kalinkin@gmail.com>
When built with USE_ASAN=ON or USE_UBSAN=ON, thisepic.sh now exports
LD_PRELOAD with the sanitizer runtime DSO path(s) baked in at cmake
configure time.  This ensures the ASAN/UBSAN runtime is loaded before
the first allocation when a non-instrumented host binary (e.g. npsim
from a container) sources thisepic.sh and then runs a simulation.

The export uses the composable form:
  export LD_PRELOAD=<lib1>:<lib2>${LD_PRELOAD:+:${LD_PRELOAD}}
so that a pre-existing LD_PRELOAD is preserved, and the line is absent
entirely when no sanitizer is enabled.

Implementation uses string(CONFIGURE @only) (CMake >=3.15) to expand
the cmake-computed PRELOAD_SANITIZER_LIB_JOINED path while leaving the
shell parameter expression ${LD_PRELOAD:+:...} untouched, mirroring
how ASAN_SETUP_LINES/UBSAN_SETUP_LINES use bracket strings to preserve
${DETECTOR_PATH} for shell-time expansion.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
wdconinc and others added 2 commits May 24, 2026 09:29
Suppress indirect leaks that all stem from DD4hep's global Detector
singleton never calling destructors at process exit:

- FieldMapB::Configure / create_field_map_b: field-map data vectors
  and the FieldMapB object registered in the Detector handle map
- libDDCore.so / libDDCorePlugins.so: Constant, NamedObject, DetElement,
  LimitSet, BitFieldCoder, Segmentation etc. — structural process-lifetime
  registry entries in DD4hep
- libCore.so: TStorage and TList allocations used by DD4hep at
  detector construction time

npsim/OpenCASCADE leaks (npdet_to_step, libTKernel, libTKVCAF) belong
in npsim's own suppression file and are not added here.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When thisepic.sh is sourced before a Python-based tool (ddsim,
checkGeometry), LD_PRELOAD now injects the ASAN runtime before the
Python interpreter starts. This exposes CPyCppyy and CPython allocations
that LSAN reports as leaked at process exit:

 - libCPyCppyy.so: proxy type objects (CPPOverload, CPPType, CPPScope),
   scope dictionaries, and converter factory entries built at import time
   and cached in Python's global type registry for the interpreter lifetime
 - _PyObject_Malloc: CPython dicts/tuples/type objects created during
   CPyCppyy module initialisation
 - _M_mutate: std::string growth (truncated stacks, same import-time
   context)

None of these involve epic geometry code. Python does not release
module-level type objects at interpreter shutdown, so all reports are
structural and expected.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@veprbl
Copy link
Copy Markdown
Member

veprbl commented May 24, 2026

This needs more suppressions (DDG4 and others)

wdconinc and others added 9 commits May 25, 2026 08:47
Two issues from CI run 26365490190:

1. OpenCASCADE LSAN leaks in the geometry-to-STEP export step
   (job 77608626180, epic_dirc_only.stp):
   - TCollection_AsciiString::reallocate in libTKernel.so.7.9 (20 bytes):
     OpenCASCADE's ASCII string grows its char buffer in a context that
     outlives the STEP writer; singleton/static storage is never freed.
   - TPrsStd_DriverTable::Get() in libTKVCAF.so.7.9 (8 bytes):
     lazy-initialised XCAF driver-table singleton never destroyed at exit.
   Fix: add leak:libTKernel.so and leak:libTKVCAF.so to lsan.supp.

2. alloc-dealloc-mismatch from gprofng in the profiling step
   (job 77608626118):
   /usr/bin/gprofng-collect-app calls realpath() (malloc) then frees the
   result with operator delete — a system package bug.  This became
   visible because LD_PRELOAD (set by thisepic.sh) is inherited by every
   child process, including the gprofng profiler wrapper.
   The profiling step collects performance data, not sanitizer reports;
   ASAN should not be active there.
   Fix: prefix the gprofng invocation with 'env -u LD_PRELOAD' so the
   profiling tool and its children run without the ASAN preload.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The checkOverlaps.py step runs a full Geant4 geometry initialisation
which creates objects registered in Geant4's global stores:

 - libDDG4.so: Geant4Converter allocates G4VSolid, G4Material,
   G4LogicalVolume, G4OpticalSurface/SkinSurface, G4VisAttributes, and
   Geant4UIMessenger objects; all stored in Geant4 global registries
   and never destroyed at process exit.

 - libG4*.so (all Geant4 libraries): G4MaterialPropertiesTable,
   G4PhysicsFreeVector, G4UIcommand, G4UIcmdWithAString,
   G4TransportationLogger, G4LogicalSurface — global-store entries
   created during G4RunManager initialisation, never freed at exit.

Also add leak:_M_assign alongside the existing leak:_M_mutate to
suppress truncated-stack std::string allocation reports that appear
in the Geant4/DDG4 and CPyCppyy initialisation contexts.

No libepic.so frames appear in any of the remaining leaks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous fix only covered 'gprofng collect app'; the 'gprofng
display text' call in the same step also inherited LD_PRELOAD, causing
ASAN to intercept libgprofng.so.0 and libbfd allocations and report
them as leaks.  Apply env -u LD_PRELOAD to both gprofng invocations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two new leak categories from CI run 26411732242:

1. libDDG4Plugins.so (job 77747790162):
   Geant4FieldTrackingSetup::execute registers a Geant4 field object in
   the global Geant4 field manager.  libDDG4Plugins is not matched by
   the existing 'leak:libDDG4.so' pattern; widen to 'leak:libDDG4' to
   cover both libDDG4.so and libDDG4Plugins.so.

2. cc1plus (job 77747790236):
   ROOT's ACLiC compiles test_ACTS.cxx by spawning /usr/libexec/gcc/.../
   cc1plus, which inherits LD_PRELOAD.  GCC intentionally does not free
   compiler-internal memory at exit; suppress with 'leak:cc1plus'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two new leak categories from CI run 26412422920:

1. pybind11 Python extension modules (job 77749837495):
   scipy/_highspy uses pybind11::cpp_function::make_function_record to
   register C++ function bindings in Python's global internals at import
   time; these are never freed at interpreter shutdown.  Same structural
   pattern as CPyCppyy proxy objects.  Add 'leak:pybind11' to cover all
   pybind11-based extension modules.

2. x86_64-linux-gnu-g++-14 compiler driver (job 77749837548):
   ROOT's ACLiC spawns both the g++ driver and cc1plus backend; the
   existing 'leak:cc1plus' suppresses the backend but not the driver.
   Add 'leak:x86_64-linux-gnu-g++' to cover the driver binary.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Job 77753658220 (run 26413319067) exposed several more Python extension
module global-state leaks with the same structural pattern as the
already-suppressed CPyCppyy and pybind11 leaks:

- boost_histogram/_core.cpython-313-*.so (pybind11-based)
- numpy/_multiarray_umath.cpython-313-*.so (PyUFunc global registration)
- cryptography/hazmat/bindings/_rust.abi3.so (PyO3/Rust)
- Python interpreter itself: resize_compact (unicode interning) and
  _PyObject_Calloc (Python object allocator)

Rather than suppress each extension individually, add broad patterns:
  leak:.cpython-   covers all CPython extension .so files
  leak:.abi3.so    covers all stable-ABI extension .so files
  leak:pyo3        covers PyO3 Rust extensions by symbol name
  leak:PyUFunc_FromFuncAndData  covers NumPy ufunc registration
  leak:libpython   covers Python interpreter itself
  leak:resize_compact / leak:_PyObject_Calloc  Python allocators

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
TGeoToOCC::OCC_SimpleShape in npsim's libGeoCad.so leaks the Double_t[]
mesh vertex buffer allocated for TGeoPgon shapes.  This is a real bug
fixed upstream in eic/npsim#63.  Add a temporary
suppression so the epic CI does not fail until the fix is released and
picked up by the container image.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two new external-library leak categories from CI run 26416997459:

1. OpenSSL libcrypto.so.3 (job 77764477323): global cryptographic state
   (PRNG pool, algorithm tables, thread error state) allocated via
   CRYPTO_malloc / CRYPTO_aligned_alloc / CRYPTO_realloc at first use.
   OpenSSL does not release these unless OPENSSL_cleanup() is called
   explicitly; triggered here by the Python cryptography module loading
   libcrypto as a side-effect at import time.

2. libbfd-2.44 (job 77764477429): BFD string table data allocated by
   _bfd_elf_strtab_save during ELF object inspection; called by gprofng
   when processing profiled binaries; not freed at exit.

Both are external-library structural leaks with no libepic.so frames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
test_ACTS.cxx is compiled by ROOT's ACLiC into test_ACTS_cxx.so, loaded
with dlopen, executed, then dlclose'd before process exit.  When LSAN
scans the heap at exit the library is already unmapped, so any heap
addresses inside it cannot be resolved to a module name and appear as
<unknown module>.  GCC's libsanitizer stores this literal string in the
module_name field, so 'leak:<unknown module>' is a valid suppression.

The leaks total ~152 bytes (ACTS/ROOT global state registered inside the
compiled .so) and contain no libepic.so frames.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
collect2 is GCC's linker driver, invoked by g++ to run ld as part of
the ACLiC compilation pipeline.  Like cc1plus and the g++ frontend,
collect2 intentionally does not free its internal data structures at
exit.  Add 'leak:collect2' alongside the existing GCC tool suppressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic: infrastructure Regarding build system, CI, CD

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants