Skip to content

fix(qudarap): floor WLS absorption step at float32 precision#316

Open
ggalgoczi wants to merge 9 commits into
mainfrom
precision-fix-wls
Open

fix(qudarap): floor WLS absorption step at float32 precision#316
ggalgoczi wants to merge 9 commits into
mainfrom
precision-fix-wls

Conversation

@ggalgoczi
Copy link
Copy Markdown
Contributor

When wls_absorption_distance is below the float32 ulp at world coords (e.g. 100 nm absorption depth at 30 m world coord, ulp ~3.6 µm), the update p.pos += dist*mom rounds to a no-op and the photon stays on the boundary, causing BVH ambiguity on the next trace.

Floor the step at 4 * ulp(p.pos) along the entering direction so the absorption position is unambiguously inside the absorbing material. For typical world coordinates the floor is ~14 µm at 30 m or 0.5 µm at 1 m - negligible vs typical WLS layer thickness, but resolves the BVH ambiguity that loses ~12% of photons in single-stage WLS at DUNE scale and ~1% in two-stage WLS at the same scale.

Per-photon overhead: 3 fmaxf, 1 fabsf x3, 1 multiply per WLS absorption. Measured kernel-time impact: +6.2% on a 24M-photon DUNE-module event.

Validation on tests/geom/nested_dune_module.gdml (not yet in repo but will be oushed) seeds 42-45 single 1 GeV e- events:

  • GPU/G4 hit ratio: 0.988 -> 1.0037
  • arrival-time chi^2/ndf (0-200 ns @ 2 ns): 6.67 -> 1.06
  • wavelength chi^2/ndf (380-540 nm @ 2 nm): 3.99 -> 1.80

When wls_absorption_distance is below the float32 ulp at world coords
(e.g. 100 nm absorption depth at 30 m world coord, ulp ~3.6 µm), the
update p.pos += dist*mom rounds to a no-op and the photon stays on the
boundary, causing BVH ambiguity on the next trace.

Floor the step at 4 * ulp(p.pos) along the entering direction so the
absorption position is unambiguously inside the absorbing material.
For typical world coordinates the floor is ~14 µm at 30 m or 0.5 µm at
1 m - negligible vs typical WLS layer thickness, but resolves the BVH
ambiguity that loses ~12% of photons in single-stage WLS at DUNE scale
and ~1% in two-stage WLS at the same scale.

Per-photon overhead: 3 fmaxf, 1 fabsf x3, 1 multiply per WLS absorption.
Measured kernel-time impact: +6.2% on a 24M-photon DUNE-module event.

Validation on tests/geom/nested_dune_module.gdml seeds 42-45 single 1
GeV e- events:
- GPU/G4 hit ratio: 0.988 -> 1.0037
- arrival-time chi^2/ndf (0-200 ns @ 2 ns): 6.67 -> 1.06
- wavelength chi^2/ndf (380-540 nm @ 2 nm): 3.99 -> 1.80
@ggalgoczi ggalgoczi requested a review from plexoos May 1, 2026 12:26
@ggalgoczi ggalgoczi self-assigned this May 1, 2026
@ggalgoczi ggalgoczi added the bug Something isn't working label May 1, 2026
Copy link
Copy Markdown
Member

@plexoos plexoos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. The reported effect is pretty substantial, both in terms of physics impact (~12% photon loss) and performance (~6% kernel-time increase), so it would be helpful to make the validation path reproducible for reviewers.

  • Could you please provide the test geometry tests/geom/nested_dune_module.gdml, or alternatively a smaller reproducer that triggers the same sub-ULP WLS case?

  • I also see a potential correctness issue with the current subprecision floor implementation: if the remaining distance to the exit surface is smaller than the imposed floor, p.pos += eff_wls_distance * p.mom can move the photon past the boundary, which would then re-emit it from the wrong material.

ggalgoczi added 2 commits May 9, 2026 15:38
Two-stage DUNE module (pTP -> bluewls -> SiPM): 60 x 13.5 x 13 m^3
inner LAr inside a 200 um pTP shell, a 6 mm bluewls acrylic shell,
and a 1 mm outer LAr detector shell with SiPM skin. Full RINDEX,
GROUPVEL, WLSCOMPONENT, WLSABSLENGTH matrices included for pTP and
bluewlsacrylic; LAr scintillation uses the narrow-band Babicz
emission spectrum.
The subprecision floor at min_step = 4 * ulp(p.pos) can exceed
distance_to_boundary in geometries where the slab thickness along
the trajectory is below 4 ulps of world coords (e.g. near-grazing
entry into a 200 um pTP shell at decameter world coords:
distance_to_boundary ~ 200 um / cos(theta) drops below 14 um at
theta > 86 deg). Without a ceiling the floor pushes the photon
past the exit boundary, re-creating the BVH ambiguity at the far
face that the floor was added to avoid at the near face.

Clamping at distance_to_boundary itself parks the photon exactly
on the exit face, same ambiguity. Clamp at distance_to_boundary
* 0.5f instead so the absorption point stays clear of both faces.

Cost: +1 fminf + 1 *0.5f (free exponent decrement on FP32) per
WLS absorption.
@ggalgoczi
Copy link
Copy Markdown
Contributor Author

ggalgoczi commented May 9, 2026

Thanks for the PR. The reported effect is pretty substantial, both in terms of physics impact (~12% photon loss) and performance (~6% kernel-time increase), so it would be helpful to make the validation path reproducible for reviewers.

* Could you please provide the test geometry tests/geom/nested_dune_module.gdml, or alternatively a smaller reproducer that triggers the same sub-ULP WLS case?

* I also see a potential correctness issue with the current subprecision floor implementation: if the remaining distance to the exit surface is smaller than the imposed floor, `p.pos += eff_wls_distance * p.mom` can move the photon past the boundary, which would then re-emit it from the wrong material.

Pushed two follow-ups: tests/geom/nested_dune_module.gdml, the actual two-stage DUNE module (60×13.5×13 m³ LAr, 200 µm pTP, 6 mm bluewls, SiPM skin) used for the validation, and a half-thickness ceiling on the WLS step at qudarap/qsim.h:793 to address the boundary overshoot. The clamp caps eff_wls_distance at distance_to_boundary * 0.5f rather than at distance_to_boundary itself, since clamping right on the exit face would re-create the same BVH ambiguity there. Re-ran 4-seed validation on this geometry with 2.5 GeV e⁻ and photon-stacking off: floor reduces GPU hits by 1.79 % vs no-floor (deterministic per seed) and wall-time delta is at the noise floor (−0.08 % mean, ±0.5 % per seed). On the previously-tested thinner-pTP variant the half-thickness ceiling added another sub-0.1 % hit shift on top of the floor, runtime-neutral; expect similar here.

Reproducer commands below for both no-floor and floor-on cases:

Default primary is 0.5 MeV e- (src/GPURaytrace.h:306). Sufficient to see the floor's effect direction. For the 2.5 GeV numbers from the PR description, edit that line to 2.5 * GeV and rebuild.

OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 ./GPURaytrace -g $PWD/tests/geom/nested_dune_module.gdml -m run.mac

@plexoos
Copy link
Copy Markdown
Member

plexoos commented May 11, 2026

Thank you for providing the reproducer. How long is it supposed to run? The exact command you provided did not work for me but here is what I am using from the repo directory:

time OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 GPURaytrace \
  -g tests/geom/nested_dune_module.gdml \
  -m tests/run.mac

@plexoos
Copy link
Copy Markdown
Member

plexoos commented May 11, 2026

Please confirm the reproducer command. It is taking too long for me or gets stuck:

...
## Run 0 starts.

========================================================================================
--> G4TaskRunManager::CreateAndStartWorkers() --> Creating 1 tasks with 1 events/task...
========================================================================================

G4WT26 > ### Run 0 starts on worker thread 26.

@ggalgoczi
Copy link
Copy Markdown
Contributor Author

Please try the following lines in the mac file:

/process/optical/scintillation/setStackPhotons false
/process/optical/cerenkov/setStackPhotons false

@plexoos
Copy link
Copy Markdown
Member

plexoos commented May 15, 2026

I added the lines to the mac file, but my test job crashed with std::bad_alloc:

root@fe4c133e0d2c:/workspaces/simphony# time OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 GPURaytrace   -g tests/geom/nested_dune_module.gdml   -m tests/run.mac &> log &
root@fe4c133e0d2c:/workspaces/simphony# tail log
2026-05-15 16:27:35.112 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 998887 p.sstr (998887, 4, 4, ) evt.photon 0x7fa480000000
//qcerenkov::wavelength_sampled_bndtex idx 51530589 sampledRI   1.552 cosTheta   0.960 sin2Theta   0.078 wavelength 112.137 count 52 matline 0 
2026-05-15 16:27:38.733 INFO  [20753] [QSim::simulate@554]  eventID 0 xxl YES i   51 dt    0.285537 slice   51 : sslice {   55383,   56511,  50946310,    999158}  0.999158
2026-05-15 16:27:38.829 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 999158 p.sstr (999158, 4, 4, ) evt.photon 0x7fa480000000
2026-05-15 16:27:42.453 INFO  [20753] [QSim::simulate@554]  eventID 0 xxl YES i   52 dt    0.309441 slice   52 : sslice {   56511,   57649,  51945468,    999791}  0.999791
2026-05-15 16:27:42.548 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 999791 p.sstr (999791, 4, 4, ) evt.photon 0x7fa480000000
2026-05-15 16:27:46.243 INFO  [20753] [QSim::simulate@554]  eventID 0 xxl YES i   53 dt    0.295765 slice   53 : sslice {   57649,   58635,  52945259,    999925}  0.999925
2026-05-15 16:27:46.349 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 999925 p.sstr (999925, 4, 4, ) evt.photon 0x7fa480000000
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

@ggalgoczi
Copy link
Copy Markdown
Contributor Author

I added the lines to the mac file, but my test job crashed with std::bad_alloc:

root@fe4c133e0d2c:/workspaces/simphony# time OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 GPURaytrace   -g tests/geom/nested_dune_module.gdml   -m tests/run.mac &> log &
root@fe4c133e0d2c:/workspaces/simphony# tail log
2026-05-15 16:27:35.112 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 998887 p.sstr (998887, 4, 4, ) evt.photon 0x7fa480000000
//qcerenkov::wavelength_sampled_bndtex idx 51530589 sampledRI   1.552 cosTheta   0.960 sin2Theta   0.078 wavelength 112.137 count 52 matline 0 
2026-05-15 16:27:38.733 INFO  [20753] [QSim::simulate@554]  eventID 0 xxl YES i   51 dt    0.285537 slice   51 : sslice {   55383,   56511,  50946310,    999158}  0.999158
2026-05-15 16:27:38.829 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 999158 p.sstr (999158, 4, 4, ) evt.photon 0x7fa480000000
2026-05-15 16:27:42.453 INFO  [20753] [QSim::simulate@554]  eventID 0 xxl YES i   52 dt    0.309441 slice   52 : sslice {   56511,   57649,  51945468,    999791}  0.999791
2026-05-15 16:27:42.548 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 999791 p.sstr (999791, 4, 4, ) evt.photon 0x7fa480000000
2026-05-15 16:27:46.243 INFO  [20753] [QSim::simulate@554]  eventID 0 xxl YES i   53 dt    0.295765 slice   53 : sslice {   57649,   58635,  52945259,    999925}  0.999925
2026-05-15 16:27:46.349 INFO  [20753] [QEvt::gatherPhoton@664] [ evt.num_photon 999925 p.sstr (999925, 4, 4, ) evt.photon 0x7fa480000000
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

You might have OPTICKS_EVENT_MODE set for Debug or similar that tries to allocate way too much RAM. Could you try using:
OPTICKS_EVENT_MODE=Hit OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 GPURaytrace -g tests/geom/nested_dune_module.gdml -m tests/run.mac

@plexoos
Copy link
Copy Markdown
Member

plexoos commented May 15, 2026

You might have OPTICKS_EVENT_MODE set for Debug or similar that tries to allocate way too much RAM

It is possible. I was just following your instructions to reproduce the bug and confirm the fix. For exact reproducibility, it would be helpful to provide a config file known to work.

@plexoos
Copy link
Copy Markdown
Member

plexoos commented May 15, 2026

Still fails for me:

time OPTICKS_EVENT_MODE=Hit OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 GPURaytrace   -g tests/geom/nested_dune_module.gdml   -m tests/run.mac
2026-05-15 22:55:01.482 INFO  [70634] [QEvt::gatherPhoton@664] [ evt.num_photon 999351 p.sstr (999351, 4, 4, ) evt.photon 0x7fdf60000000
2026-05-15 22:55:05.369 INFO  [70634] [QSim::simulate@554]  eventID 0 xxl YES i   51 dt    0.307375 slice   51 : sslice {   55173,   56235,  50953701,    999068}  0.999068
2026-05-15 22:55:05.487 INFO  [70634] [QEvt::gatherPhoton@664] [ evt.num_photon 999068 p.sstr (999068, 4, 4, ) evt.photon 0x7fdf60000000
2026-05-15 22:55:08.861 INFO  [70634] [QSim::simulate@554]  eventID 0 xxl YES i   52 dt    0.310035 slice   52 : sslice {   56235,   57346,  51952769,    999855}  0.999855
2026-05-15 22:55:08.954 INFO  [70634] [QEvt::gatherPhoton@664] [ evt.num_photon 999855 p.sstr (999855, 4, 4, ) evt.photon 0x7fdf60000000
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

real    3m25.340s
user    2m34.616s
sys     0m38.132s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants