fix(qudarap): floor WLS absorption step at float32 precision #316
ggalgoczi wants to merge 9 commits into
When wls_absorption_distance is below the float32 ulp at world coords (e.g. 100 nm absorption depth at 30 m world coord, ulp ~3.6 µm), the update p.pos += dist*mom rounds to a no-op and the photon stays on the boundary, causing BVH ambiguity on the next trace.
Floor the step at 4 * ulp(p.pos) along the entering direction so the absorption position is unambiguously inside the absorbing material. For typical world coordinates the floor is ~14 µm at 30 m or 0.5 µm at 1 m - negligible vs typical WLS layer thickness, but resolves the BVH ambiguity that loses ~12% of photons in single-stage WLS at DUNE scale and ~1% in two-stage WLS at the same scale.
Per-photon overhead: 3 fmaxf, 1 fabsf x3, 1 multiply per WLS absorption. Measured kernel-time impact: +6.2% on a 24M-photon DUNE-module event.
Validation on tests/geom/nested_dune_module.gdml, seeds 42-45, single 1 GeV e- events:
- GPU/G4 hit ratio: 0.988 -> 1.0037
- arrival-time chi^2/ndf (0-200 ns @ 2 ns): 6.67 -> 1.06
- wavelength chi^2/ndf (380-540 nm @ 2 nm): 3.99 -> 1.80
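The no-op update and the floor described above can be sketched on the host in plain C++. This is a minimal illustration, not the PR's actual kernel code: the `Vec3` type and the function names `approx_ulp`, `floored_step`, and `advance` are hypothetical, coordinates are assumed to be in mm, and the ulp is assumed to be approximated as `FLT_EPSILON * max(|x|, |y|, |z|)`, which is consistent with the quoted ~3.6 µm at 30 m and with the stated cost of three `fmaxf` and three `fabsf`.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical stand-in for the photon position/momentum type.
struct Vec3 { float x, y, z; };

// Assumed approximation of the float32 spacing at pos:
// FLT_EPSILON * max(|x|, |y|, |z|), which lies between one and two
// true ulps of the largest coordinate.
inline float approx_ulp(const Vec3& pos) {
    return FLT_EPSILON * std::fmax(std::fabs(pos.x),
                         std::fmax(std::fabs(pos.y), std::fabs(pos.z)));
}

// Raise a sampled absorption distance to at least 4 approximate ulps.
inline float floored_step(float dist, const Vec3& pos) {
    assert(dist >= 0.0f);
    const float min_step = 4.0f * approx_ulp(pos);
    return std::fmax(dist, min_step);
}

// p.pos += step * mom, evaluated in float32.
inline Vec3 advance(const Vec3& pos, const Vec3& mom, float step) {
    return Vec3{pos.x + step * mom.x,
                pos.y + step * mom.y,
                pos.z + step * mom.z};
}
```

With `pos = {30000.0f, 0.0f, 0.0f}` (30 m in mm) and `mom = {1.0f, 0.0f, 0.0f}`, `advance(pos, mom, 1e-4f)` leaves `x` unchanged because 100 nm is below half the float32 spacing there, while `advance(pos, mom, floored_step(1e-4f, pos))` moves the photon by roughly 14 µm.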
plexoos
left a comment
Thanks for the PR. The reported effect is pretty substantial, both in terms of physics impact (~12% photon loss) and performance (~6% kernel-time increase), so it would be helpful to make the validation path reproducible for reviewers.
- Could you please provide the test geometry tests/geom/nested_dune_module.gdml, or alternatively a smaller reproducer that triggers the same sub-ULP WLS case?
- I also see a potential correctness issue with the current subprecision floor implementation: if the remaining distance to the exit surface is smaller than the imposed floor, p.pos += eff_wls_distance * p.mom can move the photon past the boundary, which would then re-emit it from the wrong material.
Two-stage DUNE module (pTP -> bluewls -> SiPM): 60 x 13.5 x 13 m^3 inner LAr inside a 200 um pTP shell, a 6 mm bluewls acrylic shell, and a 1 mm outer LAr detector shell with SiPM skin. Full RINDEX, GROUPVEL, WLSCOMPONENT, WLSABSLENGTH matrices included for pTP and bluewlsacrylic; LAr scintillation uses the narrow-band Babicz emission spectrum.
The subprecision floor at min_step = 4 * ulp(p.pos) can exceed distance_to_boundary in geometries where the slab thickness along the trajectory is below 4 ulps of world coords (e.g. near-grazing entry into a 200 um pTP shell at decameter world coords: distance_to_boundary ~ 200 um / cos(theta) drops below 14 um at theta > 86 deg). Without a ceiling the floor pushes the photon past the exit boundary, re-creating the BVH ambiguity at the far face that the floor was added to avoid at the near face. Clamping at distance_to_boundary itself parks the photon exactly on the exit face, same ambiguity. Clamp at distance_to_boundary * 0.5f instead so the absorption point stays clear of both faces. Cost: +1 fminf + 1 *0.5f (free exponent decrement on FP32) per WLS absorption.
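One way to read the ceiling described in this commit message is sketched below in host-side C++. It is an illustration under stated assumptions rather than the committed code: the function name `clamped_floor_step` and its parameters are hypothetical, the ceiling is assumed to apply to the floor only (so a sampled distance above the floor is left untouched), the sampled distance is assumed to already be shorter than `distance_to_boundary`, and coordinates are assumed to be in mm.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// max_abs_coord: largest |coordinate| of the photon position (mm assumed).
// distance_to_boundary: remaining path length to the exit surface along mom.
// Returns the effective step: the sampled distance, raised to the
// sub-precision floor, where the floor itself is capped at half the
// remaining distance so the photon cannot be pushed onto or past the exit face.
inline float clamped_floor_step(float dist,
                                float max_abs_coord,
                                float distance_to_boundary) {
    assert(dist >= 0.0f && distance_to_boundary > 0.0f);
    const float floor_step = 4.0f * FLT_EPSILON * max_abs_coord;
    const float min_step = std::fmin(floor_step, 0.5f * distance_to_boundary);
    return std::fmax(dist, min_step);
}
```

For a photon at 30 m with only 10 µm of material left, the uncapped floor of roughly 14 µm would overshoot; with the cap the step becomes 5 µm, which keeps the absorption point between the two faces. When the remaining distance is comfortably larger than the floor, the result is the same as without the cap.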
Pushed two follow-ups:
Reproducer commands below for both no-floor and floor-on cases:
Default primary is 0.5 MeV e- (src/GPURaytrace.h:306), which is sufficient to see the direction of the floor's effect. For the 2.5 GeV numbers from the PR description, edit that line to
OPTICKS_MAX_SLOT=M1 OPTICKS_MAX_BOUNCE=10000 ./GPURaytrace -g $PWD/tests/geom/nested_dune_module.gdml -m run.mac
Thank you for providing the reproducer. How long is it supposed to run? The exact command you provided did not work for me, but here is what I am using from the repo directory:
Please confirm the reproducer command. It is taking too long for me or gets stuck:
Please try the following lines in the mac file:
I added the lines to the mac file, but my test job crashed with std::bad_alloc: |
You might have OPTICKS_EVENT_MODE set to Debug or similar, which tries to allocate far too much RAM. Could you try using:
It is possible. I was just following your instructions to reproduce the bug and confirm the fix. For exact reproducibility, it would be helpful to provide a config file known to work.
Still fails for me: |
Validation on tests/geom/nested_dune_module.gdml (not yet in repo but will be pushed), seeds 42-45, single 1 GeV e- events: