fix(mlmg): stop GPU segfault in EB mask (default solver broken on CUDA wheel) by jameslehoux · Pull Request #299 · BASE-Laboratory/OpenImpala

jameslehoux · 2026-06-13T09:48:28Z

Headline: the default solver (`solver="auto"` = MLMG) segfaults on every GPU build

TortuosityMLMG::solve() passed a GPU DeviceVector pointer to ActiveMaskIF, the implicit function consumed by amrex::EB2::Build. AMReX evaluates that implicit function on the host while generating the EB geometry, so on a GPU build the CPU code dereferenced a device pointer and segfaulted (SIGSEGV → MPI_ABORT errorcode 11) at the very first EB step — before the solve started, for every geometry, with the GPU otherwise idle.

Because MLMG is the library default, this broke the entire openimpala-cuda wheel. CPU-only CI never caught it: on a CPU build the mask pointer is host memory, so the host-side IF evaluation is valid and tTortuosityMLMG passes.

How it was diagnosed

Captured, isolated runs on a Colab T4 (CUDA wheel) ruled out the obvious culprits in turn:

- Not OOM: the failing solve used 13 MiB with 14.6 GB free.
- Not cut-cell handling: MLMG segfaults identically on a fully-active uniform block and on porespy blobs.
- Localised to `solve()`'s first EB step: output always dies right after the constructor's `Initialized with eps=...` line, inside `EB2::Build`.

The code path (`TortuosityMLMG.cpp`): device pointer at the `#ifdef AMREX_USE_GPU` block → `ActiveMaskIF{… mask_data_ptr}` → `EB2::Build(...)` → `ActiveMaskIF::operator()` dereferences `mask[idx]`.

The fix

Allocate the mask as amrex::Gpu::ManagedVector<int> so the single pointer is valid on both host and device, regardless of where AMReX evaluates the IF. On CPU builds ManagedVector degrades to an ordinary host allocation, so behaviour there is unchanged.
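To make the failure mode concrete, here is a host-only C++ sketch of a mask-backed implicit function plus a host-side pass that evaluates it at every cell centre, roughly the kind of probing `EB2::Build` does when it generates the geometry. `MaskIF` and `countFluidCells` are hypothetical names, not OpenImpala's classes; the sign convention (negative = fluid) is an assumption based on AMReX's EB2 convention; and `std::vector<int>` stands in for `amrex::Gpu::ManagedVector<int>` so the sketch compiles without AMReX. The only point it illustrates is that `operator()` reads `mask` on the host, so the allocation behind that pointer must be host-readable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in for ActiveMaskIF: names, members and the sign
// convention are assumptions, not OpenImpala's actual class.
struct MaskIF {
    const int* mask;  // 1 = active cell, 0 = inactive; read on the HOST
    int nx, ny, nz;   // grid size in cells
    double dx;        // uniform cell width

    // Assumed AMReX EB2 convention: < 0 in the fluid, > 0 in the body.
    double operator()(double x, double y, double z) const {
        const std::size_t idx =
            static_cast<std::size_t>(cell(x, nx)) +
            static_cast<std::size_t>(nx) *
                (static_cast<std::size_t>(cell(y, ny)) +
                 static_cast<std::size_t>(ny) * static_cast<std::size_t>(cell(z, nz)));
        // This read runs on the host. If `mask` were a device-only pointer
        // (e.g. from a GPU DeviceVector), this is where it would fault.
        return mask[idx] != 0 ? -1.0 : 1.0;
    }

    // Map a coordinate to a cell index, clamped to [0, n-1].
    int cell(double p, int n) const {
        const int c = static_cast<int>(p / dx);
        return c < 0 ? 0 : (c >= n ? n - 1 : c);
    }
};

// Models a host-side geometry pass: evaluate the IF at every cell centre
// and count the cells it classifies as fluid.
int countFluidCells(const MaskIF& f) {
    assert(f.mask != nullptr);
    int fluid = 0;
    for (int k = 0; k < f.nz; ++k)
        for (int j = 0; j < f.ny; ++j)
            for (int i = 0; i < f.nx; ++i)
                if (f((i + 0.5) * f.dx, (j + 0.5) * f.dx, (k + 0.5) * f.dx) < 0.0)
                    ++fluid;
    return fluid;
}
```

For a 4×4×4 grid with only cell 0 inactive (`std::vector<int> mask(64, 1); mask[0] = 0;`), `countFluidCells` returns 63. In the GPU build, the equivalent of `mask.data()` is what has to come from managed memory, which is why switching the allocation (rather than the IF) is enough.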

⚠️ Validation status

The diagnosis is code-confirmed, but this has not been run on GPU hardware — I had no GPU/CUDA build available, and CPU CI is structurally blind to this class of bug. Please rebuild the CUDA wheel and re-run solver="mlmg" on a T4 (and the profiling-notebook bake-off) before merging the fix with confidence. A regression test that exercises the EB/MLMG path with a device mask pointer would prevent recurrence.

Also bundled in this branch (per request)

These are independent and can be reviewed separately:

- docs: refresh CLAUDE.md. The file described Fortran kernels (`*.F90`/`*_F.H`) that no longer exist (they were migrated to native C++ in `TortuosityKernels.H`); the refresh also adds the MLMG solver, `TortuositySolverBase`, the homogenisation/microstructure modules, and the Python/pybind11 layer to the reference tables.
- notebooks: harden the §3 solver bake-off in `notebooks/profiling_and_tuning.ipynb`. Standalone SMG/PFMG produce a NaN residual on GPU (numerical breakdown on masked rows) and could hard-kill the kernel, so the bake-off now skips them on GPU builds (they are still used as preconditioners) and notes why.
- chore: add a SessionStart hook that sets the git commit identity in the ephemeral web containers so commits are attributed correctly.

Follow-ups worth a separate issue

Several Krylov configs "converge" to disagreeing τ on the same 64³ system (2.38 / 2.46 / 2.63 / 2.51), and gmres reported "converged but produced an invalid result" (flux-conservation check). This smells like the convergence tolerance is not tight enough to pin down τ; it needs its own investigation.
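For scale, the four reported values spread by about 10% of their mean, which is hard to reconcile with solvers that genuinely agree on one linear system. A minimal C++ helper for that spread metric, (max − min) / mean; `relativeSpread` is a hypothetical name, not an OpenImpala function:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Relative spread of several tortuosity estimates for the same system:
// (max - min) / mean. Estimates that agree give a value near zero.
double relativeSpread(const std::vector<double>& taus) {
    assert(!taus.empty());
    const auto mm = std::minmax_element(taus.begin(), taus.end());
    double sum = 0.0;
    for (double t : taus) sum += t;
    const double mean = sum / static_cast<double>(taus.size());
    return (*mm.second - *mm.first) / mean;
}
```

For `{2.38, 2.46, 2.63, 2.51}` the mean is 2.495 and the range is 0.25, so the result is about 0.100.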

https://claude.ai/code/session_01VYc1je5VpiW46QvRBw8QFv

Generated by Claude Code

The previous subprocess-isolation attempt backfired: spawning a CUDA-initialising child per combo while the parent kernel still holds the GPU caused even pcg to abort (MPI_Abort, exit 6) from device-memory contention, and did not reliably keep the kernel alive. Revert to the proven in-process loop and instead drop only the two solvers that actually hard-abort on a GPU build: standalone smg and pfmg (no Krylov wrapper). In the original report every other combo up to bicgstab ran fine in-process; standalone SMG was the sole kernel-killer. As preconditioners (pcg+smg, flexgmres+pfmg) these multigrids are capped at one V-cycle and stay. On CPU builds nothing is skipped. Soft non-convergence is still caught and shown as a FAILED row.
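The skip rule in this commit is small enough to state as code. The sketch below is C++ for consistency with the rest of this PR, although the notebook itself is Python; `Combo`, `shouldRun` and `planBakeOff` are hypothetical names, and the combo list in the usage note is illustrative rather than the notebook's actual list. The FAILED-row handling for soft non-convergence is not modelled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HYPOTHETICAL model of a bake-off entry; the notebook's real data
// structures are not shown in this PR.
struct Combo {
    std::string solver;          // e.g. "pcg", "smg"
    std::string preconditioner;  // empty when the solver runs standalone
};

// Standalone SMG/PFMG (no Krylov wrapper) are skipped on GPU builds;
// the same multigrids used as preconditioners are kept.
// On CPU builds nothing is skipped.
bool shouldRun(const Combo& c, bool gpuBuild) {
    assert(!c.solver.empty());
    const bool standaloneMultigrid =
        c.preconditioner.empty() && (c.solver == "smg" || c.solver == "pfmg");
    return !(gpuBuild && standaloneMultigrid);
}

// Filter the full combo list down to the ones the in-process loop will run.
std::vector<Combo> planBakeOff(const std::vector<Combo>& all, bool gpuBuild) {
    std::vector<Combo> kept;
    for (const Combo& c : all)
        if (shouldRun(c, gpuBuild)) kept.push_back(c);
    return kept;
}
```

With six combos `{pcg}, {smg}, {pfmg}, {pcg+smg}, {flexgmres+pfmg}, {bicgstab}`, `planBakeOff(all, true)` keeps four and `planBakeOff(all, false)` keeps all six.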

TortuosityMLMG::solve() passed a GPU DeviceVector pointer to ActiveMaskIF, the implicit function consumed by EB2::Build. AMReX evaluates that implicit function on the host while generating the EB geometry, so on a GPU build the host code dereferenced a device pointer and segfaulted (SIGSEGV/MPI_ABORT errorcode 11) at the very first EB step, before the solve began, for every geometry, with the GPU otherwise idle.

Because MLMG is the default solver (solver="auto"), this broke the entire openimpala-cuda wheel for all geometries. CPU-only CI never caught it: on a CPU build the mask pointer is host memory, so the host-side IF evaluation is valid and the tTortuosityMLMG tests pass.

Allocate the mask as amrex::Gpu::ManagedVector so the single pointer is valid on both host and device, regardless of where AMReX evaluates the IF. On CPU builds ManagedVector degrades to an ordinary host allocation, so behaviour is unchanged there.

Needs validation on real GPU hardware: rebuild the CUDA wheel and re-run solver="mlmg" on a T4 (and the profiling notebook bake-off).

github-actions · 2026-06-13T09:51:13Z

Performance Benchmark Results

Size	Solver	Wall Time (s)	Tortuosity	Expected	Rel. Error	Iters	Status
64³	pcg	0.6247	0.984375	0.984375	0.00e+00	1	PASS
64³	flexgmres	0.3816	0.984375	0.984375	0.00e+00	N/A	PASS
64³	bicgstab	0.3737	0.984375	0.984375	0.00e+00	N/A	PASS
64³	gmres	0.3757	0.984375	0.984375	0.00e+00	N/A	PASS
128³	pcg	7.3815	0.992188	0.992188	0.00e+00	1	PASS
128³	flexgmres	5.2642	0.992188	0.992188	0.00e+00	N/A	PASS
128³	bicgstab	5.1410	0.992188	0.992188	0.00e+00	N/A	PASS
128³	gmres	5.1432	0.992188	0.992188	0.00e+00	N/A	PASS

Fastest solver: bicgstab at 64³ (0.3737s)

Benchmark: uniform block (analytical τ = (N-1)/N)
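The Expected column follows directly from τ = (N-1)/N: 63/64 = 0.984375 exactly, and 127/128 = 0.9921875, printed as 0.992188 at six decimals. A minimal C++ check of those values; the function names are hypothetical, and the relative-error formula |computed − expected| / |expected| is an assumption about how the "Rel. Error" column is computed.

```cpp
#include <cassert>
#include <cmath>

// Analytical tortuosity for the uniform-block benchmark: tau = (N-1)/N.
double analyticalTau(int n) {
    assert(n > 1);
    return static_cast<double>(n - 1) / n;
}

// Assumed definition of the "Rel. Error" column.
double relError(double computed, double expected) {
    return std::fabs(computed - expected) / std::fabs(expected);
}
```

Every PASS row above reports a relative error of 0.00e+00 against these values.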

github-actions · 2026-06-13T10:00:46Z

Code Coverage Report

------------------------------------------------------------------------------
                           GCC Code Coverage Report
Directory: .
------------------------------------------------------------------------------
File                                       Lines     Exec  Cover   Missing
------------------------------------------------------------------------------
src/io/CathodeWrite.cpp                       95       83    87%   40-41,97-100,115-116,182-185
src/io/CathodeWrite.H                          1        1   100%
src/io/DatReader.cpp                         136      106    77%   28-29,32,37,94-95,101-102,109-111,137-139,143,146-150,154-157,164,166,210-211,256,259
src/io/DatReader.H                             1        1   100%
src/io/HDF5Reader.cpp                        344       84    24%   40-41,43-44,46-49,52,54-56,58-59,62,64-66,68-74,92-93,126-128,144-145,154-157,174-180,182-187,204,213-215,217,219-228,230-233,236-238,240-251,253-258,266,266,266,266,266,266,266,270,270,270,270,270,270,270,274,276,278,280,282,288,290,297,297,297,297,297,297,297,301,301,301,301,301,301,301,305,305,305,305,305,305,305-306,306,306,306,306,306,306,309,309,309,309,309,309,309-310,310,310,310,310,310,310-311,311,311,311,311,311,311,313,313,313,313,313,313,313-314,314,314,314,314,314,314-315,315,315,315,315,315,315,319,319,319,319,319,319,319,324,324,324,324,324,324,324-325,325,325,325,325,325,325-326,326,326,326,326,326,326-327,327,327,327,327,327,327,332,332,332,332,332,332,332,337,337,337,337,337,337,337-338,338,338,338,338,338,338,343,343,343,343,343,343,343,350,350,350,350,350,350,350,357-358,432-435,437-440
src/io/HDF5Reader.H                            3        3   100%
src/io/ImageLoader.cpp                        61       42    68%   25,38,48,60-62,64-70,72,77,89-90,92,94
src/io/RawReader.cpp                         267      136    50%   51-52,91-92,113-114,117-119,122-123,142-144,157-159,168-170,176-179,187-188,194-198,202-206,211-214,221-226,233-239,273,275-276,278,285-286,303,314,316,320,327,329,333-336,340,348-349,355-357,363-365,367-368,371,374,376,379-382,384-386,388,390-391,393,395-396,398,400-401,403,405-406,408,412-413,415,419-420,422,427,467,473-474,535-538,552,554-556,558,560-562,572,576-578,580,602
src/io/RawReader.H                             1        1   100%
src/io/TiffReader.cpp                        385      131    34%   60-66,68-70,72-74,76-78,80-81,83-85,87-89,91-93,95-97,99-100,102-104,107-109,112-113,115-118,120,123,125-128,144-145,149-151,153-159,161,187,211,218,227,229-232,241,243-246,249,256,289-294,307,310-318,320-321,324-328,332-336,339-343,345-349,352-358,360-364,368,370,376-378,380-394,397,399-403,405-410,414-419,421-426,429-430,433-435,569-589,591-592,595-602,604,607-623,626-628,684,687-688,691-697,699,703-714,716-717
src/io/TiffReader.H                            5        5   100%
src/props/BoundaryCondition.H                131       74    56%   63,68,70,216,224-229,233-236,238-244,247-249,252-253,255,258-261,264-265,271-272,274-279,285-287,290-296,299,303,365-366,371,373
src/props/ConnectedComponents.cpp             71       69    97%   115-116
src/props/ConnectedComponents.H                4        4   100%
src/props/DeffTensor.cpp                      62       59    95%   122,128-129
src/props/Diffusion.cpp                      510      378    74%   93-94,97-98,103-104,106-116,118,123-132,134-141,144-150,153-157,159-163,165,168-173,175-177,179,182-184,186-187,190-191,193,195-198,200,202-203,288-289,297-298,300,349,359-360,368-371,373-375,404-413,415,453,461,465-467,526-527,533,535,539,547,581,610,638,646,735-736,739-740,757-760,771-772,774,824
src/props/EffDiffFillMtx.H                   120      106    88%   58,216-217,221-225,229,231-235
src/props/EffectiveDiffusivityHypre.cpp      413      372    90%   189-191,193-197,352-355,458,610-613,615-617,619-622,631-634,641,670,682-685,687-689,691,706,724,726
src/props/EffectiveDiffusivityHypre.H          7        7   100%
src/props/FloodFill.cpp                       90       87    96%   109-110,250
src/props/HypreStructSolver.cpp              343      210    61%   87-88,121,133-134,145,303,313,315,318,350,360,362,365,371-374,376-380,382-383,385-389,392-393,395-396,398,401-402,405-406,408-411,413-417,419-420,422-426,429-430,432-433,435,438-439,442-443,445-447,449-455,457-461,464-465,467-468,470,473-474,477,479-481,483-489,491-495,498-499,501-502,504,507-508,511,513-515,517-520,522-526,529-530,532-533,535,538-539,542,545-546,559
src/props/HypreStructSolver.H                  6        6   100%
src/props/MacroGeometry.H                     17       17   100%
src/props/ParticleSizeDistribution.cpp        11       11   100%
src/props/ParticleSizeDistribution.H           6        6   100%
src/props/PercolationCheck.cpp                53       46    86%   32-33,49-51,68,73
src/props/PercolationCheck.H                   4        4   100%
src/props/PhysicsConfig.H                     90       89    98%   150
src/props/ResultsJSON.H                      225      222    98%   242,395,416
src/props/REVStudy.cpp                       151      128    84%   72,83-91,159,170-173,175,183-186,188-190
src/props/SolverConfig.H                      32       20    62%   30,32,37-44,75-76
src/props/SpecificSurfaceArea.cpp             56       55    98%   59
src/props/SpecificSurfaceArea.H                6        6   100%
src/props/ThroughThicknessProfile.cpp         38       38   100%
src/props/ThroughThicknessProfile.H            5        5   100%
src/props/Tortuosity.H                         2        2   100%
src/props/TortuosityDirect.cpp               219      191    87%   81-83,86,100-106,113-114,125,134,140,202-209,226,394,424,433
src/props/TortuosityDirect.H                   5        5   100%
src/props/TortuosityHypre.cpp                793      567    71%   149-150,155-156,240-243,246-248,311,335-337,340-341,343,371-373,376-378,408-411,620,644,648,669,686-687,689-691,694-701,708-709,711,713,716-726,730-736,738-742,746-748,750-752,755-762,769-770,772,774-784,788-796,798-801,803,813,819-822,824-826,835-838,840-842,878,881-882,902-904,907,918-921,923,960,965-968,971-973,977-980,982,984-987,989,994-996,998,1047,1056,1061,1064-1069,1085-1088,1102-1106,1111-1116,1126-1130,1135-1140,1145-1149,1152-1155,1162-1165,1176,1185,1187,1191,1193,1218,1259-1260,1346-1348,1474-1477
src/props/TortuosityHypre.H                   15       15   100%
src/props/TortuosityHypreFill.H              127       98    77%   85,203,205-212,237-239,241-245,247-248,250,252,255-256,258-262
src/props/TortuosityKernels.H                 97       53    54%   52,56-60,62-65,69-74,76-80,84-85,90,129,143,157,243,245-248,250-253,257-260,262-265
src/props/TortuosityMLMG.cpp                 152      145    95%   285-287,289-290,295,316
src/props/TortuosityMLMG.H                     1        1   100%
src/props/TortuositySolverBase.cpp           311      247    79%   70-72,74-75,94-100,118,122,124,160-163,218,221,223,409,412-414,416,424-427,429-435,440,445-447,453-454,456-458,494,498-500,503,508-511,513,544,548-550,552,554,558
src/props/TortuositySolverBase.H              13       13   100%
src/props/VolumeFraction.cpp                  25       25   100%
src/props/VolumeFraction.H                     4        4   100%
------------------------------------------------------------------------------
TOTAL                                       5514     3978    72%
------------------------------------------------------------------------------

Generated by CI — coverage data from gcovr

codecov · 2026-06-13T10:01:10Z

Codecov Report

❌ Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/props/TortuosityMLMG.cpp	33.33%	0 Missing and 2 partials ⚠️
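The 33.33% figure is consistent with Codecov's documented ratio, hits / (hits + misses + partials), in which partially covered lines (not every branch taken) count against coverage. With 0 misses and 2 partials, 33.33% implies one fully covered changed line; that count is inferred, not stated in the report. A minimal C++ sketch of the formula (`patchCoverage` is a hypothetical name):

```cpp
#include <cassert>

// Codecov-style coverage percentage: hits / (hits + misses + partials).
// Partial lines are tracked but not counted as covered.
double patchCoverage(int hits, int misses, int partials) {
    const int tracked = hits + misses + partials;
    assert(tracked > 0);
    return 100.0 * hits / tracked;
}
```

`patchCoverage(1, 0, 2)` gives 33.33…%, matching the report.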


jameslehoux added 2 commits June 13, 2026 09:08

github-actions Bot added devops gpu physics labels Jun 13, 2026

jameslehoux merged commit b0ae42d into master Jun 13, 2026
6 checks passed


jameslehoux merged 2 commits into master from claude/elegant-ritchie-qjw4r0
