[BUG] - PETSc error code on HPC #321

@brmather

Hi team,

Here's a difficult bug to reproduce! A script that normally runs fine on a small number of processors fails when I run it across a large number of processors. It fails at the SNES solve stage with PETSc error code 62 (although sometimes error code 73 is printed instead).

Things I've discovered:

  • It actually runs for mesh refinement < 5 for the cubed sphere mesh
  • It can run for mesh refinement < 3 on the spherical shell mesh
  • It's not a memory usage limitation

I attach a modified Darcy flow benchmark I used to reproduce this error, but again, it's difficult to reproduce this error unless you have access to >= 128 cores.

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Scalar value must be same on all processes, argument # 3
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-Solver_14_mg_levels_ksp_converged_maxits (no value) source: code
[0]PETSC ERROR:   Option left: name:-Solver_14_mg_levels_ksp_max_it value: 3 source: code
[0]PETSC ERROR:   Option left: name:-Solver_14_pc_mg_type value: additive source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_ksp_rtol value: 0.001 source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_ksp_type value: gmres source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_mg_levels_ksp_converged_maxits (no value) source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_mg_levels_ksp_max_it value: 3 source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_pc_gamg_agg_nsmooths value: 2 source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_pc_gamg_repartition value: true source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_pc_gamg_type value: agg source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_pc_mg_type value: additive source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_pc_type value: gamg source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_snes_atol value: 1e-08 source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_snes_rtol value: 0.001 source: code
[0]PETSC ERROR:   Option left: name:-Solver_15_snes_type value: newtonls source: code
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.21.6, unknown
[0]PETSC ERROR: 05-assimilate-conductivity.py on a  named ip-0A3A580C by ben.r.mather Mon Jun 16 00:33:30 2025
[0]PETSC ERROR: Configure options --with-debugging=1 --prefix=/shared/home/ben.r.mather/petsc-3.21.5-hpcx-mt3-debug --COPTFLAGS="-g -O3" --CXXOPTFLAGS="-g -O3" --FOPTFLAGS="-g -O3" --with-petsc4py=1 --with-zlib=1 --with-shared-libraries=1 --with-cxx-dialect=C++11 --with-make-np=4 --download-bison --download-hdf5=https://github.com/HDFGroup/hdf5/archive/refs/tags/hdf5-1_10_8.tar.gz --download-mumps=1 --download-parmetis=1 --download-metis=1 --download-superlu=1 --download-hypre=1 --download-scalapack=1 --download-superlu_dist=1 --download-pragmatic=1 --download-ctetgen=1 --download-eigen --download-triangle --download-ptscotch --download-fblaslapack
[0]PETSC ERROR: #1 VecMAXPYAsync_Private() at /shared/home/ben.r.mather/petsc/src/vec/vec/interface/rvector.c:1242
[0]PETSC ERROR: #2 VecMAXPY() at /shared/home/ben.r.mather/petsc/src/vec/vec/interface/rvector.c:1286
[0]PETSC ERROR: #3 KSPGMRESClassicalGramSchmidtOrthogonalization() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/impls/gmres/borthog2.c:73
[0]PETSC ERROR: #4 KSPGMRESCycle() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/impls/gmres/gmres.c:149
[0]PETSC ERROR: #5 KSPSolve_GMRES() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/impls/gmres/gmres.c:227
[0]PETSC ERROR: #6 KSPSolve_Private() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/interface/itfunc.c:905
[0]PETSC ERROR: #7 KSPSolve() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/interface/itfunc.c:1078
[0]PETSC ERROR: #8 PCGAMGOptProlongator_AGG() at /shared/home/ben.r.mather/petsc/src/ksp/pc/impls/gamg/agg.c:1365
[0]PETSC ERROR: #9 PCSetUp_GAMG() at /shared/home/ben.r.mather/petsc/src/ksp/pc/impls/gamg/gamg.c:710
[0]PETSC ERROR: #10 PCSetUp() at /shared/home/ben.r.mather/petsc/src/ksp/pc/interface/precon.c:1079
[0]PETSC ERROR: #11 KSPSetUp() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/interface/itfunc.c:415
[0]PETSC ERROR: #12 KSPSolve_Private() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/interface/itfunc.c:831
[0]PETSC ERROR: #13 KSPSolve() at /shared/home/ben.r.mather/petsc/src/ksp/ksp/interface/itfunc.c:1078
[0]PETSC ERROR: #14 SNESSolve_NEWTONLS() at /shared/home/ben.r.mather/petsc/src/snes/impls/ls/ls.c:221
[0]PETSC ERROR: #15 SNTraceback (most recent call last):
File "/shared/home/ben.r.mather/mge/data-engineering/feature_pool/fluid-flow-modelling/05-assimilate-conductivity.py", line 414, in <module>
Traceback (most recent call last):
File "/shared/home/ben.r.mather/miniforge3/envs/uw3/lib/python3.11/site-packages/underworld3/systems/solvers.py", line 319, in solve
darcy.solve()
File "src/underworld3/cython/petsc_generic_snes_solvers.pyx", line 898, in underworld3.cython.generic_solvers.SNES_Scalar.solve
super().solve(zero_init_guess, _force_setup)
File "petsc4py/PETSc/SNES.pyx", line 1601, in petsc4py.PETSc.SNES.solve
petsc4py.PETSc.Error: error code 62 
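
For anyone reading the log above: the "Scalar value must be same on all processes, argument # 3" line is raised from `VecMAXPY`, and it appears to come from a consistency check on a scalar argument that every rank is expected to pass identically. Below is a minimal sketch of the idea behind such a check, not PETSc's actual implementation. It assumes the test amounts to comparing the smallest and largest value across ranks; the function name `same_on_all_ranks` and the sample values are hypothetical, and a plain Python list stands in for what would really be an MPI reduction.

```python
import math


def same_on_all_ranks(values):
    """Return True if every (simulated) rank holds the same scalar.

    `values` is a non-empty list standing in for the per-rank copies of
    one scalar argument. A real check would use an MPI reduction.
    """
    # This sketch treats a NaN on any rank as a mismatch. That is a
    # choice made for the example, not a statement about PETSc.
    if any(math.isnan(v) for v in values):
        return False
    # If the global minimum equals the global maximum, all ranks agree.
    return min(values) == max(values)


# Hypothetical per-rank coefficients, purely for illustration.
consistent = [0.25, 0.25, 0.25, 0.25]
diverged = [0.25, 0.25, 0.2500000001, 0.25]

print(same_on_all_ranks(consistent))
print(same_on_all_ranks(diverged))
```

Under this reading, a difference in a single rank's copy of a coefficient, however small, would be enough to trip the check, which would be consistent with the failure only showing up at large core counts.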
