Context
For DFT inversion work, we need 4-center 1-electron overlap integrals:
∫ φ_i(r) φ_j(r) φ_k(r) φ_l(r) dr
These are distinct from the 2-electron ERIs (ij|1/r₁₂|kl) that SlaterGPU already wraps via gen_eri. Same 4-index structure, but a 1-electron integral with no Coulomb operator.
libcint provides this via cint4c1e_sph / cint4c1e_cart (available since libcint v2.8.16, implemented in src/cint4c1e.c and src/g4c1e.c), but these functions are not compiled by default. Paul has a working prototype in his personal (unversioned) code — including both a libcint cache bug fix and a SlaterGPU-style wrapper — that produces reasonable DFT inversion results.
Priority 1: Enable -DWITH_4C1E=1 in the build
The CMake flag -DWITH_4C1E=1 must be passed when building libcint for these functions to be available. Update SlaterGPU's pixi.toml and/or CMake configuration so that libcint is built with this flag. This lets us maintain a unified build rather than requiring a separate manual libcint compilation.
Future work: Integrate Paul's wrapper
Paul already has a working cint4c1e wrapper in his local code that follows the gen_eri pattern in cintwrapper.cpp. This needs to be brought into the repo. The wrapper:
- Loops over all 4 shell quartets
- Calls
cint4c1e_cart or cint4c1e_sph depending on BT::DO_CART
- Unpacks buffer with standard column-major indexing
- Full permutational symmetry over all 4 indices
Future work: Known upstream libcint bug (cache buffer sizing)
libcint's own CMakeLists.txt contains the warning: "Note there are bugs in 4c1e integral functions".
The likely bug is a cache buffer sizing issue in src/cint4c1e.c and src/g4c1e.c:
CINTg4c1e_ovlp (in g4c1e.c) needs 3 * b_size doubles of scratch space from the cache pointer, where b_size = db * (MAX(nmax, mmax) + 1) and db = nmax + mmax + 1 (with nmax = li + lj, mmax = lk + ll).
- The cache size calculations in
CINT4c1e_drv (both the size-query path when out == NULL at ~line 270, and the self-allocation path when cache == NULL at ~line 278) do not include this 3 * b_size term.
CINT4c1e_loop_nopt (~line 108) has the same omission in its len calculation.
- For low angular momentum shells, there's enough slack in the
leng allocation that it works by accident. For higher angular momentum (roughly li+lj+lk+ll >= 6), bufx/bufy/bufz can write past the end of the cache — a heap buffer overflow.
Proposed fix (applied by Paul in his prototype, not yet validated with tests): add ovlp_scratch = 3 * db * (MAX(nmax, mmax) + 1) to the len/cache_size calculations in all 4 affected locations in cint4c1e.c. The values nmax, mmax are available from envs->li_ceil + envs->lj_ceil and envs->lk_ceil + envs->ll_ceil.
This fix has not been tested — libcint has a test (testsuite/test_cint4c1e.py) that validates cint4c1e_sph against a Laplacian-based identity using cint2e_ipip1_sph and cint2e_ipvip1_sph. That should be run to confirm correctness. We should also consider filing the fix upstream on sunqm/libcint.
References
- libcint 4c1e source:
src/cint4c1e.c, src/g4c1e.c in sunqm/libcint
- libcint test:
testsuite/test_cint4c1e.py
- SlaterGPU ERI wrapper (pattern to follow):
src/libcintw/cintwrapper.cpp, gen_eri function
- libcint build flag:
CMakeLists.txt option WITH_4C1E
Context
For DFT inversion work, we need 4-center 1-electron overlap integrals:
These are distinct from the 2-electron ERIs
(ij|1/r₁₂|kl)that SlaterGPU already wraps viagen_eri. Same 4-index structure, but a 1-electron integral with no Coulomb operator.libcint provides this via
cint4c1e_sph/cint4c1e_cart(available since libcint v2.8.16, implemented insrc/cint4c1e.candsrc/g4c1e.c), but these functions are not compiled by default. Paul has a working prototype in his personal (unversioned) code — including both a libcint cache bug fix and a SlaterGPU-style wrapper — that produces reasonable DFT inversion results.Priority 1: Enable
-DWITH_4C1E=1in the buildThe CMake flag
-DWITH_4C1E=1must be passed when building libcint for these functions to be available. Update SlaterGPU's pixi.toml and/or CMake configuration so that libcint is built with this flag. This lets us maintain a unified build rather than requiring a separate manual libcint compilation.Future work: Integrate Paul's wrapper
Paul already has a working
cint4c1ewrapper in his local code that follows thegen_eripattern incintwrapper.cpp. This needs to be brought into the repo. The wrapper:cint4c1e_cartorcint4c1e_sphdepending onBT::DO_CARTFuture work: Known upstream libcint bug (cache buffer sizing)
libcint's own
CMakeLists.txtcontains the warning: "Note there are bugs in 4c1e integral functions".The likely bug is a cache buffer sizing issue in
src/cint4c1e.candsrc/g4c1e.c:CINTg4c1e_ovlp(ing4c1e.c) needs3 * b_sizedoubles of scratch space from thecachepointer, whereb_size = db * (MAX(nmax, mmax) + 1)anddb = nmax + mmax + 1(withnmax = li + lj,mmax = lk + ll).CINT4c1e_drv(both the size-query path whenout == NULLat ~line 270, and the self-allocation path whencache == NULLat ~line 278) do not include this3 * b_sizeterm.CINT4c1e_loop_nopt(~line 108) has the same omission in itslencalculation.lengallocation that it works by accident. For higher angular momentum (roughlyli+lj+lk+ll >= 6),bufx/bufy/bufzcan write past the end of the cache — a heap buffer overflow.Proposed fix (applied by Paul in his prototype, not yet validated with tests): add
ovlp_scratch = 3 * db * (MAX(nmax, mmax) + 1)to thelen/cache_sizecalculations in all 4 affected locations incint4c1e.c. The valuesnmax,mmaxare available fromenvs->li_ceil + envs->lj_ceilandenvs->lk_ceil + envs->ll_ceil.This fix has not been tested — libcint has a test (
testsuite/test_cint4c1e.py) that validatescint4c1e_sphagainst a Laplacian-based identity usingcint2e_ipip1_sphandcint2e_ipvip1_sph. That should be run to confirm correctness. We should also consider filing the fix upstream on sunqm/libcint.References
src/cint4c1e.c,src/g4c1e.cin sunqm/libcinttestsuite/test_cint4c1e.pysrc/libcintw/cintwrapper.cpp,gen_erifunctionCMakeLists.txtoptionWITH_4C1E