Fix master CI: expv zero-input NaN, JET-on-1.12 QA, GPU-in-All by ChrisRackauckas-Claude · Pull Request #229 · SciML/ExponentialUtilities.jl

ChrisRackauckas-Claude · 2026-06-19T09:18:07Z

Fixes three independent failures on the master grouped-tests CI.

1. Core: `NaN == 0.0` at `basictests.jl:307` (zero-input expv)

The real `expv!(w, t::Real, Ks)` method was missing the `iszero(beta)` guard that the complex method already has. For a zero input vector, `firststep!` skips initializing the Krylov basis `V` (it only fills `V[:,1]` when `beta != 0`), so the final `lmul!(beta, mul!(w, @view(V[:,1:m]), expHe))` computes `0 * <uninitialized memory>`, which is NaN whenever the garbage in `V` happens to contain NaN or Inf. That explains why the failure was flaky (heap-dependent: green on some OS/runs, NaN on others). Added the same early-return guard so `expv` of a zero vector is exactly zero.
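A minimal sketch of why the guard matters, assuming a NaN-filled matrix as a stand-in for the uninitialized Krylov basis. The function names `combine_unguarded` and `combine_guarded` are hypothetical and not part of the package, and plain elementwise scaling stands in for the `lmul!` call:

```julia
using LinearAlgebra

# Hypothetical stand-in for the final scaling step, without a zero guard.
function combine_unguarded(beta, V, expHe)
    return beta .* (V * expHe)   # 0 * NaN is NaN under IEEE 754
end

# Same step with the early return for a zero input norm.
function combine_guarded(beta, V, expHe)
    iszero(beta) && return zeros(eltype(V), size(V, 1))
    return beta .* (V * expHe)
end

V = fill(NaN, 4, 3)          # simulates a basis that was never initialized
expHe = [1.0, 0.5, 0.25]
beta = 0.0                   # norm of a zero input vector

w_bad = combine_unguarded(beta, V, expHe)
w_ok = combine_guarded(beta, V, expHe)
```

With these inputs the unguarded result is NaN in every entry, while the guarded result is exactly zero.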

Verified locally: full GROUP=Core Pkg.test passes on Julia 1.10 and 1.12 (it reliably produced NaN on 1.10 before).

2. QA: 6 JET failures on the `1` (= Julia 1.12) channel

`lts` (1.10) was green; only `1` (1.12) failed. On 1.12, JET traces into LinearAlgebra/Base internals (`norm(::Vector)` → `norm_recursive_check` → `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports abstract-interpretation artifacts there that this package does not control. Scoped the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-package-QA configuration), which keeps full coverage of this package's own code.
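A sketch of what a scoped check could look like, assuming JET.jl and ExponentialUtilities.jl are installed in the test environment. The call target and argument types are illustrative and not necessarily what `test/qa/qa.jl` contains:

```julia
using JET, ExponentialUtilities, Test

# Only reports located in ExponentialUtilities itself are kept; frames inside
# Base or LinearAlgebra are still analyzed but filtered out of the result.
result = JET.report_call(
    exponential!, (Matrix{Float64},);
    target_modules = (ExponentialUtilities,),
)

@test isempty(JET.get_reports(result))
```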

That scoping surfaced two genuine `may be undefined` findings, which are fixed here so the scoped analysis is clean (not silenced):

* `si` in `exponential!` (`exp_baseexp.jl`): conditionally assigned inside `if s > 0`, used inside a separate `if s > 0`; now initialized to 0 unconditionally.
* `order` / `kest` in `kiops` (`kiops.jl`): carried across loop iterations via the `orderold`/`kestold` "reuse" flags but only conditionally assigned; now seeded with their first-iteration defaults.
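The `si` pattern can be illustrated with a small sketch. The functions below are hypothetical stand-ins, not the package's code; they show two separate `if s > 0` blocks that a static analyzer cannot prove are correlated, and the unconditional default that removes the finding without changing behavior:

```julia
# Fragile form: `si` is assigned in one branch and read in another.
function squarings_fragile(s::Int)
    if s > 0
        si = s              # assigned only on this branch
    end
    x = 1.5
    if s > 0
        for _ in 1:si       # analysis may flag `si` as possibly undefined
            x *= x
        end
    end
    return x
end

# Fixed form: an unconditional default makes every read provably defined.
function squarings_fixed(s::Int)
    si = 0                  # default is never used when s <= 0
    if s > 0
        si = s
    end
    x = 1.5
    if s > 0
        for _ in 1:si
            x *= x
        end
    end
    return x
end
```

Both versions return the same value for every `s`; only the second is clean under a "may be undefined" check.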

Verified locally: QA passes 17/17 on Julia 1.10 and 1.12.

3. Core (windows): "CUDA driver not functional"

On Windows the Core job runs the `run_tests` "All" aggregate, which pulled in the GPU group, and `using CUDA` errored on the non-GPU runner. Marked the GPU group `in_all = false` so it only ever runs under an explicit `GROUP=GPU` on the self-hosted CUDA runner. Verified locally: `GROUP=All` now runs only `Core/basictests.jl`, never `GPU/gputests.jl`.
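The selection rule can be sketched as follows. The group table and the `groups_to_run` helper are hypothetical; the real test runner's data structures and API may differ:

```julia
# Hypothetical group table: each group declares whether "All" includes it.
TEST_GROUPS = [
    (name = "Core", in_all = true),
    (name = "GPU", in_all = false),   # opt-in only, needs CUDA hardware
]

# Return the group names that a given GROUP value would run.
function groups_to_run(group::AbstractString)
    if group == "All"
        return [g.name for g in TEST_GROUPS if g.in_all]
    end
    return [g.name for g in TEST_GROUPS if g.name == group]
end

all_groups = groups_to_run("All")
gpu_groups = groups_to_run("GPU")
```

Under this rule the aggregate never loads GPU tests, while an explicit `GROUP=GPU` still selects them.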

Not addressed (reported separately)

* Core (julia pre, macos-latest): Static Arrays tolerance failure at `basictests.jl:265` (`expv(t,A,b) ≈ exp(t*A)*b`). On Linux with Julia 1.13-rc1 the worst relative error is 1.25e-15; the macOS-pre failure shows ~1e-7. This is a macOS/1.13-rc-specific accuracy difference I could not reproduce or correctly fix on Linux, and I will not loosen the tolerance without being able to prove the macOS deviation is benign.
* GPU (self-hosted): requires CUDA hardware (infra), out of scope here.

Please ignore until reviewed by @ChrisRackauckas.

Three independent master-CI failures on the grouped-tests workflow:

1. Core (NaN == 0.0 at basictests.jl:307, flaky across OS/version). The real `expv!(w, t::Real, Ks)` method lacked the `iszero(beta)` guard that the complex method already has. For a zero input vector `firststep!` skips initializing the Krylov basis V (it only fills it when beta != 0), so `lmul!(beta, mul!(w, V, expHe))` computes `0 * <uninitialized memory>`, which is NaN whenever V holds garbage. Add the same early-return guard, making expv of a zero vector exactly zero (matching the complex method). Verified: full Core suite now passes on Julia 1.10 and 1.12 (was reliably NaN on 1.10).

2. QA (6 JET failures on the Julia "1" = 1.12 channel; lts/1.10 was green). On 1.12 JET traces into LinearAlgebra/Base internals (`norm(::Vector)` -> `norm_recursive_check` -> `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports artifacts there that this package does not control. Scope the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-QA configuration), which keeps full coverage of this package's own code. That scoping surfaced two genuine `may be undefined` findings, fixed here so the scoped analysis is clean: `si` in `exponential!` and `order`/`kest` in `kiops` are now unconditionally initialized before use. Verified: QA passes 17/17 on Julia 1.10 and 1.12.

3. Core (windows, all versions: "CUDA driver not functional"). On Windows the Core job runs the run_tests "All" aggregate, which pulled in the GPU group, and `using CUDA` errored on the non-GPU runner. Mark the GPU group `in_all = false` so it only runs under an explicit GROUP=GPU on the self-hosted CUDA runner. Verified locally: GROUP=All now runs only Core/basictests.jl, never GPU/gputests.jl.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The "Static Arrays" testset compared `expv(t, A, b)` against `exp(t * A) * b`, where `exp(t * A)` is StaticArrays' SMatrix matrix exponential. That reference uses an unbalanced scaling-and-squaring Padé path which loses ~7-9 digits for the larger non-normal N=8 cases on macOS + Julia prerelease (relerr ~1e-7..1e-5), tripping the default-tolerance isapprox in "Core (julia pre, macos-latest)". Verified against a 512-bit BigFloat ground truth that the `expv` output is correct to ~1e-16 on both macOS and Linux; it was the StaticArrays `exp` reference, not `expv`, that drifted. Switching the reference to the dense LAPACK `exp`, which is balanced and accurate on every platform, keeps this a machine-precision assertion that still catches real `expv` regressions (no tolerance loosening).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ChrisRackauckas-Claude · 2026-06-23T01:59:06Z

Resolved the last red — Core (julia pre, macos-latest) failing at test/basictests.jl:265 in the "Static Arrays" testset.

Root cause: not an expv bug. The assertion was expv(t, A, b) ≈ exp(t * A) * b, where exp(t * A) dispatches to StaticArrays.jl's own SMatrix matrix exponential. I reconstructed the two exact failing matrices (N=8, t=1.0 and N=8, t=10.0; RNG seed 0) and computed a 512-bit BigFloat ground truth:

| quantity | relerr vs BigFloat truth (case N=8,t=1.0 / N=8,t=10.0) |
|---|---|
| macOS `expv` output (under test) | 3.2e-16 / 7.7e-16 (correct) |
| macOS `exp(t*A)` StaticArrays reference | 4.0e-7 / 7.5e-6 (wrong) |
| Linux `expv` | 3.2e-16 / 7.3e-16 |
| Linux `exp(t*A)` StaticArrays reference | 1.9e-16 / 7.3e-16 |

So expv is machine-accurate on both platforms. It was the reference exp(t*A) (StaticArrays' unbalanced scaling-and-squaring Padé path — the source even notes "omitted: matrix balancing") that drifted ~7-9 digits on macOS + Julia 1.13-rc1. The test was comparing a correct value against a platform-fragile reference that is less accurate than the thing under test.
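The ground-truth comparison can be sketched along these lines. This is not the script used for the numbers above: `expm_bigfloat` is a hypothetical helper (scaling, Taylor series, then repeated squaring in 512-bit arithmetic), and a small rotation generator replaces the N=8 test matrices:

```julia
using LinearAlgebra

# Hypothetical helper: high-precision exp(A) via scaling, Taylor series, squaring.
function expm_bigfloat(A::AbstractMatrix{<:Real}; precision::Int = 512)
    setprecision(BigFloat, precision) do
        n = size(A, 1)
        # Scale by 2^s so the series argument has 1-norm of at most 1/2.
        nrm = opnorm(Float64.(A), 1)
        s = nrm > 0.5 ? ceil(Int, log2(nrm)) + 1 : 0
        B = BigFloat.(A) ./ (big(2)^s)
        term = Matrix{BigFloat}(I, n, n)
        E = copy(term)
        for k in 1:500
            term = term * B / k          # k-th Taylor term B^k / k!
            E += term
            maximum(abs, term) <= eps(BigFloat) && break
        end
        for _ in 1:s                     # undo the scaling: exp(A) = exp(B)^(2^s)
            E = E * E
        end
        return E
    end
end

# Relative error of a Float64 result against the high-precision truth.
relerr(x, truth) = Float64(norm(BigFloat.(x) .- truth) / norm(truth))

A = [0.0 1.0; -1.0 0.0]
b = [1.0, 2.0]
truth = expm_bigfloat(A) * BigFloat.(b)
err = relerr(exp(A) * b, truth)
```

Measuring both the value under test and the reference against the same high-precision truth is what shows which side of a failing `≈` has drifted.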

Default tolerances for context. The SMatrix expv extension targets eps(T)/2 ≈ 1.1e-16 (default_tolerance), and the Krylov expv path's happy-breakdown tol is 1e-7; neither is the issue here. The test gate is the default isapprox (rtol ≈ 1.49e-8). The macOS error of ~1e-7..1e-5 is far above any plausible expv FP floor (I confirmed across 400 seeds and forced mo/s/break-tol perturbations that faithful expv stays ≤5e-14), which is what pointed at the reference, not expv.
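For context on the gate itself, here is a small sketch of the default `isapprox` tolerance for `Float64`, using scalar stand-ins with relative errors of the same size as those in the table (the real test compares vectors, where the same `rtol` applies to norms):

```julia
# Default relative tolerance used by isapprox (≈) for Float64 arguments.
rtol = sqrt(eps(Float64))            # about 1.49e-8

truth = 1.0
near_machine = truth * (1 + 3.2e-16) # error on the order of the expv results
drifted = truth * (1 + 4.0e-7)       # error on the order of the drifted reference

passes = isapprox(near_machine, truth)
fails = !isapprox(drifted, truth)
```

A relative error of 4.0e-7 is above the default `rtol`, so the comparison fails even though one of the two operands is accurate to machine precision.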

Fix (no tolerance loosening): compare expv against the dense LAPACK exponential exp(t * Matrix(A)) * Vector(b), which is balanced and accurate on every platform. This keeps a machine-precision (default-tolerance) assertion that still catches real expv regressions.

Verified locally (Pkg.test GROUP=Core, full basictests.jl):

* Julia 1.13.0-rc1 (= CI pre): 329 pass, 1 broken (pre-existing `@test_broken`)
* Julia 1.12.6: 329 pass, 1 broken
* Julia 1.10.11 (lts): 329 pass, 1 broken

The "Static Arrays" testset itself: 12 pass / 12 on rc.

Ignore until reviewed by @ChrisRackauckas.

Convert the hand-rolled test/qa/qa.jl (raw Aqua.test_* + per-function JET.report_call) to the SciMLTesting 1.6 `run_qa` form and enable the ExplicitImports checks.

ExplicitImports findings (run vs released SciMLTesting 1.6.0):

* no_stale_explicit_imports: removed the genuinely stale `ArrayInterface.allowed_getindex` import (never referenced; only `ismutable`/`allowed_setindex!` are used).
* Made the `for i in 1:13 include("exp_generated/exp_$i.jl")` dynamic include in exp_noalloc.jl static (13 literal includes) so the module is analyzable; this unblocked no_implicit_imports and no_stale_explicit_imports (previously UnanalyzableModuleException). Verified Higham2005 matrix-exp still matches Base `exp` to ~6.7e-16.
* all_qualified_accesses_via_owners / all_qualified_accesses_are_public / all_explicit_imports_are_public: ignore-listed other packages' non-public names (Base / LinearAlgebra(.BLAS/.LAPACK, incl. Stegr submodule) / ArrayInterface / libblastrampoline_jll); they go public as the base libs declare them.
* no_implicit_imports: ~31 implicit names from `using LinearAlgebra, SparseArrays, Printf, PrecompileTools`. Making them explicit is a large refactor; marked ei_broken and tracked in SciML#231 (auto-flags when fixed).

Deps: test/qa/Project.toml SciMLTesting compat -> "1.6" (Aqua + ExplicitImports are transitive via SciMLTesting; Aqua kept a direct dep so the ambiguities sub-check's child process can resolve it; JET kept for the JET check). Root Project.toml SciMLTesting compat -> "1.6".

QA group on Julia 1.10 (lts), released SciMLTesting 1.6.0: Quality Assurance | 17 Pass, 1 Broken, 0 Fail, 0 Error (no_implicit_imports broken per SciML#231).

On Julia 1.12 the JET typo check reports pre-existing "may be undefined" findings (kiops order/kest, Higham2005 ilo/ihi/scale/bal); master is already red there and the source fixes live in draft PR SciML#229.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* QA: fix latent undefined-balancing locals in exponential!(::ExpMethodHigham2005)

The run_qa v1.6 conversion runs JET in report_package typo mode, which analyzes each method signature in isolation. The hand-rolled qa.jl this replaced used JET.report_call(exponential!, (Matrix{Float64},)), where ExpMethodHigham2005(A) sets do_balancing = (A isa StridedMatrix) as a constant that JET could constant-propagate, so both `if method.do_balancing` blocks folded to true and ilo/ihi/scale/bal were seen as always defined. In report_package the method is analyzed with an abstract ExpMethodHigham2005, so do_balancing is a runtime Bool, the two balancing blocks are not provably correlated, and the undo block reads ilo/ihi/scale/bal as possibly-undefined locals (20 JET typo reports on Julia 1.12; 1.10 abstract-interp did not reach them).

Seed ilo=1/ihi=n/scale=_scale as no-op defaults and lift the GenericSchur row/col permutations into prow/pcol locals (nothing on the BLAS path, which never reads them), so every local read in the symmetric undo block is unconditionally defined. Behavior is unchanged: the seeds are only live when do_balancing is false (where the undo block does not run), and the BLAS vs GenericSchur branches use exactly the values they used before.

Verified Julia 1.12.6 (released SciMLTesting 1.7.0, JET 0.11.5): report_package typo mode goes from 20 reports to 0. Verified Julia 1.10.11 numerics unchanged: strided-BLAS balancing relerr 3.3e-16, GenericSchur (BigFloat) balancing relerr 1.1e-16, no-balancing relerr 1.6e-16 vs reference exp.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

---------

Co-authored-by: ChrisRackauckas-Claude <accounts@chrisrackauckas.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ChrisRackauckas and others added 2 commits June 19, 2026 05:17

ChrisRackauckas-Claude mentioned this pull request Jun 25, 2026

QA: run_qa v1.6 form + ExplicitImports #232

Merged

ChrisRackauckas marked this pull request as ready for review June 26, 2026 11:32

ChrisRackauckas merged commit ee91ce6 into SciML:master Jun 26, 2026
21 of 36 checks passed

ChrisRackauckas merged 2 commits into SciML:master from ChrisRackauckas-Claude:fix-master-ci-1.12-nan-jet-gpu
