Fix master CI: expv zero-input NaN, JET-on-1.12 QA, GPU-in-All #229
Three independent master-CI failures on the grouped-tests workflow:
1. Core (NaN == 0.0 at basictests.jl:307, flaky across OS/version).
The real `expv!(w, t::Real, Ks)` method lacked the `iszero(beta)`
guard that the complex method already has. For a zero input vector
`firststep!` skips initializing the Krylov basis V (it only fills it
when beta != 0), so `lmul!(beta, mul!(w, V, expHe))` computes
`0 * <uninitialized memory>`, which is NaN whenever the garbage in V
makes that product non-finite (0 * Inf and 0 * NaN are both NaN).
Add the same early-return guard, making expv of a zero vector exactly
zero (matching the complex method). Verified: the full Core suite now
passes on Julia 1.10 and 1.12 (it previously produced NaN reliably on 1.10).
2. QA (6 JET failures on the Julia "1" = 1.12 channel; lts/1.10 was
green). On 1.12 JET traces into LinearAlgebra/Base internals
(`norm(::Vector)` -> `norm_recursive_check` -> `iterate(::Nothing)`,
and the broadcast `unalias`/`copyto_unaliased!` path over
`Adjoint{T, Union{}}`) and reports artifacts there that this package
does not control. Scope the QA `report_call`s to
`target_modules = (ExponentialUtilities,)` — the standard JET-as-QA
configuration — which keeps full coverage of this package's own code.
That scoping surfaced two genuine `may be undefined` findings, fixed
here so the scoped analysis is clean: `si` in `exponential!` and
`order`/`kest` in `kiops` are now unconditionally initialized before
use. Verified: QA passes 17/17 on Julia 1.10 and 1.12.
3. Core (windows, all versions: "CUDA driver not functional"). On
Windows the Core job runs the run_tests "All" aggregate, which pulled
in the GPU group and `using CUDA` errored on the non-GPU runner. Mark
the GPU group `in_all = false` so it only runs under an explicit
GROUP=GPU on the self-hosted CUDA runner. Verified locally: GROUP=All
now runs only Core/basictests.jl, never GPU/gputests.jl.
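The zero-input guard from item 1 can be illustrated with a toy stand-in for the final step of `expv!`. This is a minimal sketch, not the package's code: `toy_expv_final!` is a hypothetical name, `V` is simulated as NaN-filled memory in place of the unfilled Krylov basis, and `h` stands in for `expHe`.

```julia
using LinearAlgebra

# Hypothetical toy version of the last step of a Krylov expv:
#     w = beta * (V * h)
# V stands in for the Krylov basis (never filled when beta == 0),
# h stands in for expHe.
function toy_expv_final!(w, beta, V, h)
    # Early-return guard: exp(t*A) * 0 is exactly 0, so never read V.
    if iszero(beta)
        return fill!(w, zero(eltype(w)))
    end
    return lmul!(beta, mul!(w, V, h))
end

# Simulate a basis whose uninitialized memory happens to hold NaN.
V = fill(NaN, 3, 2)
h = [1.0, 0.5]

unguarded = 0.0 .* (V * h)                      # 0 * NaN is NaN in every entry
guarded = toy_expv_final!(zeros(3), 0.0, V, h)  # exactly zero, V never read
```

With the guard, the zero-vector result no longer depends on what the unused memory happens to contain, which is what made the original failure flaky.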
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The "Static Arrays" testset compared `expv(t, A, b)` against `exp(t * A) * b`, where `exp(t * A)` is StaticArrays' SMatrix matrix exponential. That reference uses an unbalanced scaling-and-squaring Padé path that loses ~7-9 digits for the larger non-normal N=8 cases on macOS + Julia prerelease (relerr ~1e-7..1e-5), tripping the default-tolerance isapprox in "Core (julia pre, macos-latest)". Verified against a 512-bit BigFloat ground truth that the `expv` output is correct to ~1e-16 on both platforms, macOS included; it was the StaticArrays `exp` reference, not `expv`, that drifted. Switching the reference to the dense LAPACK `exp`, which is balanced and accurate on every platform, keeps the default isapprox tolerance, so the assertion still catches real `expv` regressions (no tolerance loosening).
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
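The tolerance arithmetic behind this change can be checked with the `LinearAlgebra` stdlib alone. This is a minimal sketch under stated assumptions: `taylor_expv` is a hypothetical stand-in for `expv` (a plain truncated Taylor series, not a Krylov method), the matrix is an arbitrary small example rather than one of the N=8 test cases, and the dense `exp` on a `Matrix{Float64}` plays the role of the new reference. Default `isapprox` uses `rtol = sqrt(eps(Float64))`, about 1.5e-8, so a reference that drifts by 1e-7 relative error fails it while an accurate result passes.

```julia
using LinearAlgebra

# Default isapprox on Float64 arrays (atol = 0) accepts x ≈ y when
#     norm(x - y) <= rtol * max(norm(x), norm(y)),  rtol = sqrt(eps(Float64))
default_rtol = sqrt(eps(Float64))   # about 1.49e-8

relerr(x, ref) = norm(x - ref) / norm(ref)

# Hypothetical stand-in for expv: a truncated Taylor series for exp(t*A)*b.
# Adequate for this small, well-scaled example only.
function taylor_expv(t, A, b; terms = 40)
    w = copy(b)
    term = copy(b)
    for k in 1:terms
        term = (t / k) * (A * term)   # term is now (t*A)^k * b / k!
        w += term
    end
    return w
end

A = [-1.0 2.0; 0.0 -3.0]
b = [1.0, 1.0]
t = 0.5

ref = exp(t * A) * b                 # dense exp on a Matrix{Float64}
approx = taylor_expv(t, A, b)

@show relerr(approx, ref)                # tiny, far below default_rtol
@show isapprox(approx, ref)              # true
@show isapprox(ref .* (1 + 1e-7), ref)   # false: 1e-7 relative drift > default_rtol
```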
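Resolved the last red failure, the "Static Arrays" testset. Root cause: not an `expv` bug; it was the SMatrix `exp` reference that drifted. The fix compares against the dense LAPACK `exp` instead, with no tolerance loosening (the default isapprox tolerances are unchanged). Verified locally against a 512-bit BigFloat ground truth.
Ignore until reviewed by @ChrisRackauckas.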
Convert the hand-rolled test/qa/qa.jl (raw Aqua.test_* + per-function
JET.report_call) to the SciMLTesting 1.6 `run_qa` form and enable the
ExplicitImports checks.
ExplicitImports findings (run vs released SciMLTesting 1.6.0):
* no_stale_explicit_imports: removed the genuinely stale
`ArrayInterface.allowed_getindex` import (never referenced; only
`ismutable`/`allowed_setindex!` are used).
* Made the `for i in 1:13 include("exp_generated/exp_$i.jl")` dynamic
include in exp_noalloc.jl static (13 literal includes) so the module is
analyzable — this unblocked no_implicit_imports and
no_stale_explicit_imports (previously UnanalyzableModuleException).
Verified Higham2005 matrix-exp still matches Base `exp` to ~6.7e-16.
* all_qualified_accesses_via_owners / all_qualified_accesses_are_public /
all_explicit_imports_are_public: ignore-listed other packages' non-public
names (Base / LinearAlgebra(.BLAS/.LAPACK, incl. Stegr submodule) /
ArrayInterface / libblastrampoline_jll); they go public as the base libs
declare them.
* no_implicit_imports: ~31 implicit names from `using LinearAlgebra,
SparseArrays, Printf, PrecompileTools`. Making them explicit is a large
refactor; marked ei_broken and tracked in SciML#231 (auto-flags when fixed).
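Why the loop blocked static analysis can be seen from the parsed source alone. A minimal sketch using only `Meta.parse`; `collect_includes!` and `parse_block` are hypothetical helpers for illustration, not how ExplicitImports is implemented. In the loop form the `include` argument is an interpolated-string expression whose value depends on `i`, whereas each literal include carries a plain `String` path that a tool can follow without running any code.

```julia
# Hypothetical helper: walk a parsed expression and collect the argument
# of every one-argument `include(...)` call.
collect_includes!(found, x) = found     # leaves: symbols, literals, line nodes
function collect_includes!(found, x::Expr)
    if x.head === :call && length(x.args) == 2 && x.args[1] === :include
        push!(found, x.args[2])
    end
    for arg in x.args
        collect_includes!(found, arg)
    end
    return found
end

# Parse several top-level statements by wrapping them in a begin...end block.
parse_block(src) = Meta.parse("begin\n" * src * "\nend")

# Dynamic form: the path is built at run time by string interpolation.
dynamic_src = raw"""
for i in 1:13
    include("exp_generated/exp_$i.jl")
end
"""

# Static form: 13 literal includes (generated here only to keep the sketch short).
static_src = join(["include(\"exp_generated/exp_$i.jl\")" for i in 1:13], "\n")

dynamic_args = collect_includes!(Any[], parse_block(dynamic_src))  # one Expr(:string, ...)
static_args = collect_includes!(Any[], parse_block(static_src))    # 13 String paths
```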
Deps: test/qa/Project.toml SciMLTesting compat -> "1.6" (Aqua + ExplicitImports
are transitive via SciMLTesting; Aqua kept a direct dep so the ambiguities
sub-check's child process can resolve it; JET kept for the JET check). Root
Project.toml SciMLTesting compat -> "1.6".
QA group on Julia 1.10 (lts), released SciMLTesting 1.6.0:
Quality Assurance | 17 Pass, 1 Broken, 0 Fail, 0 Error (no_implicit_imports
broken per SciML#231). On Julia 1.12 the JET typo check reports pre-existing
"may be undefined" findings (kiops order/kest, Higham2005 ilo/ihi/scale/bal);
master is already red there and the source fixes live in draft PR SciML#229.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* QA: run_qa v1.6 form + ExplicitImports
* QA: fix latent undefined-balancing locals in exponential!(::ExpMethodHigham2005)
The run_qa v1.6 conversion runs JET in report_package typo mode, which analyzes
each method signature in isolation. The hand-rolled qa.jl this replaced used
JET.report_call(exponential!, (Matrix{Float64},)), where ExpMethodHigham2005(A)
sets do_balancing = (A isa StridedMatrix) as a constant that JET could
constant-propagate, so both `if method.do_balancing` blocks folded to true and
ilo/ihi/scale/bal were seen as always defined. In report_package the method is
analyzed with an abstract ExpMethodHigham2005, so do_balancing is a runtime
Bool, the two balancing blocks are not provably correlated, and the undo block
reads ilo/ihi/scale/bal as possibly-undefined locals (20 JET typo reports on
Julia 1.12; 1.10 abstract-interp did not reach them).
Seed ilo=1/ihi=n/scale=_scale as no-op defaults and lift the GenericSchur
row/col permutations into prow/pcol locals (nothing on the BLAS path, which
never reads them), so every local read in the symmetric undo block is
unconditionally defined. Behavior is unchanged: the seeds are only live when
do_balancing is false (where the undo block does not run), and the BLAS vs
GenericSchur branches use exactly the values they used before.
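The shape of this fix, and of the `si`/`order`/`kest` fixes described earlier, can be sketched with a toy function. This is a minimal illustration under stated assumptions: `toy_scaled_double` and its power-of-two scaling are hypothetical stand-ins, not `exponential!` or real balancing. The point is only that seeding `ilo`, `ihi`, and `scale` before the first `if` makes every read in the undo block defined on every path, without changing the result on either branch.

```julia
# Hypothetical toy mirroring the shape of the fix (not the package's code):
# two `if do_balancing` blocks that a checker cannot prove are correlated.
function toy_scaled_double(x::Vector{Float64}, do_balancing::Bool)
    n = length(x)
    # No-op seeds: only live when do_balancing is false, in which case the
    # undo block below never runs. Without them, ilo/ihi/scale would be
    # assigned only inside the first block, and an analysis that treats the
    # two branches independently reports them as possibly undefined.
    ilo = 1
    ihi = n
    scale = ones(n)
    if do_balancing
        scale = [2.0^k for k in 0:n-1]   # stand-in for a balancing scaling
        x = x ./ scale
    end
    y = 2 .* x                           # stand-in for the real computation
    if do_balancing
        y[ilo:ihi] .*= scale[ilo:ihi]    # undo block reads ilo/ihi/scale
    end
    return y
end

balanced = toy_scaled_double([1.0, 2.0, 3.0], true)
plain = toy_scaled_double([1.0, 2.0, 3.0], false)
```

Both calls return the same vector, which is the toy analogue of "behavior is unchanged".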
Verified Julia 1.12.6 (released SciMLTesting 1.7.0, JET 0.11.5): report_package
typo mode goes from 20 reports to 0. Verified Julia 1.10.11 numerics unchanged:
strided-BLAS balancing relerr 3.3e-16, GenericSchur (BigFloat) balancing relerr
1.1e-16, no-balancing relerr 1.6e-16 vs reference exp.
Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
---------
Co-authored-by: ChrisRackauckas-Claude <accounts@chrisrackauckas.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes three independent failures on the master grouped-tests CI.
1. Core: `NaN == 0.0` at `basictests.jl:307` (zero-input `expv`)
The real `expv!(w, t::Real, Ks)` method was missing the `iszero(beta)` guard the complex method already has. For a zero input vector, `firststep!` skips initializing the Krylov basis `V` (it only fills `V[:,1]` when `beta != 0`), so the final `lmul!(beta, mul!(w, @view(V[:,1:m]), expHe))` computes `0 * <uninitialized memory>`, which is `NaN` whenever the garbage in `V` makes that product non-finite. This explains why the failure was flaky (heap-dependent: green on some OS/runs, `NaN` on others). Added the same early-return guard so `expv` of a zero vector is exactly zero.
Verified locally: the full `GROUP=Core` `Pkg.test` passes on Julia 1.10 and 1.12 (it reliably produced `NaN` on 1.10 before).
2. QA: 6 JET failures on the `1` (= Julia 1.12) channel
`lts` (1.10) was green; only `1` (1.12) failed. On 1.12 JET traces into `LinearAlgebra`/`Base` internals (`norm(::Vector)` → `norm_recursive_check` → `iterate(::Nothing)`, and the broadcast `unalias`/`copyto_unaliased!` path over `Adjoint{T, Union{}}`) and reports abstract-interpretation artifacts there that this package does not control. Scoped the QA `report_call`s to `target_modules = (ExponentialUtilities,)` (the standard JET-as-package-QA configuration), which keeps full coverage of this package's own code.
That scoping surfaced two genuine `may be undefined` findings, which are fixed here so the scoped analysis is clean (not silenced):
* `si` in `exponential!` (`exp_baseexp.jl`): conditionally assigned inside `if s > 0` and used inside a separate `if s > 0`; now initialized to `0` unconditionally.
* `order`/`kest` in `kiops` (`kiops.jl`): carried across loop iterations via the `orderold`/`kestold` "reuse" flags but only conditionally assigned; now seeded with their first-iteration defaults.
Verified locally: QA passes 17/17 on Julia 1.10 and 1.12.
3. Core (windows): "CUDA driver not functional"
On Windows the Core job runs the `run_tests` "All" aggregate, which pulled in the `GPU` group, and `using CUDA` errored on the non-GPU runner. Marked the `GPU` group `in_all = false` so it only ever runs under an explicit `GROUP=GPU` on the self-hosted CUDA runner.
Verified locally: `GROUP=All` now runs only `Core/basictests.jl`, never `GPU/gputests.jl`.
Not addressed (reported separately):
* `Static Arrays` tolerance failure at `basictests.jl:265` (`expv(t,A,b) ≈ exp(t*A)*b`). On Linux with Julia 1.13-rc1 the worst relative error is `1.25e-15`; the macOS-pre failure shows `~1e-7`. This is a macOS/1.13-rc-specific accuracy difference I could not reproduce or correctly fix on Linux, and I will not loosen the tolerance without being able to prove the macOS deviation is benign.
Please ignore until reviewed by @ChrisRackauckas.