Fix OptimizationProblem for SVector/SArray: use out-of-place form #88
The newer SciMLBase enforces that immutable types like SVector/SArray
must use out-of-place OptimizationProblem{false}(...) form since they
cannot be mutated in-place.
This fixes precompilation failures where the optimization problem was
being created with the in-place form (auto-detected as true) but using
immutable initial conditions.
Fixes: ChrisRackauckas/InternalJunk#26
The previous attempt used OptimizationProblem{false} directly, but the
SciMLBase API requires that you pass an OptimizationFunction{false} to
the constructor instead.
Changed all usages of SVector/SArray with OptimizationProblem to:
1. Create OptimizationFunction{false}(f, ...) for the function
2. Pass that to OptimizationProblem(opt_f, ...)
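The resulting pattern looks roughly like this (a minimal sketch: the objective, initial point, and parameters are illustrative placeholders, not code from the PR):

```julia
using StaticArrays
using SciMLBase

# Out-of-place objective: returns a scalar and never mutates its arguments.
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

u0 = SVector(0.0, 0.0)    # immutable: the in-place form cannot mutate this
p  = SVector(1.0, 100.0)

# 1. Explicitly mark the function as out-of-place...
opt_f = OptimizationFunction{false}(rosenbrock)

# 2. ...and pass it to the problem constructor, instead of
#    calling OptimizationProblem{false}(rosenbrock, ...) directly.
prob = OptimizationProblem(opt_f, u0, p)
```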
GPU memory from gpu.jl (5000 particles × 3 sizes × 3 algorithms) accumulates and causes OOM when lbfgs.jl runs. Add GC.gc(true) between test includes and an explicit CUDA.reclaim() at the start of lbfgs.jl to free GPU memory.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
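In the test runner this amounts to forcing a full collection between the includes, roughly as follows (a sketch; the file names are from this PR, but the surrounding runtests.jl code is assumed):

```julia
using CUDA

include("gpu.jl")
GC.gc(true)             # full collection so CuArray finalizers actually run

include("constraints.jl")
GC.gc(true)

# lbfgs.jl itself begins with CUDA.reclaim() to return freed
# device memory from CUDA.jl's pool back to the driver.
include("lbfgs.jl")
```

GC.gc(true) alone only frees the Julia-side objects; CUDA.reclaim() is what hands the pooled device memory back so other workloads (or kernel JIT compilation) can use it.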
The HybridPSO kernel (gpu_simplebfgs_run!) is the most complex and needs the most GPU memory for JIT compilation. Running it first, when GPU memory is most available, avoids OOM caused by accumulated kernel compilation caches from gpu.jl (5000 particles × 3 sizes).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert to the original test order (gpu → constraints → lbfgs), since reordering caused the CUDA context init OOM to cascade to all tests. Keep GC.gc(true) between tests and CUDA.reclaim() in lbfgs.jl. The GPU OOM is a pre-existing infrastructure issue — shared self-hosted runners have oversubscribed GPUs. The main branch also fails GPU tests (with a different error: the precompilation failure that this PR fixes).

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GPU OOM Analysis

The GPU tests are failing with out-of-memory errors. Root cause: the shared GPU runners (arctic1-2, arctic1-3, arctic1-4) have oversubscribed T4 GPUs (14.5 GiB). Other workloads consume most GPU memory, leaving insufficient memory for CUDA kernel JIT compilation.

Key findings:
Changes in this push:
Recommendation: The GPU OOM is an infrastructure issue, not a code issue. This PR should be mergeable based on CI (CPU) + format check passing. The GPU runner capacity may need to be addressed separately.
The T4 runners (14.5 GiB VRAM) are oversubscribed and consistently OOM during CUDA tests. Switch to V100 runners (32 GiB VRAM), which other SciML repos (DiffEqGPU.jl, SciMLSensitivity.jl) also use for memory-intensive GPU jobs.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
V100 (compute capability 7.0) is not supported on CUDA 13+. Use the generic 'gpu' label (used by DiffEqFlux.jl, NeuralPDE.jl, DeepEquilibriumNetworks.jl), which routes to compatible GPUs.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
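In the GitHub Actions workflow this is just the runner label (a hypothetical sketch; the actual workflow file is not shown in this thread, and the job name is assumed):

```yaml
jobs:
  test:
    # Generic self-hosted label; routes to whatever GPU runners
    # are compatible, instead of pinning a specific GPU model.
    runs-on: [self-hosted, gpu]
```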
T4 runners (arctic1-*) are oversubscribed and OOM consistently. V100 runners (demeter4-*) have 32 GB VRAM but require CUDA 12.x, since CUDA 13+ dropped support for compute capability 7.0. Pin JULIA_CUDA_VERSION=12.6 to use the CUDA 12.6 toolkit on V100.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The JULIA_CUDA_VERSION env var is deprecated and ignored by CUDA.jl. Write LocalPreferences.toml directly to pin CUDA_Runtime_jll to v12.6, which supports V100 (compute 7.0). CUDA 13+ dropped support for compute < 7.5.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
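The preference file in question looks roughly like this (a sketch; this matches what CUDA.set_runtime_version!(v"12.6") would write into the active environment's LocalPreferences.toml):

```toml
[CUDA_Runtime_jll]
version = "12.6"
```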
V100 runners cannot be used — CUDA.jl rejects compute capability 7.0 GPUs with CUDA 13+ drivers regardless of toolkit pinning. T4 (compute 7.5) is the only compatible GPU available. Earlier T4 runs passed 26/27 tests — the OOM failures are transient due to shared runner memory pressure.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CI Update — 26/27 GPU Tests Pass

After switching back to the T4 runners (arctic1-*), 26 of 27 GPU tests pass.
The hybrid test failure is a transient OOM from shared-runner memory pressure.

Note on V100 runners: V100 (compute 7.0) is incompatible with the CUDA 13+ drivers on the self-hosted runners.
Switch to an exclusive T4 runner to get dedicated GPU memory, avoiding OOM from shared GPU workloads.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pin CUDA.jl to v5.0-5.10, which uses the CUDA 12.x runtime, compatible with V100 (compute 7.0). CUDA.jl 5.11+ resolves CUDA_Driver_jll v13.2+, which dropped compute 7.0 support. Use the gpu-v100 runners (demeter4-*), which have 32 GB VRAM, avoiding the OOM issues on the oversubscribed T4 runners. See ChrisRackauckas/InternalJunk#17 for details.

Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
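Such a pin would live in the [compat] section of the test environment's Project.toml, roughly as follows (a sketch; which Project.toml carries the pin is an assumption, and the hyphenated range is standard Pkg compat syntax covering 5.0 through 5.10):

```toml
[compat]
CUDA = "5.0 - 5.10"
```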
This is a Base issue (JuliaGPU/CUDA.jl#3034). JuliaLang/julia#61154 is the fix for the failing test and should be released soon enough.
Summary
This PR fixes a precompilation error that causes all tests (including CUDA) to fail.
The Problem
The newer SciMLBase enforces that immutable types like SVector/SArray must use the out-of-place OptimizationProblem{false}(...) form since they cannot be mutated in-place. The error occurred during precompilation:
The Fix
All OptimizationProblem calls using SVector/SArray initial conditions are updated to explicitly use the out-of-place form OptimizationProblem{false}(...).

Files changed:
src/precompilation.jl - main precompilation workload
test/gpu.jl - CUDA tests
test/regression.jl - regression tests
test/reinit.jl - reinit tests
test/constraints.jl - constraint tests
test/lbfgs.jl - LBFGS tests

Fixes: https://github.com/ChrisRackauckas/InternalJunk/issues/26