Keep PureKLUFactorization off the direct dual AD path (fixes #1064) by ChrisRackauckas-Claude · Pull Request #1066 · SciML/LinearSolve.jl

ChrisRackauckas-Claude · 2026-06-28T14:10:31Z

Problem

PR #1041 added PureKLUFactorization to the direct dual solve path (_use_direct_dual_solve), taken whenever A carries duals. For ForwardDiff over an implicit ODE solve (e.g. Rodas5P), the Rosenbrock W matrix carries duals, so every solve factorized W in dual arithmetic via PureKLU's pure-Julia KLU.

This is the same problem #1052/#1057 fixed for RFLUFactorization, which was opted out at the same time. PureKLU was explicitly left on the direct path ("PureKLU are unaffected"), but #1064 shows it is affected identically.

Fix

Drop PureKLUFactorization from _use_direct_dual_solve so it stays on the split primal/partials path, which factorizes the primal W once in fast Float64 arithmetic and reuses that factorization across the partial back-solves — instead of re-running the whole numeric factorization in (2N+1)-wide scalar dual arithmetic. Adds a regression test asserting PureKLU is not routed to the direct path (mirrors the #1052 RFLU test).

Local verification (Julia 1.12, Linux)

Exact issue reproducer (tridiagonal heat equation, N=20, Rodas5P, single derivative):

	time	alloc
PureKLU direct (before)	3.79 ms	7.93 MiB
PureKLU split (this PR)	1.56 ms	227 KiB
C-backed `KLUFactorization` (always split)	1.49 ms	222 KiB

Identical derivatives; the split path matches the trusted C-KLU path. test/Core/forwarddiff_overloads.jl passes end-to-end (including the existing PureKLU correctness tests for duals in A / in b / both, and the new #1064 regression test). Runic --check clean.

Scope note (the tradeoff is real, not universal)

The split path is the better default for the sparse-Jacobian / AD-through-implicit-ODE workload PureKLU+ForwardDiff is actually used in. I swept direct-vs-split across sparsity patterns and dual chunk sizes:

High fill (general sparse / PDE stencils / random sparse): split wins, by a margin that grows with fill and chunk size (e.g. rand-1%, n=500, chunk=8: 27.9 ms → 7.7 ms).
Very low fill (tridiagonal / narrow band) with a large dual chunk: the direct path can edge ahead in a single shot (~2×), because the factorization is so cheap the split path's per-solve overhead dominates. This is not the regime PureKLU+AD hits.

There is also a separate latent inefficiency I found while digging: _solve_direct_dual! calls __init on a fresh cache every solve!, re-doing KLU symbolic analysis and re-allocating the whole cache each time — that is the real source of the 7.93 MiB in the issue. Fixing that is out of scope here; I'll file it separately.

Please ignore until reviewed by @ChrisRackauckas.

🤖 Generated with Claude Code

PR SciML#1041 added PureKLUFactorization to the direct dual solve path, taken whenever A carries duals. For ForwardDiff over an implicit ODE solve (e.g. Rodas5P), the Rosenbrock W matrix carries duals, so every solve factorized W in dual arithmetic via PureKLU's pure-Julia KLU. This is the same problem that SciML#1052/SciML#1057 fixed for RFLUFactorization, which was opted out at the same time; PureKLU was left on the direct path ("PureKLU are unaffected"), but SciML#1064 shows it is affected identically. The split primal/partials path factorizes the primal W once in fast Float64 arithmetic and reuses that factorization across the partial back-solves, instead of re-running the whole numeric factorization in (2N+1)-wide scalar dual arithmetic. On the issue's reproducer (tridiagonal heat equation, N=20, Rodas5P, single derivative) the direct path was ~2.4x slower (3.79ms -> 1.56ms) with ~35x the allocation (7.93 MiB -> 227 KiB), and the split path matches the trusted C-backed KLUFactorization (1.49ms / 222 KiB), with identical derivatives. Drop PureKLU from _use_direct_dual_solve so it stays on the split path, and add a regression test asserting it is not routed to the direct path. Note: the split path is the better default for the sparse-Jacobian / AD-through-implicit-ODE workload PureKLU+ForwardDiff is used in. For a single solve of a very-low-fill matrix with a large dual chunk the direct path can edge ahead, but that is not the regime this hits; a separate inefficiency in _solve_direct_dual! (it re-inits a fresh cache on every solve, defeating KLU symbolic reuse) is the real source of the large allocation and is left for a follow-up. Co-Authored-By: Chris Rackauckas <accounts@chrisrackauckas.com>

hersle · 2026-06-28T20:36:35Z

Is this the right fix to make? With this, PureKLU would effectively just do the same split path as KLU, so I don't think PureKLU would be used to its full potential. Also there is no performance gain.

With the different changes in #1065, I am able to make PureKLU continue using the direct path, but fix the unnecessary factorizations. This makes PureKLU faster than KLU, so it makes sense to have both.

ChrisRackauckas · 2026-06-29T12:33:06Z

This isn't, no. But I'm also a bit skeptical of your PR. I think it needs to branch at a higher level and deal with non-promoting ldiv! in PureKLU itself.

hersle · 2026-06-29T12:39:09Z

Yeah I understand that. Please improve it when you find a better way :) At least it makes PureKLU perform (close) to its full potential through ForwardDiff, so any future improvements can check that performance against the reference.

ChrisRackauckas · 2026-06-29T12:41:16Z

Yeah your PR had nothing wrong really, but it's the fact that it exposed it wasn't caching was like "oh, wait, if we need to work around that... that means we should just make the opt out path be one dispatch higher... and that just means we just need to not promote..." etc. etc., so a bit of a yak shave but it would solve it at its root cause and should be much better.

hersle mentioned this pull request Jun 28, 2026

Cache inner LinearCache in DualLinearCache in direct dual path #1065

Merged

ChrisRackauckas closed this Jun 29, 2026

ChrisRackauckas-Claude mentioned this pull request Jun 29, 2026

Route duals-in-b PureKLU solves to a native LinearCache (uses PureKLU 1.1 mixed ldiv!) #1069

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Keep PureKLUFactorization off the direct dual AD path (fixes #1064)#1066

Keep PureKLUFactorization off the direct dual AD path (fixes #1064)#1066
ChrisRackauckas-Claude wants to merge 1 commit into
SciML:mainfrom
ChrisRackauckas-Claude:pureklu-off-direct-dual-path-1064

ChrisRackauckas-Claude commented Jun 28, 2026

Uh oh!

hersle commented Jun 28, 2026

Uh oh!

ChrisRackauckas commented Jun 29, 2026

Uh oh!

hersle commented Jun 29, 2026

Uh oh!

ChrisRackauckas commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Uh oh!

Conversation

ChrisRackauckas-Claude commented Jun 28, 2026

Problem

Fix

Local verification (Julia 1.12, Linux)

Scope note (the tradeoff is real, not universal)

Uh oh!

hersle commented Jun 28, 2026

Uh oh!

ChrisRackauckas commented Jun 29, 2026

Uh oh!

hersle commented Jun 29, 2026

Uh oh!

ChrisRackauckas commented Jun 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants