From 5981faaea4937b917affd336f32852b3d8aa6e16 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 00:46:32 -0700 Subject: [PATCH 01/26] [Docs] Add user-guide page for matrix decompositions and solvers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Documents the per-thread small-matrix decomposition / solver entry points (qd.svd, qd.sym_eig, qd.polar_decompose, qd.eig, qd.solve). Companion to matrix_vector.md, which covers element-wise / arithmetic operations; this page covers the numerical-algorithm layer (Jacobi sweeps, Gauss elimination, etc.) sitting on top of those. Covers per-op semantics, the supported size set (2 / 3 only — 2 only for general qd.eig), the optional dt argument, sign-convention caveats for SVD / polar (det(R) handling), and what is missing (N>3 symmetric EVD, larger inverse, atomic_cas, 3×3 general eig). Worked examples include closest-rotation extraction (ARAP), 3×3 make_spd via sym_eig + clamp, per-thread linear solve, and 2×2 polar shape-matching. Adds decompositions.md to the Core concepts toctree (next to matrix_vector). --- docs/source/user_guide/decompositions.md | 154 +++++++++++++++++++++++ docs/source/user_guide/index.md | 1 + 2 files changed, 155 insertions(+) create mode 100644 docs/source/user_guide/decompositions.md diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md new file mode 100644 index 0000000000..39c166979b --- /dev/null +++ b/docs/source/user_guide/decompositions.md @@ -0,0 +1,154 @@ +# Matrix decompositions and solvers + +Small fixed-size matrix decompositions and linear solvers — the kinds of operations a thread does on a 2×2 or 3×3 matrix held in registers. 
They are a different category from the element-wise / arithmetic matrix operations covered in [matrix_vector](matrix_vector.md): each entry point here implements a *numerical algorithm* (Jacobi sweeps, Gauss elimination, Givens rotations) rather than a single closed-form formula. + +All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd.eig`, `qd.solve`) and are intended to be called from inside a `@qd.kernel` or `@qd.func`. They run per thread — each thread independently decomposes its own matrix. + +## What's available + +| Op | Operates on | Sizes | Returns | +|---------------------------------|------------------------|--------|--------------------------------------------------------| +| `qd.svd(A)` | square real matrix | 2, 3 | `(U, S, V)` such that `A = U @ S @ V.transpose()` | +| `qd.sym_eig(A)` | symmetric real matrix | 2, 3 | `(eigenvalues, eigenvectors)` (real) | +| `qd.polar_decompose(A)` | square real matrix | 2, 3 | `(R, S)` such that `A = R @ S`, `R` orthogonal, `S` SPD | +| `qd.eig(A)` | square real matrix | 2 | `(eigenvalues, eigenvectors)` (complex, packed) | +| `qd.solve(A, b)` | square `A` + vector `b`| 2, 3 | `x` such that `A @ x = b` | + +A few patterns to note: + +- **Sizes are fixed.** Calling any of these on a matrix outside the supported size set raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. +- **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. Pass `dt=qd.f64` for the high-precision variant; passing `dt=qd.f32` explicitly is also fine if you want to be loud about it. 
+- **Output dimensions match the input dimensions.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. +- **Real matrices only.** `qd.eig` returns complex results in a packed real layout (see below); the others all assume real-valued input and return real-valued output. + +## Semantics + +### `qd.svd(A, dt=None)` + +Singular value decomposition — produces `(U, S, V)` such that `A = U @ S @ V.transpose()`, with: + +- `U` orthogonal (`U @ U.transpose() = I`). +- `V` orthogonal. +- `S` diagonal, with non-negative singular values. + +Sizes 2 and 3 use closed-form / Jacobi-style implementations specialized per dimension. Sign convention for `U` and `V` is the implementation's natural one and is **not** guaranteed to enforce `det(U) = det(V) = +1`; if you depend on a particular handedness (e.g. for an ARAP rotation `R = U @ V.transpose()`), check it explicitly and flip a column if needed. + +Singular values come out in implementation order — confirm via test if you depend on `S[0,0] >= S[1,1] >= S[2,2]`. + +### `qd.sym_eig(A, dt=None)` + +Symmetric eigendecomposition — for a real symmetric `A`, returns `(eigenvalues, eigenvectors)` with: + +- `eigenvalues`: a `Vector(n)` of real eigenvalues. +- `eigenvectors`: a `Matrix(n, n)` whose columns are the corresponding orthonormal eigenvectors. + +For 3×3, eigenvalues come out sorted in ascending order (the implementation explicitly sorts at the end). The 2×2 path does not perform an explicit sort — verify with a test if you need a particular order in 2D. + +`A` is *assumed* symmetric; the implementation does not symmetrize first. If your matrix is only approximately symmetric (e.g. accumulated floating-point error), explicitly compute `(A + A.transpose()) * 0.5` before calling. + +### `qd.polar_decompose(A, dt=None)` + +Polar decomposition — produces `(R, S)` such that `A = R @ S`, with: + +- `R` orthogonal (the closest-rotation factor, modulo handedness). 
+- `S` symmetric positive semi-definite (the stretch factor). + +Built on top of SVD: `A = U @ Σ @ Vᵀ` ⇒ `R = U @ Vᵀ`, `S = V @ Σ @ Vᵀ`. The same caveats about sign convention as `qd.svd` apply — `R` is the closest orthogonal factor to `A` but is not guaranteed to be a proper rotation (positive determinant) without a manual fix-up. + +### `qd.eig(A, dt=None)` + +General eigendecomposition for **2×2 only**. Returns: + +- `eigenvalues`: a `Matrix(n, 2)` where row `i` is `(real_part, imaginary_part)` of the i-th eigenvalue. +- `eigenvectors`: a `Matrix(n*2, n)` where each column is an eigenvector with each entry expanded into two scalars (real, imaginary). For a 2×2 input, the eigenvector matrix is 4×2. + +Eigenvalues of a real 2×2 matrix come in complex-conjugate pairs when the discriminant is negative; the packed layout keeps the function single-return-shape. Calling with `n=3` raises — for 3×3 general (non-symmetric) eigendecomposition, there is no Quadrants entry point today; the canonical path is `qd.sym_eig` if your matrix happens to be symmetric. + +### `qd.solve(A, b, dt=None)` + +Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns the solution vector `x`. + +- Sizes 2 and 3. +- The implementation asserts `A.n == A.m` and `A.m == b.n`. +- Singular `A` triggers an `assert` failure (`"Matrix is singular in linear solve."`); the assert is active by default and is your only signal of singularity. +- Use this for one-off small solves; for systems that decompose once and back-solve many times, currently you have to bake the LU yourself (or just call `qd.solve` per b — for 2×2 / 3×3 the gain from caching the LU is negligible compared to the launch). 
+ +## Examples + +### Closest rotation to a 3×3 matrix (ARAP) + +```python +@qd.func +def closest_rotation(A: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, qd.f64): + U, S, V = qd.svd(A, dt=qd.f64) + R = U @ V.transpose() + if R.determinant() < 0.0: + V[:, 2] *= -1.0 + R = U @ V.transpose() + return R +``` + +The `det(R) < 0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation — a standard ARAP / IPC trick. + +### Project to symmetric positive semi-definite (`make_spd`) + +```python +@qd.func +def make_spd_3x3(H: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, qd.f64): + eigvals, Q = qd.sym_eig(H, dt=qd.f64) + for i in qd.static(range(3)): + if eigvals[i] < 0.0: + eigvals[i] = 0.0 + Lambda = qd.Matrix.zero(qd.f64, 3, 3) + for i in qd.static(range(3)): + Lambda[i, i] = eigvals[i] + return Q @ Lambda @ Q.transpose() +``` + +Used by IPC-style methods to project an indefinite Hessian to its closest SPD approximation per element. Note the size cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. + +### Per-thread linear solve + +```python +@qd.kernel +def solve_each(A_field: qd.types.NDArray[qd.types.matrix(2, 2, qd.f32), 1], + b_field: qd.types.NDArray[qd.types.vector(2, qd.f32), 1], + x_field: qd.types.NDArray[qd.types.vector(2, qd.f32), 1]) -> None: + for i in range(A_field.shape[0]): + x_field[i] = qd.solve(A_field[i], b_field[i]) +``` + +Each thread does an independent Gauss elimination on its own 2×2 system. For larger systems a CG / PCG iteration over the whole array is the standard Quadrants pattern; `qd.solve` is for the per-element case. 
+ +### 2×2 polar decomposition for shape matching + +```python +@qd.func +def shape_match(A: qd.types.matrix(2, 2, qd.f32)) -> qd.types.matrix(2, 2, qd.f32): + R, _ = qd.polar_decompose(A) + return R +``` + +The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `‖R - A‖_F` — the building block of position-based dynamics shape-matching. + +## Sizes, performance, portability + +- **Size cap is the dominant constraint.** For matrices outside the supported sizes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). qipc maintains a Jacobi EVD that handles sizes up to 12; it is being considered for upstreaming but is not in `quadrants` today. +- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine for these sizes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. +- **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff IPC-style problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. +- **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. + +## What's missing + +For reference / planning purposes, the gaps users most often hit: + +- **Larger SVD / EVD.** Sizes > 3×3 are unsupported. 
For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach (qipc has one); for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. +- **`Matrix.inverse` size cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today. +- **`atomic_cas`.** Unrelated to decompositions, but the building block for spinlocks and lock-free dictionaries; not exposed in Python — `qd.atomic_*` covers add / sub / mul / min / max / and / or / xor but does not currently include compare-and-swap. +- **3×3 `qd.eig`.** Only the 2×2 general (non-symmetric) eigendecomposition is provided. For 3×3, use `qd.sym_eig` if your matrix is symmetric. + +## Related + +- [matrix_vector](matrix_vector.md) — element-wise / arithmetic matrix operations (`@`, `inverse`, `determinant`, `transpose`, dot, cross, norm, `outer_product`). Covers the operations whose implementation is a single closed-form formula. +- `qd.math.*` — scalar math helpers (`qd.math.dot`, `qd.math.cross`, `qd.math.length`, etc.) that operate on vectors / matrices but are not decompositions. +- `qd.linalg.*` — sparse-matrix linear algebra (CG, sparse solvers); a different namespace and a different problem class. 
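For the "write your own Jacobi sweep" path mentioned under *What's missing*, here is a minimal host-side NumPy sketch of a cyclic Jacobi symmetric EVD — an illustration of the algorithm, not a Quadrants API; a device version would hold the matrix in registers and unroll the sweeps with `qd.static`:

```python
import numpy as np

def jacobi_sym_eig(A, sweeps=10, eps=1e-12):
    """Cyclic Jacobi eigendecomposition of a real symmetric matrix.

    Each pass annihilates every off-diagonal entry in turn with a
    Givens rotation; returns (eigenvalues ascending, eigenvectors
    in columns), like qd.sym_eig's 3x3 convention.
    """
    n = A.shape[0]
    A = A.copy().astype(float)
    V = np.eye(n)
    for _ in range(sweeps):
        off = np.sqrt(np.sum(A**2) - np.sum(np.diag(A)**2))
        if off < eps:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < eps:
                    continue
                # Rotation angle that zeroes A[p, q] after J^T A J.
                theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q] = s
                J[q, p] = -s
                A = J.T @ A @ J
                V = V @ J
    order = np.argsort(np.diag(A))
    return np.diag(A)[order], V[:, order]

# A 4x4 symmetric matrix -- one dimension past the built-in 3x3 cap.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
S = 0.5 * (M + M.T)
vals, vecs = jacobi_sym_eig(S)
assert np.allclose(S @ vecs, vecs * vals, atol=1e-8)
```

The same sweep structure scales to the ~12×12 symmetric case discussed above; beyond that, register pressure makes the per-thread approach impractical, matching the compile-time caveat in the performance section.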
diff --git a/docs/source/user_guide/index.md b/docs/source/user_guide/index.md index f783e84264..971c52cf67 100644 --- a/docs/source/user_guide/index.md +++ b/docs/source/user_guide/index.md @@ -17,6 +17,7 @@ supported_systems tensor_types scalar_tensors matrix_vector +decompositions tensor compound_types buffer_view From e6c6029d8084ba164b51d2c4e96954de98d68bfa Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 08:30:05 -0700 Subject: [PATCH 02/26] [Docs] decompositions: drop qipc references --- docs/source/user_guide/decompositions.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 39c166979b..a319cc6dcc 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -133,7 +133,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` ## Sizes, performance, portability -- **Size cap is the dominant constraint.** For matrices outside the supported sizes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). qipc maintains a Jacobi EVD that handles sizes up to 12; it is being considered for upstreaming but is not in `quadrants` today. +- **Size cap is the dominant constraint.** For matrices outside the supported sizes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine for these sizes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. 
- **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff IPC-style problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. @@ -142,7 +142,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` For reference / planning purposes, the gaps users most often hit: -- **Larger SVD / EVD.** Sizes > 3×3 are unsupported. For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach (qipc has one); for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. +- **Larger SVD / EVD.** Sizes > 3×3 are unsupported. For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach; for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. - **`Matrix.inverse` size cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today. - **`atomic_cas`.** Unrelated to decompositions, but the building block for spinlocks and lock-free dictionaries; not exposed in Python — `qd.atomic_*` covers add / sub / mul / min / max / and / or / xor but does not currently include compare-and-swap. - **3×3 `qd.eig`.** Only the 2×2 general (non-symmetric) eigendecomposition is provided. For 3×3, use `qd.sym_eig` if your matrix is symmetric. 
From faa7fbbec22cde33d07a889efd09165940bbc1b7 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 08:33:23 -0700 Subject: [PATCH 03/26] [Docs] decompositions: replace ambiguous 'size' with 'dimension' (the matrix dim n) --- docs/source/user_guide/decompositions.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index a319cc6dcc..2ac6ca89d6 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -6,7 +6,7 @@ All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd ## What's available -| Op | Operates on | Sizes | Returns | +| Op | Operates on | Dim n | Returns | |---------------------------------|------------------------|--------|--------------------------------------------------------| | `qd.svd(A)` | square real matrix | 2, 3 | `(U, S, V)` such that `A = U @ S @ V.transpose()` | | `qd.sym_eig(A)` | symmetric real matrix | 2, 3 | `(eigenvalues, eigenvectors)` (real) | @@ -14,9 +14,9 @@ All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd | `qd.eig(A)` | square real matrix | 2 | `(eigenvalues, eigenvectors)` (complex, packed) | | `qd.solve(A, b)` | square `A` + vector `b`| 2, 3 | `x` such that `A @ x = b` | -A few patterns to note: +The "Dim n" column is the matrix dimension — `n = 2` means a 2×2 input, `n = 3` means 3×3. A few patterns to note: -- **Sizes are fixed.** Calling any of these on a matrix outside the supported size set raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. 
+- **Dimensions are fixed.** Calling any of these on a matrix outside the supported dimensions raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. - **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. Pass `dt=qd.f64` for the high-precision variant; passing `dt=qd.f32` explicitly is also fine if you want to be loud about it. - **Output dimensions match the input dimensions.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. - **Real matrices only.** `qd.eig` returns complex results in a packed real layout (see below); the others all assume real-valued input and return real-valued output. @@ -31,7 +31,7 @@ Singular value decomposition — produces `(U, S, V)` such that `A = U @ S @ V.t - `V` orthogonal. - `S` diagonal, with non-negative singular values. -Sizes 2 and 3 use closed-form / Jacobi-style implementations specialized per dimension. Sign convention for `U` and `V` is the implementation's natural one and is **not** guaranteed to enforce `det(U) = det(V) = +1`; if you depend on a particular handedness (e.g. for an ARAP rotation `R = U @ V.transpose()`), check it explicitly and flip a column if needed. +Dimensions 2 and 3 use closed-form / Jacobi-style implementations specialized per dimension. Sign convention for `U` and `V` is the implementation's natural one and is **not** guaranteed to enforce `det(U) = det(V) = +1`; if you depend on a particular handedness (e.g. for an ARAP rotation `R = U @ V.transpose()`), check it explicitly and flip a column if needed. Singular values come out in implementation order — confirm via test if you depend on `S[0,0] >= S[1,1] >= S[2,2]`. 
@@ -68,7 +68,7 @@ Eigenvalues of a real 2×2 matrix come in complex-conjugate pairs when the discr Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns the solution vector `x`. -- Sizes 2 and 3. +- Dimensions 2 and 3. - The implementation asserts `A.n == A.m` and `A.m == b.n`. - Singular `A` triggers an `assert` failure (`"Matrix is singular in linear solve."`); the assert is active by default and is your only signal of singularity. - Use this for one-off small solves; for systems that decompose once and back-solve many times, currently you have to bake the LU yourself (or just call `qd.solve` per b — for 2×2 / 3×3 the gain from caching the LU is negligible compared to the launch). @@ -105,7 +105,7 @@ def make_spd_3x3(H: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, qd.f return Q @ Lambda @ Q.transpose() ``` -Used by IPC-style methods to project an indefinite Hessian to its closest SPD approximation per element. Note the size cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. +Used by IPC-style methods to project an indefinite Hessian to its closest SPD approximation per element. Note the dimension cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. ### Per-thread linear solve @@ -131,10 +131,10 @@ def shape_match(A: qd.types.matrix(2, 2, qd.f32)) -> qd.types.matrix(2, 2, qd.f3 The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `‖R - A‖_F` — the building block of position-based dynamics shape-matching. -## Sizes, performance, portability +## Dimensions, performance, portability -- **Size cap is the dominant constraint.** For matrices outside the supported sizes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). -- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. 
Compile time is generally fine for these sizes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. +- **Dimension cap is the dominant constraint.** For matrices outside the supported dimensions, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). +- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these dimensions; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. - **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff IPC-style problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. @@ -142,8 +142,8 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` For reference / planning purposes, the gaps users most often hit: -- **Larger SVD / EVD.** Sizes > 3×3 are unsupported. For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach; for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. -- **`Matrix.inverse` size cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today. +- **Larger SVD / EVD.** Dimensions above 3 are unsupported. 
For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach; for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. +- **`Matrix.inverse` dimension cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today. - **`atomic_cas`.** Unrelated to decompositions, but the building block for spinlocks and lock-free dictionaries; not exposed in Python — `qd.atomic_*` covers add / sub / mul / min / max / and / or / xor but does not currently include compare-and-swap. - **3×3 `qd.eig`.** Only the 2×2 general (non-symmetric) eigendecomposition is provided. For 3×3, use `qd.sym_eig` if your matrix is symmetric. From 4a729de2188db346cd84c79f90a94415b9bf53c1 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 08:34:13 -0700 Subject: [PATCH 04/26] [Docs] decompositions: drop redundant 'fixed-size' qualifier in intro --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 2ac6ca89d6..f88560ff3e 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -1,6 +1,6 @@ # Matrix decompositions and solvers -Small fixed-size matrix decompositions and linear solvers — the kinds of operations a thread does on a 2×2 or 3×3 matrix held in registers. They are a different category from the element-wise / arithmetic matrix operations covered in [matrix_vector](matrix_vector.md): each entry point here implements a *numerical algorithm* (Jacobi sweeps, Gauss elimination, Givens rotations) rather than a single closed-form formula. +Small matrix decompositions and linear solvers — the kinds of operations a thread does on a 2×2 or 3×3 matrix held in registers. 
They are a different category from the element-wise / arithmetic matrix operations covered in [matrix_vector](matrix_vector.md): each entry point here implements a *numerical algorithm* (Jacobi sweeps, Gauss elimination, Givens rotations) rather than a single closed-form formula. All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd.eig`, `qd.solve`) and are intended to be called from inside a `@qd.kernel` or `@qd.func`. They run per thread — each thread independently decomposes its own matrix. From 91a183c36c2cf85d7cfb4f890caebbff5989a442 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:45:16 -0700 Subject: [PATCH 05/26] [Docs] decompositions: switch to unambiguous 'shape' (e.g. 2x2, 3x3) terminology --- docs/source/user_guide/decompositions.md | 36 ++++++++++++------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index f88560ff3e..ab3b02c6ac 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -6,19 +6,19 @@ All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd ## What's available -| Op | Operates on | Dim n | Returns | -|---------------------------------|------------------------|--------|--------------------------------------------------------| -| `qd.svd(A)` | square real matrix | 2, 3 | `(U, S, V)` such that `A = U @ S @ V.transpose()` | -| `qd.sym_eig(A)` | symmetric real matrix | 2, 3 | `(eigenvalues, eigenvectors)` (real) | -| `qd.polar_decompose(A)` | square real matrix | 2, 3 | `(R, S)` such that `A = R @ S`, `R` orthogonal, `S` SPD | -| `qd.eig(A)` | square real matrix | 2 | `(eigenvalues, eigenvectors)` (complex, packed) | -| `qd.solve(A, b)` | square `A` + vector `b`| 2, 3 | `x` such that `A @ x = b` | +| Op | Operates on | Shapes | Returns | 
+|---------------------------------|------------------------|---------------|--------------------------------------------------------| +| `qd.svd(A)` | square real matrix | 2×2, 3×3 | `(U, S, V)` such that `A = U @ S @ V.transpose()` | +| `qd.sym_eig(A)` | symmetric real matrix | 2×2, 3×3 | `(eigenvalues, eigenvectors)` (real) | +| `qd.polar_decompose(A)` | square real matrix | 2×2, 3×3 | `(R, S)` such that `A = R @ S`, `R` orthogonal, `S` SPD | +| `qd.eig(A)` | square real matrix | 2×2 | `(eigenvalues, eigenvectors)` (complex, packed) | +| `qd.solve(A, b)` | square `A` + vector `b`| 2×2, 3×3 | `x` such that `A @ x = b` | -The "Dim n" column is the matrix dimension — `n = 2` means a 2×2 input, `n = 3` means 3×3. A few patterns to note: +A few patterns to note: -- **Dimensions are fixed.** Calling any of these on a matrix outside the supported dimensions raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. +- **Shapes are fixed.** Calling any of these on a matrix outside the supported shapes raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. - **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. Pass `dt=qd.f64` for the high-precision variant; passing `dt=qd.f32` explicitly is also fine if you want to be loud about it. -- **Output dimensions match the input dimensions.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. 
+- **Output shape matches the input shape.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. - **Real matrices only.** `qd.eig` returns complex results in a packed real layout (see below); the others all assume real-valued input and return real-valued output. ## Semantics @@ -31,7 +31,7 @@ Singular value decomposition — produces `(U, S, V)` such that `A = U @ S @ V.t - `V` orthogonal. - `S` diagonal, with non-negative singular values. -Dimensions 2 and 3 use closed-form / Jacobi-style implementations specialized per dimension. Sign convention for `U` and `V` is the implementation's natural one and is **not** guaranteed to enforce `det(U) = det(V) = +1`; if you depend on a particular handedness (e.g. for an ARAP rotation `R = U @ V.transpose()`), check it explicitly and flip a column if needed. +Shapes 2×2 and 3×3 use closed-form / Jacobi-style implementations specialized per shape. Sign convention for `U` and `V` is the implementation's natural one and is **not** guaranteed to enforce `det(U) = det(V) = +1`; if you depend on a particular handedness (e.g. for an ARAP rotation `R = U @ V.transpose()`), check it explicitly and flip a column if needed. Singular values come out in implementation order — confirm via test if you depend on `S[0,0] >= S[1,1] >= S[2,2]`. @@ -68,7 +68,7 @@ Eigenvalues of a real 2×2 matrix come in complex-conjugate pairs when the discr Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns the solution vector `x`. -- Dimensions 2 and 3. +- Shapes 2×2 and 3×3. - The implementation asserts `A.n == A.m` and `A.m == b.n`. - Singular `A` triggers an `assert` failure (`"Matrix is singular in linear solve."`); the assert is active by default and is your only signal of singularity. 
- Use this for one-off small solves; for systems that decompose once and back-solve many times, currently you have to bake the LU yourself (or just call `qd.solve` per b — for 2×2 / 3×3 the gain from caching the LU is negligible compared to the launch). @@ -105,7 +105,7 @@ def make_spd_3x3(H: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, qd.f return Q @ Lambda @ Q.transpose() ``` -Used by IPC-style methods to project an indefinite Hessian to its closest SPD approximation per element. Note the dimension cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. +Used by IPC-style methods to project an indefinite Hessian to its closest SPD approximation per element. Note the shape cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. ### Per-thread linear solve @@ -131,10 +131,10 @@ def shape_match(A: qd.types.matrix(2, 2, qd.f32)) -> qd.types.matrix(2, 2, qd.f3 The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `‖R - A‖_F` — the building block of position-based dynamics shape-matching. -## Dimensions, performance, portability +## Shapes, performance, portability -- **Dimension cap is the dominant constraint.** For matrices outside the supported dimensions, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). -- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these dimensions; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. +- **Shape cap is the dominant constraint.** For matrices outside the supported shapes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). 
+- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. - **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff IPC-style problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. @@ -142,8 +142,8 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` For reference / planning purposes, the gaps users most often hit: -- **Larger SVD / EVD.** Dimensions above 3 are unsupported. For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach; for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. -- **`Matrix.inverse` dimension cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today. +- **Larger SVD / EVD.** Shapes above 3×3 are unsupported. For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach; for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach. +- **`Matrix.inverse` shape cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today. 
- **`atomic_cas`.** Unrelated to decompositions, but the building block for spinlocks and lock-free dictionaries; not exposed in Python — `qd.atomic_*` covers add / sub / mul / min / max / and / or / xor but does not currently include compare-and-swap. - **3×3 `qd.eig`.** Only the 2×2 general (non-symmetric) eigendecomposition is provided. For 3×3, use `qd.sym_eig` if your matrix is symmetric. From aa980f1010196da5c1660a9e930e67c97e6c05b6 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:47:55 -0700 Subject: [PATCH 06/26] [Docs] decompositions: FIXME note on misleading '2D/3D matrices' wording --- docs/source/user_guide/decompositions.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index ab3b02c6ac..0d50123a9f 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -17,6 +17,7 @@ All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd A few patterns to note: - **Shapes are fixed.** Calling any of these on a matrix outside the supported shapes raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. + - **FIXME (message wording):** these exception strings are misleading — "2D matrix" / "3D matrix" conventionally means "rank-2 / rank-3 tensor" (any matrix is rank-2), but here the intent is "matrix of shape 2×2 / 3×3". They should be updated to e.g. `"SVD only supports 2×2 and 3×3 matrices."`. Tracked separately; this page reproduces the messages as they are emitted today. - **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. 
Pass `dt=qd.f64` for the high-precision variant; passing `dt=qd.f32` explicitly is also fine if you want to be loud about it. - **Output shape matches the input shape.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. - **Real matrices only.** `qd.eig` returns complex results in a packed real layout (see below); the others all assume real-valued input and return real-valued output. From 0d9ea6027c5ff610253882461d86ca7ed403a070 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:49:10 -0700 Subject: [PATCH 07/26] [Docs] decompositions: drop 'tracked separately' from FIXME note --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 0d50123a9f..b4fc29fb23 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -17,7 +17,7 @@ All ops live at the top level (`qd.svd`, `qd.sym_eig`, `qd.polar_decompose`, `qd A few patterns to note: - **Shapes are fixed.** Calling any of these on a matrix outside the supported shapes raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. - - **FIXME (message wording):** these exception strings are misleading — "2D matrix" / "3D matrix" conventionally means "rank-2 / rank-3 tensor" (any matrix is rank-2), but here the intent is "matrix of shape 2×2 / 3×3". They should be updated to e.g. `"SVD only supports 2×2 and 3×3 matrices."`. Tracked separately; this page reproduces the messages as they are emitted today. 
+ - **FIXME (message wording):** these exception strings are misleading — "2D matrix" / "3D matrix" conventionally means "rank-2 / rank-3 tensor" (any matrix is rank-2), but here the intent is "matrix of shape 2×2 / 3×3". They should be updated to e.g. `"SVD only supports 2×2 and 3×3 matrices."`. This page reproduces the messages as they are emitted today. - **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. Pass `dt=qd.f64` for the high-precision variant; passing `dt=qd.f32` explicitly is also fine if you want to be loud about it. - **Output shape matches the input shape.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. - **Real matrices only.** `qd.eig` returns complex results in a packed real layout (see below); the others all assume real-valued input and return real-valued output. From d7598efff3b78d4ed5b651967937ded46625cc57 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:49:53 -0700 Subject: [PATCH 08/26] [Docs] decompositions: drop chatty 'be loud about it' aside --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index b4fc29fb23..6fa3bf6460 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -18,7 +18,7 @@ A few patterns to note: - **Shapes are fixed.** Calling any of these on a matrix outside the supported shapes raises an exception at trace time (`"SVD only supports 2D and 3D matrices."`, etc.). Larger matrices need a different path — typically a Jacobi-style sweep applied iteratively, which Quadrants does not currently provide out of the box. 
- **FIXME (message wording):** these exception strings are misleading — "2D matrix" / "3D matrix" conventionally means "rank-2 / rank-3 tensor" (any matrix is rank-2), but here the intent is "matrix of shape 2×2 / 3×3". They should be updated to e.g. `"SVD only supports 2×2 and 3×3 matrices."`. This page reproduces the messages as they are emitted today. -- **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. Pass `dt=qd.f64` for the high-precision variant; passing `dt=qd.f32` explicitly is also fine if you want to be loud about it. +- **All ops accept an optional `dt` argument.** When unspecified, it defaults to `impl.get_runtime().default_fp` — usually `qd.f32` unless overridden in `qd.init()`. Pass `dt=qd.f64` for the high-precision variant. - **Output shape matches the input shape.** A 3×3 input yields 3×3 outputs (and a length-3 vector for `solve` / eigenvalues); a 2×2 input yields 2×2 outputs. - **Real matrices only.** `qd.eig` returns complex results in a packed real layout (see below); the others all assume real-valued input and return real-valued output. From aab8ac84cd00dc6a541cc70ad1bddd3679984ff8 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:52:53 -0700 Subject: [PATCH 09/26] [Docs] decompositions: state SVD ordering precisely (2x2 sorted, 3x3 not) --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 6fa3bf6460..0ffa799bb7 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -34,7 +34,7 @@ Singular value decomposition — produces `(U, S, V)` such that `A = U @ S @ V.t Shapes 2×2 and 3×3 use closed-form / Jacobi-style implementations specialized per shape. 
Sign convention for `U` and `V` is the implementation's natural one and is **not** guaranteed to enforce `det(U) = det(V) = +1`; if you depend on a particular handedness (e.g. for an ARAP rotation `R = U @ V.transpose()`), check it explicitly and flip a column if needed. -Singular values come out in implementation order — confirm via test if you depend on `S[0,0] >= S[1,1] >= S[2,2]`. +The 2×2 path returns singular values sorted descending (`S[0,0] >= S[1,1]`). The 3×3 path does **not** sort — singular values come out in whatever order the Sifakis algorithm produces them. If you depend on `S[0,0] >= S[1,1] >= S[2,2]` for 3×3, sort explicitly. ### `qd.sym_eig(A, dt=None)` From 0de99f305c5752efcd3ac3c6b270f71438c55c21 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:54:23 -0700 Subject: [PATCH 10/26] [Docs] decompositions: FIXME on SVD sort-consistency 2x2 vs 3x3 --- docs/source/user_guide/decompositions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 0ffa799bb7..03e54cb255 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -36,6 +36,8 @@ Shapes 2×2 and 3×3 use closed-form / Jacobi-style implementations specialized The 2×2 path returns singular values sorted descending (`S[0,0] >= S[1,1]`). The 3×3 path does **not** sort — singular values come out in whatever order the Sifakis algorithm produces them. If you depend on `S[0,0] >= S[1,1] >= S[2,2]` for 3×3, sort explicitly. +**FIXME (sort consistency):** the 2×2 / 3×3 split is an inconsistency in the API — both shapes should either sort descending or both should leave the order to the algorithm. The 2×2 swap is essentially free; making 3×3 sort would cost a few comparisons and column swaps. Pick one and apply it across both shapes (and update this paragraph accordingly). 
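For reference, the "sort explicitly" step described above can be sketched in numpy (this is an illustration of the math only, outside the Quadrants API; `sort_svd_desc` is a hypothetical helper name): permuting the columns of `U` and `V` together with the entries of `s` preserves the reconstruction.

```python
import numpy as np

def sort_svd_desc(U, s, V):
    """Permute SVD factors so singular values come out descending.

    Reordering columns of U and V together with the entries of s
    leaves the reconstruction A = U @ diag(s) @ V.T unchanged.
    """
    order = np.argsort(-s)
    return U[:, order], s[order], V[:, order]
```

The same column-permutation trick is what a sorted 3×3 path would do internally, at the cost of a few comparisons and swaps.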
+ ### `qd.sym_eig(A, dt=None)` Symmetric eigendecomposition — for a real symmetric `A`, returns `(eigenvalues, eigenvectors)` with: From a259f734d44377a96d7ff0134c49d46570cdb465 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:54:47 -0700 Subject: [PATCH 11/26] [Docs] decompositions: mention boolean template option in sort-consistency FIXME --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 03e54cb255..3688cb1f04 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -36,7 +36,7 @@ Shapes 2×2 and 3×3 use closed-form / Jacobi-style implementations specialized The 2×2 path returns singular values sorted descending (`S[0,0] >= S[1,1]`). The 3×3 path does **not** sort — singular values come out in whatever order the Sifakis algorithm produces them. If you depend on `S[0,0] >= S[1,1] >= S[2,2]` for 3×3, sort explicitly. -**FIXME (sort consistency):** the 2×2 / 3×3 split is an inconsistency in the API — both shapes should either sort descending or both should leave the order to the algorithm. The 2×2 swap is essentially free; making 3×3 sort would cost a few comparisons and column swaps. Pick one and apply it across both shapes (and update this paragraph accordingly). +**FIXME (sort consistency):** the 2×2 / 3×3 split is an inconsistency in the API — both shapes should either sort descending or both should leave the order to the algorithm (or parametrize it with a boolean template parameter, e.g. `qd.svd(A, sorted=True)`). The 2×2 swap is essentially free; making 3×3 sort would cost a few comparisons and column swaps. Pick one and apply it across both shapes (and update this paragraph accordingly). 
### `qd.sym_eig(A, dt=None)` From 95037e2caaa5e92b0f0d06dc9faed4967278f88e Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:56:14 -0700 Subject: [PATCH 12/26] [Docs] decompositions: FIXME on sym_eig sort consistency (also vs svd direction) --- docs/source/user_guide/decompositions.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 3688cb1f04..cbd91be22e 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -47,6 +47,8 @@ Symmetric eigendecomposition — for a real symmetric `A`, returns `(eigenvalues For 3×3, eigenvalues come out sorted in ascending order (the implementation explicitly sorts at the end). The 2×2 path does not perform an explicit sort — verify with a test if you need a particular order in 2D. +**FIXME (sort consistency):** same kind of inconsistency as `qd.svd`, with the additional twist that the sort directions disagree across ops — `qd.sym_eig` 3×3 sorts *ascending*, while `qd.svd` 2×2 sorts *descending*. Both shapes of `qd.sym_eig` should sort the same way, and ideally `qd.sym_eig` and `qd.svd` should agree on a direction (or parametrize via a boolean template parameter, e.g. `qd.sym_eig(A, sorted=True)`). Pick one and apply it across both shapes and across both ops (and update this paragraph accordingly). + `A` is *assumed* symmetric; the implementation does not symmetrize first. If your matrix is only approximately symmetric (e.g. accumulated floating-point error), explicitly compute `(A + A.transpose()) * 0.5` before calling. 
### `qd.polar_decompose(A, dt=None)` From 43a6d4541d55a94f2f03defbfae7c90dc21e75b2 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 09:57:28 -0700 Subject: [PATCH 13/26] [Docs] decompositions: qualify qd.solve singular-A assert as debug-only --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index cbd91be22e..335ac598db 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -75,7 +75,7 @@ Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns - Shapes 2×2 and 3×3. - The implementation asserts `A.n == A.m` and `A.m == b.n`. -- Singular `A` triggers an `assert` failure (`"Matrix is singular in linear solve."`); the assert is active by default and is your only signal of singularity. +- Singular `A` is checked by a kernel `assert` (`"Matrix is singular in linear solve."`) inside the Gauss-elimination path. Kernel asserts only fire when the runtime is initialised with `qd.init(debug=True)` (see [debug](debug.md)) — under the default `debug=False` a singular input silently produces a divide-by-zero / NaN result with no diagnostic. If you need a signal in production, check singularity explicitly before calling `qd.solve` (e.g. `abs(A.determinant())` against a tolerance), or run development workloads with `debug=True` to catch the case. - Use this for one-off small solves; for systems that decompose once and back-solve many times, currently you have to bake the LU yourself (or just call `qd.solve` per b — for 2×2 / 3×3 the gain from caching the LU is negligible compared to the launch). 
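Since the in-kernel singularity assert fires only under `debug=True`, the determinant pre-check suggested above can be sketched as a host-side numpy analogue (`guarded_solve` is a hypothetical helper, not part of the qd API):

```python
import numpy as np

def guarded_solve(A, b, tol=1e-12):
    """Solve A @ x = b with an explicit singularity pre-check.

    Analogue of testing abs(A.determinant()) against a tolerance before
    calling qd.solve. A fixed tolerance is scale-dependent; scale-aware
    code would compare against tol times a norm of A.
    """
    if abs(np.linalg.det(A)) < tol:
        return None                # caller decides how to handle a singular system
    return np.linalg.solve(A, b)
```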
## Examples From 12173047ece722ff97d68e77216c3e20eeb9a2a3 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:00:33 -0700 Subject: [PATCH 14/26] [Docs] decompositions: split qd.solve usage bullet into three clearer ones --- docs/source/user_guide/decompositions.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 335ac598db..d34da9f6c5 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -76,7 +76,9 @@ Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns - Shapes 2×2 and 3×3. - The implementation asserts `A.n == A.m` and `A.m == b.n`. - Singular `A` is checked by a kernel `assert` (`"Matrix is singular in linear solve."`) inside the Gauss-elimination path. Kernel asserts only fire when the runtime is initialised with `qd.init(debug=True)` (see [debug](debug.md)) — under the default `debug=False` a singular input silently produces a divide-by-zero / NaN result with no diagnostic. If you need a signal in production, check singularity explicitly before calling `qd.solve` (e.g. `abs(A.determinant())` against a tolerance), or run development workloads with `debug=True` to catch the case. -- Use this for one-off small solves; for systems that decompose once and back-solve many times, currently you have to bake the LU yourself (or just call `qd.solve` per b — for 2×2 / 3×3 the gain from caching the LU is negligible compared to the launch). +- Intended for one-off small solves: each call factorises `A` from scratch and back-substitutes for the given `b`. +- There is no separate LU-factor / reuse API. If you want to amortise the factorisation across many right-hand-sides, you have to write the factorisation yourself in user code. 
+- At 2×2 / 3×3, the factorisation cost is dominated by kernel-launch overhead, so calling `qd.solve` once per `b` is usually fine in practice — the missing factor-reuse API matters more for larger shapes (which `qd.solve` does not support today anyway). ## Examples From 5370a357955533d93fbf198346f92cbdde0ee58a Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:01:20 -0700 Subject: [PATCH 15/26] [Docs] decompositions: drop LU-reuse and launch-overhead asides for qd.solve --- docs/source/user_guide/decompositions.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index d34da9f6c5..78b851932a 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -77,8 +77,6 @@ Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns - The implementation asserts `A.n == A.m` and `A.m == b.n`. - Singular `A` is checked by a kernel `assert` (`"Matrix is singular in linear solve."`) inside the Gauss-elimination path. Kernel asserts only fire when the runtime is initialised with `qd.init(debug=True)` (see [debug](debug.md)) — under the default `debug=False` a singular input silently produces a divide-by-zero / NaN result with no diagnostic. If you need a signal in production, check singularity explicitly before calling `qd.solve` (e.g. `abs(A.determinant())` against a tolerance), or run development workloads with `debug=True` to catch the case. - Intended for one-off small solves: each call factorises `A` from scratch and back-substitutes for the given `b`. -- There is no separate LU-factor / reuse API. If you want to amortise the factorisation across many right-hand-sides, you have to write the factorisation yourself in user code. 
-- At 2×2 / 3×3, the factorisation cost is dominated by kernel-launch overhead, so calling `qd.solve` once per `b` is usually fine in practice — the missing factor-reuse API matters more for larger shapes (which `qd.solve` does not support today anyway). ## Examples From 5c0244f8d3e2324d201d831da4ed9f6de4791a72 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:02:19 -0700 Subject: [PATCH 16/26] [Docs] decompositions: align ARAP commentary with code form (R.determinant() < 0.0) --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 78b851932a..f0d637956a 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -93,7 +93,7 @@ def closest_rotation(A: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, return R ``` -The `det(R) < 0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation — a standard ARAP / IPC trick. +The `R.determinant() < 0.0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation — a standard ARAP / IPC trick. ### Project to symmetric positive semi-definite (`make_spd`) From a3722b04df00f29fcdc0d927559cddccd019771d Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:02:45 -0700 Subject: [PATCH 17/26] [Docs] decompositions: drop dangling 'one-off' qualifier on qd.solve --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index f0d637956a..dd0d336b8a 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -76,7 +76,7 @@ Direct solve of `A @ x = b` via Gauss elimination with partial pivoting. Returns - Shapes 2×2 and 3×3. 
- The implementation asserts `A.n == A.m` and `A.m == b.n`. - Singular `A` is checked by a kernel `assert` (`"Matrix is singular in linear solve."`) inside the Gauss-elimination path. Kernel asserts only fire when the runtime is initialised with `qd.init(debug=True)` (see [debug](debug.md)) — under the default `debug=False` a singular input silently produces a divide-by-zero / NaN result with no diagnostic. If you need a signal in production, check singularity explicitly before calling `qd.solve` (e.g. `abs(A.determinant())` against a tolerance), or run development workloads with `debug=True` to catch the case. -- Intended for one-off small solves: each call factorises `A` from scratch and back-substitutes for the given `b`. +- Each call factorises `A` from scratch and back-substitutes for the given `b`. ## Examples From 2871e6ac600e0ca21acb81aaa0d79a57fa0f740b Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:03:28 -0700 Subject: [PATCH 18/26] [Docs] decompositions: drop IPC references --- docs/source/user_guide/decompositions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index dd0d336b8a..d89a6f65ef 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -93,7 +93,7 @@ def closest_rotation(A: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, return R ``` -The `R.determinant() < 0.0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation — a standard ARAP / IPC trick. +The `R.determinant() < 0.0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation — a standard ARAP trick. 
### Project to symmetric positive semi-definite (`make_spd`) @@ -110,7 +110,7 @@ def make_spd_3x3(H: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, qd.f return Q @ Lambda @ Q.transpose() ``` -Used by IPC-style methods to project an indefinite Hessian to its closest SPD approximation per element. Note the shape cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. +Used to project an indefinite Hessian to its closest SPD approximation per element. Note the shape cap — only 3×3 today, since `qd.sym_eig` itself caps at 3×3. ### Per-thread linear solve @@ -140,7 +140,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` - **Shape cap is the dominant constraint.** For matrices outside the supported shapes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. -- **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff IPC-style problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. +- **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. 
Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. ## What's missing From 6243d97c85d0c1d83c700792b8512935111898cd Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:04:19 -0700 Subject: [PATCH 19/26] [Docs] decompositions: drop graphics/soft-body aside in conditioning bullet --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index d89a6f65ef..e7f21d0ef9 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -140,7 +140,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` - **Shape cap is the dominant constraint.** For matrices outside the supported shapes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. -- **Numerical conditioning.** All implementations use `f32` by default, which is fine for graphics / soft-body simulation but not always sufficient for stiff problems. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. +- **Numerical conditioning.** All implementations use `f32` by default. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. 
- **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. ## What's missing From 85004029cf179bde4ac6587237365af848653f9a Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:04:39 -0700 Subject: [PATCH 20/26] [Docs] decompositions: drop trailing 'standard ARAP trick' aside --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index e7f21d0ef9..8f297ca318 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -93,7 +93,7 @@ def closest_rotation(A: qd.types.matrix(3, 3, qd.f64)) -> qd.types.matrix(3, 3, return R ``` -The `R.determinant() < 0.0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation — a standard ARAP trick. +The `R.determinant() < 0.0` branch fixes the handedness when SVD's sign convention produces a reflection rather than a rotation. 
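The handedness fix can be sketched in numpy (an illustration of the math, not the qd code on this page): when `det(U @ V.T) < 0`, negating the column of `U` paired with the smallest singular value (the last one, in numpy's descending order) turns the reflection into the nearest proper rotation.

```python
import numpy as np

def closest_rotation(A):
    """Nearest rotation to A via SVD, with the reflection fix described above."""
    U, _, Vt = np.linalg.svd(A)
    if np.linalg.det(U @ Vt) < 0.0:
        U[:, -1] *= -1.0           # flip the smallest singular direction
    return U @ Vt
```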
### Project to symmetric positive semi-definite (`make_spd`) From 2bda0a249326586512d8a503fba82913283aab01 Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:04:57 -0700 Subject: [PATCH 21/26] [Docs] decompositions: trim conditioning bullet to f32-default statement --- docs/source/user_guide/decompositions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 8f297ca318..41bc252e66 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -140,7 +140,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` - **Shape cap is the dominant constraint.** For matrices outside the supported shapes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. -- **Numerical conditioning.** All implementations use `f32` by default. Pass `dt=qd.f64` whenever conditioning matters; the cost on modern GPUs is a constant factor, not order-of-magnitude. +- **Numerical conditioning.** All implementations use `f32` by default. - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. 
## What's missing From 9d6098929dc43b065e12911d6faa4b94f344356f Mon Sep 17 00:00:00 2001 From: Hugh Perkins Date: Thu, 7 May 2026 10:06:26 -0700 Subject: [PATCH 22/26] [Docs] decompositions: remove 'Shape cap is the dominant constraint' bullet --- docs/source/user_guide/decompositions.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md index 41bc252e66..efd9e60067 100644 --- a/docs/source/user_guide/decompositions.md +++ b/docs/source/user_guide/decompositions.md @@ -138,7 +138,6 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises ` ## Shapes, performance, portability -- **Shape cap is the dominant constraint.** For matrices outside the supported shapes, you currently have to write your own Jacobi sweep (for symmetric EVD up to ~12×12) or LU / Cholesky (for general inverse / solve). - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly. - **Numerical conditioning.** All implementations use `f32` by default. - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices. 
From 0e33770c0954a3859bdaf86d7c732b4527d191 Mon Sep 17 00:00:00 2001
From: Hugh Perkins
Date: Thu, 7 May 2026 10:06:47 -0700
Subject: [PATCH 23/26] [Docs] decompositions: remove 'Numerical conditioning' bullet

---
 docs/source/user_guide/decompositions.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md
index efd9e60067..e6db8443bc 100644
--- a/docs/source/user_guide/decompositions.md
+++ b/docs/source/user_guide/decompositions.md
@@ -139,7 +139,6 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `
 ## Shapes, performance, portability

 - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly.
-- **Numerical conditioning.** All implementations use `f32` by default.
 - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices.
 ## What's missing

From dffa6065d9eab36dd69a930aef176a53113c16c3 Mon Sep 17 00:00:00 2001
From: Hugh Perkins
Date: Thu, 7 May 2026 10:08:38 -0700
Subject: [PATCH 24/26] [Docs] decompositions: drop overconfident bit-exactness aside

---
 docs/source/user_guide/decompositions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md
index e6db8443bc..8e702115fb 100644
--- a/docs/source/user_guide/decompositions.md
+++ b/docs/source/user_guide/decompositions.md
@@ -139,7 +139,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `
 ## Shapes, performance, portability

 - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly.
-- **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split. Numerical behaviour is bit-exact across backends only for `f64`; `f32` may differ in the last bit because of fused-multiply-add ordering choices.
+- **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split.
 ## What's missing

From c5a50ca3cd053c5c3d0bfdcbfa799871cbbb9cbb Mon Sep 17 00:00:00 2001
From: Hugh Perkins
Date: Thu, 7 May 2026 10:08:59 -0700
Subject: [PATCH 25/26] [Docs] decompositions: soften 'would not be' to 'may not be' for compile-time claim

---
 docs/source/user_guide/decompositions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md
index 8e702115fb..44e40fc998 100644
--- a/docs/source/user_guide/decompositions.md
+++ b/docs/source/user_guide/decompositions.md
@@ -138,7 +138,7 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `

 ## Shapes, performance, portability

-- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap would not be — register pressure plus unrolling explode quickly.
+- **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap may not be — register pressure plus unrolling explode quickly.
 - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split.
 ## What's missing

From 3d0fa3ef1dd7dfb7005c8f36b0b4ce633aaf8b17 Mon Sep 17 00:00:00 2001
From: Hugh Perkins
Date: Thu, 7 May 2026 10:09:21 -0700
Subject: [PATCH 26/26] [Docs] decompositions: remove 'What's missing' section

---
 docs/source/user_guide/decompositions.md | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/docs/source/user_guide/decompositions.md b/docs/source/user_guide/decompositions.md
index 44e40fc998..e5cd3ba44a 100644
--- a/docs/source/user_guide/decompositions.md
+++ b/docs/source/user_guide/decompositions.md
@@ -141,15 +141,6 @@ The rotation factor `R` from `A = R @ S` is the rigid alignment that minimises `
 - **Compile time.** Each call is unrolled per thread, so a kernel that calls `qd.svd` on a 3×3 matrix per element compiles a moderately large block of straight-line code per thread. Compile time is generally fine at these shapes; matrices larger than the cap may not be — register pressure plus unrolling explode quickly.
 - **Backend portability.** All ops compile cleanly on CUDA, AMDGPU, Vulkan, and Metal — they are pure register arithmetic with no SIMT primitives, so there is no codegen split.

-## What's missing
-
-For reference / planning purposes, the gaps users most often hit:
-
-- **Larger SVD / EVD.** Shapes above 3×3 are unsupported. For symmetric EVD up to ~12×12, a Jacobi sweep is the standard approach; for general SVD, a one-sided Jacobi or QR-with-shifts is the standard approach.
-- **`Matrix.inverse` shape cap.** Documented in [matrix_vector](matrix_vector.md): the closed-form cofactor inverse caps at 4×4. Larger inverses need an LU or Cholesky factorisation, neither of which is exposed today.
-- **`atomic_cas`.** Unrelated to decompositions, but the building block for spinlocks and lock-free dictionaries; not exposed in Python — `qd.atomic_*` covers add / sub / mul / min / max / and / or / xor but does not currently include compare-and-swap.
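The section removed in this patch noted that inverses and solves past the closed-form cap want an LU or Cholesky factorisation. A NumPy sketch of LU with partial pivoting, the standard route for a general square system; the name `lu_solve` is illustrative and not a Quadrants entry point:

```python
import numpy as np

def lu_solve(A, b):
    """Solve A @ x = b via in-place LU with partial pivoting.
    After elimination, A holds U on and above the diagonal and the
    L multipliers (unit diagonal implied) below it."""
    A = np.array(A, dtype=np.float64)
    b = np.array(b, dtype=np.float64)
    n = A.shape[0]
    for k in range(n):
        # Pivot: bring the largest remaining entry in column k to row k.
        p = k + int(np.argmax(np.abs(A[k:, k])))
        if p != k:
            A[[k, p]] = A[[p, k]]
            b[[k, p]] = b[[p, k]]
        A[k + 1:, k] /= A[k, k]
        A[k + 1:, k + 1:] -= np.outer(A[k + 1:, k], A[k, k + 1:])
    for k in range(n):                    # forward substitution: L y = P b
        b[k] -= A[k, :k] @ b[:k]
    for k in range(n - 1, -1, -1):        # back substitution: U x = y
        b[k] = (b[k] - A[k, k + 1:] @ b[k + 1:]) / A[k, k]
    return b

# 4x4 system, one size above the qd.solve cap.
A = np.array([[2.0, 1.0, 1.0, 0.0],
              [4.0, 3.0, 3.0, 1.0],
              [8.0, 7.0, 9.0, 5.0],
              [6.0, 7.0, 9.0, 8.0]])
x = lu_solve(A, np.array([1.0, 2.0, 3.0, 4.0]))
```

Factor once, substitute twice: the same factored `A` can be reused against many right-hand sides, which is the main practical advantage over recomputing a cofactor inverse.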
-- **3×3 `qd.eig`.** Only the 2×2 general (non-symmetric) eigendecomposition is provided. For 3×3, use `qd.sym_eig` if your matrix is symmetric.
-
 ## Related

 - [matrix_vector](matrix_vector.md) — element-wise / arithmetic matrix operations (`@`, `inverse`, `determinant`, `transpose`, dot, cross, norm, `outer_product`). Covers the operations whose implementation is a single closed-form formula.
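The det(R) sign-convention caveat flagged in the patch-01 summary reduces to one determinant check. A NumPy sketch of closest-rotation extraction (the ARAP-style use of the polar factor), illustrative rather than the in-kernel `qd.polar_decompose` implementation:

```python
import numpy as np

def closest_rotation(A):
    """Rotation factor of the polar decomposition A = R @ S, from the
    SVD A = U @ diag(s) @ Vt: R = U @ Vt, with the last column of U
    sign-flipped when det(U @ Vt) < 0 so that det(R) = +1 (a proper
    rotation, never a reflection)."""
    U, s, Vt = np.linalg.svd(A)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] = -U[:, -1]
    R = U @ Vt
    S = R.T @ A  # symmetric stretch factor; R @ S reassembles A
    return R, S

# A rotation composed with a pure stretch: polar recovers the rotation.
t = 0.3
R_true = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])
A = R_true @ np.diag([2.0, 1.5, 1.0])
R, S = closest_rotation(A)
```

Without the sign fix, an input with a reflection component would hand back `det(R) = -1`, and an ARAP or shape-matching solver fed that factor would happily mirror its rest shape.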