Comprehensive guide to solvers and numerical methods in the CFD Framework.
The CFD Framework implements multiple solvers for the incompressible Navier-Stokes equations:
∂u/∂t + (u·∇)u = -∇p/ρ + ν∇²u
∇·u = 0
All solvers support both 2D (nz=1) and 3D (nz>1) grids using a branch-free approach: stride_z=0 when nz==1 causes z-terms to vanish, producing bit-identical 2D results with zero overhead.
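The stride trick can be sketched in a few lines. `stride_z` and `laplacian` below are illustrative helpers (not the framework's actual API), assuming the row-major layout implied above:

```c
#include <stddef.h>

/* Branch-free 2D/3D indexing: with nz == 1 the z-stride is 0, so
 * u[idx - sz] and u[idx + sz] alias u[idx] and the z-part of the
 * Laplacian contributes exactly zero -- no branch, no extra cost. */
static size_t stride_z(size_t nx, size_t ny, size_t nz) {
    return (nz == 1) ? 0 : nx * ny;
}

static double laplacian(const double* u, size_t idx,
                        size_t sx, size_t sy, size_t sz, double inv_h2) {
    double lap = (u[idx - sx] - 2.0 * u[idx] + u[idx + sx])   /* x-terms */
               + (u[idx - sy] - 2.0 * u[idx] + u[idx + sy])   /* y-terms */
               + (u[idx - sz] - 2.0 * u[idx] + u[idx + sz]);  /* 0 when sz == 0 */
    return lap * inv_h2;
}
```

With `sz == 0` the third line evaluates to exactly `0.0`, which is why the 2D results are bit-identical to a dedicated 2D kernel.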
Simple forward Euler time integration using finite differences.
Equation:
u^(n+1) = u^n + dt * [-(u·∇)u + ν∇²u - ∇p/ρ]
Characteristics:
- First-order accurate in time
- Simple and fast
- Does not enforce incompressibility strictly
- Good for learning and debugging
Stability: CFL condition must be satisfied:
dt ≤ min(dx²/(4ν), dx/u_max)
Available Backends:
| Solver | Backend | Description |
|---|---|---|
| `explicit_euler` | Scalar | Basic implementation |
| `explicit_euler_optimized` | SIMD | SIMD-optimized (auto-detects AVX2/NEON) |
| `explicit_euler_omp` | OpenMP | Multi-threaded |
| `explicit_euler_gpu` | CUDA | GPU-accelerated |
Chorin's projection method - properly enforces incompressibility constraint.
Algorithm (Fractional Step Method):
1. Predictor Step - Compute intermediate velocity (ignore pressure):
   u* = u^n + dt * [-(u·∇)u + ν∇²u]
2. Pressure Poisson Solve - Enforce incompressibility:
   ∇²p^(n+1) = (ρ/dt) * ∇·u*
3. Corrector Step - Project velocity to divergence-free space:
   u^(n+1) = u* - (dt/ρ) * ∇p^(n+1)
Characteristics:
- Second-order accurate in space (central differences)
- First-order accurate in time
- Properly enforces ∇·u = 0
- More expensive due to Poisson solve
Available Backends:
| Solver | Backend | Description |
|---|---|---|
| `projection` | Scalar | Basic implementation |
| `projection_optimized` | SIMD | SIMD-optimized (runtime detection: AVX2/NEON) |
| `projection_omp` | OpenMP | Multi-threaded |
| `projection_jacobi_gpu` | GPU | CUDA-accelerated (Jacobi iteration) |
The projection method requires solving the pressure Poisson equation:
∇²p = f
Algorithm:
p_ij^(k+1) = (p_i-1,j + p_i+1,j + p_i,j-1 + p_i,j+1 - h²f_ij) / 4
Characteristics:
- Simple, parallel-friendly
- Slow convergence: O(n²) iterations for n×n grid
- Unconditionally stable
- Good for GPU/SIMD
Convergence Rate: ρ ≈ 1 - π²/(2n²)
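A minimal scalar sweep implementing the update formula above; the array layout and function name are illustrative, not the framework's internals:

```c
/* One Jacobi sweep for the 2D Poisson equation grad^2 p = f on a uniform
 * grid (spacing h) with fixed (Dirichlet) boundary values. Reads from p,
 * writes interior points of p_new; boundary rows/columns are untouched. */
static void jacobi_sweep(const double* p, double* p_new, const double* f,
                         int nx, int ny, double h) {
    for (int j = 1; j < ny - 1; j++)
        for (int i = 1; i < nx - 1; i++) {
            int k = i + j * nx;
            p_new[k] = 0.25 * (p[k - 1] + p[k + 1] + p[k - nx] + p[k + nx]
                               - h * h * f[k]);
        }
}
```

The caller swaps (or copies) `p_new` into `p` between sweeps; the two-buffer form is what makes Jacobi embarrassingly parallel.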
Usage:
```c
poisson_solver_t* solver = poisson_solver_create(POISSON_METHOD_JACOBI,
                                                 POISSON_BACKEND_SIMD);
```

Algorithm:
p_ij^(k+1) = (1-ω)p_ij^k + (ω/4)(p_i-1,j + p_i+1,j + p_i,j-1 + p_i,j+1 - h²f_ij)
Optimal Relaxation (Dirichlet BCs):
ω_opt = 2 / (1 + sin(πh))
Characteristics:
- Faster than Jacobi (ω > 1)
- Sequential row updates (row j depends on j-1)
- Optimal ω depends on problem
- SIMD variant uses Block SOR: processes SIMD_WIDTH consecutive cells per block, with intra-block left-neighbor approximation (see Block SOR technical note)
Convergence Rate: ρ ≈ 1 - 2πh (with optimal ω)
Available Backends:
| Solver | Backend | Description |
|---|---|---|
| `sor_scalar` | Scalar | Sequential Gauss-Seidel + SOR relaxation |
| `sor_simd` | SIMD | Block SOR (auto-detects AVX2/NEON) |
Usage:
```c
poisson_solver_params_t params = poisson_solver_params_default();
params.omega = 1.5;  // Relaxation parameter

// Scalar (fully sequential, best convergence per iteration)
poisson_solver_t* solver = poisson_solver_create(POISSON_METHOD_SOR,
                                                 POISSON_BACKEND_SCALAR);

// SIMD (Block SOR, higher throughput, slightly more iterations)
poisson_solver_t* solver_simd = poisson_solver_create(POISSON_METHOD_SOR,
                                                      POISSON_BACKEND_SIMD);
```

Algorithm:
- Split grid into red and black points (checkerboard)
- Update all red points (can parallelize)
- Update all black points (can parallelize)
Characteristics:
- Same convergence as SOR
- Parallelizable (red/black decoupled)
- SIMD and GPU friendly
- Preferred over standard SOR for performance
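The checkerboard update can be written as a single in-place sweep; names and layout below are illustrative:

```c
/* One red-black SOR sweep for grad^2 p = f: cells with (i + j) even are
 * "red", odd are "black". All cells of one color depend only on the other
 * color, so each inner pass can be parallelized or vectorized freely. */
static void rb_sor_sweep(double* p, const double* f, int nx, int ny,
                         double h, double omega) {
    for (int color = 0; color < 2; color++)
        for (int j = 1; j < ny - 1; j++)
            for (int i = 1; i < nx - 1; i++) {
                if ((i + j) % 2 != color) continue;
                int k = i + j * nx;
                double gs = 0.25 * (p[k - 1] + p[k + 1] + p[k - nx] + p[k + nx]
                                    - h * h * f[k]);
                p[k] = (1.0 - omega) * p[k] + omega * gs;
            }
}
```

Unlike plain SOR, the update is in-place with no row-to-row ordering constraint inside a color, which is what makes it SIMD- and GPU-friendly.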
Usage:
```c
poisson_solver_t* solver = poisson_solver_create(POISSON_METHOD_REDBLACK_SOR,
                                                 POISSON_BACKEND_SIMD);
```

Algorithm: Krylov subspace method for symmetric positive definite systems.
Characteristics:
- Converges in at most N iterations (N = number of unknowns) in exact arithmetic
- Practical convergence: O(√κ) where κ = condition number
- Memory efficient (no matrix storage)
- Best for large grids
Convergence: For Poisson equation, κ ≈ 4(n-1)²/π², giving O(n) iterations.
Usage:
```c
poisson_solver_t* solver = poisson_solver_create(POISSON_METHOD_CG,
                                                 POISSON_BACKEND_SIMD);
```

Algorithm: CG with preconditioner M to improve conditioning.
Available Preconditioners:
- Jacobi (Diagonal): M = diag(A)
- Simple, cheap per iteration
- Variable coefficients: can reduce iterations
- Constant coefficients (uniform grid): no benefit
- SSOR: M = (D + L)D⁻¹(D + U) (future)
- ILU: Incomplete LU factorization (future)
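The "no benefit" caveat for constant coefficients is easy to see in code. A Jacobi preconditioner applies M⁻¹r = r / diag(A); when the diagonal is the same value everywhere (4/h² for the uniform-grid Laplacian), this is a uniform rescaling that leaves the Krylov directions, and hence the iteration count, unchanged. Sketch (illustrative names):

```c
/* z = M^-1 r with M = diag(A). With a constant diagonal this is just a
 * uniform scaling of the residual, which does not change CG's search
 * directions -- hence no benefit on uniform grids. */
static void jacobi_precondition(const double* r, double* z, const double* diag,
                                int n) {
    for (int k = 0; k < n; k++)
        z[k] = r[k] / diag[k];
}
```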
Usage:
```c
poisson_solver_params_t params = poisson_solver_params_default();
params.preconditioner = POISSON_PRECOND_JACOBI;  // Enable preconditioning

poisson_solver_t* solver = poisson_solver_create(POISSON_METHOD_CG,
                                                 POISSON_BACKEND_SIMD);
poisson_solver_init(solver, nx, ny, dx, dy, &params);  // Pass params with preconditioner
```

Algorithm: Bi-Conjugate Gradient Stabilized for non-symmetric systems.
Characteristics:
- Handles non-symmetric matrices
- More robust than CG for difficult problems
- Higher cost per iteration than CG
Usage:
```c
poisson_solver_t* solver = poisson_solver_create(POISSON_METHOD_BICGSTAB,
                                                 POISSON_BACKEND_SCALAR);
```

Problem: 65×65 grid, tolerance = 1e-6
| Method | Iterations | Time (ms) | Notes |
|---|---|---|---|
| Jacobi | ~8000 | 45 | Simple, slow |
| SOR (ω=1.5) | ~2000 | 15 | Good serial performance |
| Red-Black SOR | ~2000 | 8 | Parallelizable |
| CG | ~80 | 5 | Best for large grids |
| PCG (Jacobi) | ~80 | 5.5 | No benefit on uniform grid |
| BiCGSTAB | ~40 | 4 | Fastest convergence |
Note: Jacobi preconditioning provides no benefit on uniform grids with constant coefficients (diagonal is constant 4/h²).
Basic C implementation:
- No SIMD, no parallelization
- Portable to all platforms
- Good baseline for correctness
Use when:
- Debugging
- Small grids (<50×50)
- Platform doesn't support SIMD
Vectorized implementation using intrinsics:
AVX2 (x86-64):
- 4 double operations per instruction (256-bit)
- Fused multiply-add (FMA) for performance
- 2-3x speedup over scalar
NEON (ARM64):
- 2 double operations per instruction (128-bit)
- ARM64 architecture optimization
- 1.5-2x speedup over scalar
Features:
- Aligned memory allocation
- Vectorized inner loops
- Scalar remainder handling
Use when:
- Medium to large grids (>100×100)
- Modern CPU (Intel Haswell+ or ARM Cortex-A57+)
- Single-threaded workload
Example:
```c
// AVX2 inner loop (simplified; assumes n is a multiple of 4 and
// 32-byte-aligned arrays -- real kernels handle the scalar remainder)
for (size_t i = 0; i < n; i += 4) {
    __m256d u_vec = _mm256_load_pd(&u[i]);
    __m256d v_vec = _mm256_load_pd(&v[i]);
    __m256d result = _mm256_fmadd_pd(u_vec, v_vec, sum_vec);
    _mm256_store_pd(&out[i], result);
}
```

Multi-threaded parallelization:
Features:
- Thread pool for hot loops
- Static scheduling for load balance
- Scales with CPU cores
Speedup: ~3-6x on 8-core CPU (depends on grid size and cache)
Use when:
- Large grids (>200×200)
- Multi-core CPU available
- Long simulations
Example:
```c
#pragma omp parallel for schedule(static)
for (int j = 1; j < ny - 1; j++) {
    for (int i = 1; i < nx - 1; i++) {
        // Stencil operation...
    }
}
```

GPU-accelerated using CUDA:
Features:
- Thousands of parallel threads
- Coalesced memory access
- Shared memory tiling
Speedup: 10-50x for very large grids (>500×500)
Use when:
- Very large grids (>200×200)
- Many time steps
- NVIDIA GPU available
Crossover Point:
- Small grids: CPU faster (data transfer overhead)
- Grid >200×200: GPU becomes beneficial
- Grid >500×500: GPU significantly faster
Example:
```c
// Check if GPU should be used
gpu_config_t config = gpu_config_default();
if (gpu_should_use(&config, nx, ny, num_steps)) {
    solver = cfd_solver_create(registry, "projection_jacobi_gpu");
} else {
    solver = cfd_solver_create(registry, "projection_optimized");
}
```

| Solver | Time (ms) | Speedup | Accuracy |
|---|---|---|---|
| explicit_euler | 2.6 | 1.0x | Low |
| explicit_euler_optimized | 0.9 | 2.9x | Low |
| explicit_euler_omp (8 cores) | 0.8 | 3.3x | Low |
| projection | 19.0 | 1.0x | High |
| projection_optimized | 5.3 | 3.6x | High |
| projection_omp (8 cores) | 4.2 | 4.5x | High |
| projection_jacobi_gpu | 8.4 | 0.45x† | High |
† GPU slower on small grids due to data transfer overhead
| Grid | Time (s) | Memory (MB) | Iterations/step |
|---|---|---|---|
| 50×50 | 0.5 | 0.02 | ~150 |
| 100×100 | 2.1 | 0.08 | ~300 |
| 200×200 | 9.8 | 0.31 | ~600 |
| 500×500 | 82.4 | 1.91 | ~1500 |
| Solver | Time (s) | Speedup |
|---|---|---|
| projection_optimized | 824 | 1.0x |
| projection_jacobi_gpu | 68 | 12.1x |
Need strict incompressibility enforcement?
├─ No → Use Explicit Euler family (faster)
│ ├─ Small grid (<100×100) → explicit_euler
│ ├─ Medium grid (100-500) → explicit_euler_optimized or explicit_euler_omp
│ └─ Large grid (>500) → explicit_euler_omp or explicit_euler_gpu
│
└─ Yes → Use Projection Method family
├─ Small grid (<100×100) → projection
├─ Medium grid (100-500) → projection_optimized or projection_omp
└─ Large grid (>500) → projection_jacobi_gpu
GPU available and grid >200×200?
└─ Use CUDA variant for 10-50x speedup
Learning/Debugging:
- `explicit_euler` or `projection` - Simple, predictable behavior
- Easy to inspect intermediate results

Production Simulations:
- `projection_optimized` or `projection_omp` (medium grids)
- `projection_jacobi_gpu` (large grids)
- Best accuracy and performance

Benchmarking:
- Always use `CMAKE_BUILD_TYPE=Release`
- Disable I/O during timing measurements
- Run multiple iterations for statistical significance
- Dirichlet - Fixed values: u(boundary) = u_bc
- Neumann - Fixed gradient: ∂u/∂n(boundary) = g_bc
- Periodic - u(0) = u(L)
- No-slip - u = 0, v = 0 (walls)
- Inlet - Specified velocity profile
- Outlet - Zero-gradient: ∂u/∂n = 0
Boundary conditions are applied after each solver step:
```c
void apply_boundary_conditions(flow_field* field, grid_t* grid) {
    // No-slip walls (u = 0, v = 0)
    for (size_t i = 0; i < grid->nx; i++) {
        field->u[i + 0 * grid->nx] = 0.0;               // Bottom
        field->u[i + (grid->ny - 1) * grid->nx] = 0.0;  // Top
    }
    // Lid-driven cavity: the moving top wall overrides the no-slip value above
    for (size_t i = 0; i < grid->nx; i++) {
        field->u[i + (grid->ny - 1) * grid->nx] = 1.0;
    }
}
```

Classic benchmark problem validated against Ghia et al. (1982):
Setup:
- Square cavity [0,1] × [0,1]
- Top wall moves with velocity u=1
- All other walls no-slip
- Reynolds number Re = ρUL/μ
Results (Re=100, 129×129 grid):
- Centerline velocity profiles match published data
- RMS error < 0.01
See validation/lid-driven-cavity.md for details.
Analytical solution for decaying vortex:
Exact Solution:
u(x,y,t) = -cos(x)sin(y)exp(-2νt)
v(x,y,t) = sin(x)cos(y)exp(-2νt)
p(x,y,t) = -0.25(cos(2x) + cos(2y))exp(-4νt)
Results:
- Velocity error < 1% at t=1.0
- Energy decay matches analytical solution
3D Extension:
u(x,y,z,t) = cos(x)sin(y)cos(z)exp(-3νt)
v(x,y,z,t) = -sin(x)cos(y)cos(z)exp(-3νt)
w(x,y,z,t) = 0
- Velocity decays as exp(-3νt), kinetic energy as exp(-6νt)
- Validated on 16×16×16 grid with ν=0.01
- Chorin, A.J. (1968). "Numerical Solution of the Navier-Stokes Equations". Mathematics of Computation.
- Ferziger & Peric - "Computational Methods for Fluid Dynamics"
- Versteeg & Malalasekera - "An Introduction to CFD"
- Ghia, U., Ghia, K.N., Shin, C.T. (1982). "High-Re Solutions for Incompressible Flow Using the Navier-Stokes Equations and a Multigrid Method". Journal of Computational Physics.
- Saad, Y. - "Iterative Methods for Sparse Linear Systems"
- Barrett et al. - "Templates for the Solution of Linear Systems"
- Examples - See solvers in action
- API Reference - API documentation
- Building - Build with specific solvers