Write GPU programs in Lean 4. Prove them correct. Run on WebGPU.
Important: This is alpha software. The APIs, verification features, and compiler are under active development and subject to breaking changes. While core functionality works, this project is primarily for research and experimentation.
Hesper is a verified GPU programming framework that brings the power of formal verification to GPU computing. Write type-safe shaders, execute tensor operations, and build graphics applications—all in Lean 4.
import Hesper.WGSL.DSL
-- Type-safe shader expressions with compile-time verification
let x : Exp (.scalar .f32) := var "x"
let y : Exp (.scalar .f32) := var "y"
let result := sqrt (x * x + y * y) -- Generates: sqrt(x * x + y * y)
-- Cannot mix types (compile error!)
-- let wrong := x + (var "i" : Exp (.scalar .i32)) -- ✗ Type error!
Hesper includes a complete BitNet b1.58 2B inference engine running entirely on WebGPU, achieving 125 tokens/second on Apple M4 Max:
$ lake exe bitnet-complete --stats
> Hello, world!
Hello, world! I'm a 20-year-old college student...
Performance: 125.6 TPS (8.0 ms/token)
Model: BitNet b1.58 2B (30 layers, 2560 dim, i2_s ternary weights)
Key optimizations:
- Ternary weight kernel (i2_s): 2-bit packed weights, addition-only matmul
- Kernel fusion: fused gate+up+ReLU²×mul and fused KV cache write (150 fewer dispatches/token)
- Shared memory F16 matmul for LM head (128K vocab)
- PreparedDispatch graph capture: ~99% pipeline cache hit rate
- Command buffer batching: single GPU submit per token
- KV cache with grouped-query attention (20 heads, 5 KV heads)
See bitnet.lean for the full inference pipeline.
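The i2_s idea above can be pictured with a tiny decoder. This is a toy sketch only: the 2-bit encoding (0 ↦ -1, 1 ↦ 0, 2 ↦ +1) and low-bits-first packing order are assumptions, not the actual BitNet layout.

```lean
-- Toy sketch of i2_s-style packing: four ternary weights per byte.
-- Encoding assumed: 0 ↦ -1, 1 ↦ 0, 2 ↦ +1, packed from the low bits up.
def decodeTernary (b : UInt8) : List Int :=
  (List.range 4).map fun i =>
    let bits := (b >>> UInt8.ofNat (2 * i)) &&& 0x3
    (bits.toNat : Int) - 1

-- Because every weight is -1, 0, or +1, a dot product against decoded
-- weights needs only additions and subtractions, never multiplies.
def ternaryDot (ws : List Int) (xs : List Float) : Float :=
  (ws.zip xs).foldl (fun acc wx =>
    let w := wx.1
    let x := wx.2
    if w > 0 then acc + x else if w < 0 then acc - x else acc) 0.0
```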
Modern GPU programming lacks safety guarantees. Hesper provides:
- Type Safety: Shaders are type-checked at compile time, preventing type mismatches
- Formal Verification: Prove correctness properties about your GPU programs
- WebGPU Backend: Cross-platform GPU access via Dawn (Metal, Vulkan, D3D12)
- Lean Integration: Use Lean's powerful theorem proving alongside GPU computation
- Multi-GPU Support: Select and coordinate across multiple GPU adapters
- Platform: macOS (Metal), Linux (Vulkan), or Windows (D3D12/Vulkan)
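To make the formal-verification point concrete: you can state and prove properties about a kernel's CPU specification in ordinary Lean. The names below are illustrative, not Hesper's actual API.

```lean
-- CPU specification of a doubling kernel (illustrative, not Hesper's API).
def doubleSpec (xs : Array Float) : Array Float := xs.map (· * 2.0)

-- A provable shape property: the kernel never changes the element count.
theorem doubleSpec_size (xs : Array Float) :
    (doubleSpec xs).size = xs.size := by
  simp [doubleSpec]
```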
For a reproducible build environment, especially on Linux, you can use the provided Docker image:
# Build the image
docker build -t hesper-ci .
# Run build and tests inside container
docker run -it hesper-ci lake test-all
# Clone the repository
git clone https://github.com/Verilean/hesper.git
cd hesper
# Build native dependencies (this will take a while on first build)
lake run buildNative
# Build and run a demo
lake build dsl-basics
./.lake/build/bin/dsl-basics
Create MyFirst.lean:
import Hesper
import Hesper.WebGPU.Device
def main : IO Unit := do
  -- Initialize WebGPU
  Hesper.init
  -- Get a GPU device
  let device ← Hesper.WebGPU.getDevice
  IO.println "✓ GPU ready!"
Build and run:
lake build myfirst
./.lake/build/bin/myfirst
Hardware-accelerated CPU operations powered by Google Highway, providing high-performance SIMD across x86, ARM, and RISC-V:
import Hesper.Simd
import Hesper.Float32
import Hesper.Float16
-- Float64 (8 bytes): Native Lean Float, NEON 2/op, AVX2 4/op
let a64 := FloatArray.mk #[1.0, 2.0, 3.0, 4.0]
let b64 := FloatArray.mk #[5.0, 6.0, 7.0, 8.0]
let c64 := Hesper.Simd.simdAdd a64 b64
-- Float32 (4 bytes): 2x memory savings, NEON 4/op, AVX2 8/op
let a32 := Float32.fromFloatArray a64
let b32 := Float32.fromFloatArray b64
let c32 := Float32.simdAdd a32 b32
-- Float16 (2 bytes): 4x memory savings, NEON 8/op, AVX2+F16C 8/op
-- Requires ARMv8.2-A FP16 or x86_64 F16C - returns an error if unavailable
let hasFP16 ← Float16.hasHardwareSupport
if hasFP16 then
  let a16 ← Float16.fromFloatArray a64
  let b16 ← Float16.fromFloatArray b64
  let c16 ← Float16.simdAdd a16 b16
Features:
- Google Highway Integration: Portable SIMD implementation with runtime dispatch
- Architecture Support: NEON (ARM), AVX2/AVX-512 (x86), optional FP16 vector arithmetic
- Multi-Precision: Optimized paths for Float64, Float32, and Float16
- OpenMP Support: Optional multithreading for large tensor operations
Zero-Conversion Architecture:
All operations work directly on raw ByteArray with no automatic type conversions. Conversions are explicit only when needed.
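In practice, using the calls from the SIMD section above, a Float64 array must be converted explicitly before any Float32 arithmetic touches it; nothing converts behind your back:

```lean
-- Sketch using the API from the SIMD section; conversions are explicit.
let a64 := FloatArray.mk #[1.0, 2.0, 3.0, 4.0]
let a32 := Float32.fromFloatArray a64  -- the one (explicit) conversion
let c32 := Float32.simdAdd a32 a32     -- operates on the raw buffer directly
-- Float32.simdAdd a64 a64  -- would be a type error: no implicit conversion
```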
Inspired by webgpu-dawn, Hesper provides an easy-to-use API for data-parallelism that handles all GPU boilerplate (buffers, shaders, synchronization) in a single call.
Quickly execute a WGSL shader over a Float array:
import Hesper.Compute
-- Multiply each element by 1000 on the GPU
let result ← parallelFor device shader inputData
Run a computation with multiple named buffers directly on the Device:
device.compute myKernel [("input", inputBuf), ("output", outputBuf)] config
Write WGSL shaders with Lean's type system guaranteeing correctness:
import Hesper.WGSL.DSL
-- Expressions are typed and checked at compile time
let x : Exp (.scalar .f32) := var "x"
let y : Exp (.scalar .f32) := var "y"
-- Arithmetic operators work naturally
let distance := sqrt (x * x + y * y)
-- Built-in functions
let clamped := Exp.clamp x (lit 0.0) (lit 1.0)
let power := Exp.pow x (lit 2.0)
-- Generate WGSL code
IO.println distance.toWGSL -- Output: sqrt((x * x) + (y * y))
Hesper's VerifiedOpFusion architecture allows you to compose multiple GPU operations into a single kernel pass while maintaining formal correctness:
-- Fuses MatMul and ReLU into a single GPU kernel
-- Correctness is proven by construction
let fusedOp := matmulKernel |> reluKernel
Key Advantages:
- Zero-Copy Fusion: Eliminate expensive memory roundtrips between kernels.
- Formal Correctness: Each fused kernel is verified against a high-level CPU specification (spec_forward).
- Unified Interface: Same code runs on GPU (via WGSL) or CPU (via Google Highway) for easy debugging.
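One way to read the correctness claim: fusing two kernels corresponds to composing their CPU specifications, so the fused kernel is checked against the composition. A toy model with stand-in scalar specs (not Hesper's actual spec_forward machinery):

```lean
-- Stand-in 1-element "specs"; Hesper's real specs operate on tensors.
def specMatmul (x : Float) : Float := 3.0 * x
def specRelu (x : Float) : Float := max x 0.0

-- The spec of the fused kernel is simply the composition…
def specFused : Float → Float := specRelu ∘ specMatmul

-- …so "correct by construction" is definitional here:
theorem fused_correct (x : Float) :
    specFused x = specRelu (specMatmul x) := rfl
```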
Hesper's unique architecture unifies formal verification with automatic differentiation via a shared Differentiable interface. This allows the AD engine to treat complex, verified GPU kernels as first-class primitives.
All operations in Hesper—from simple scalar addition to fused ResNet blocks—implement this common trait:
class Differentiable (I O : Type) where
  /-- Primal execution (forward pass) -/
  forward : I → O
  /-- Adjoint computation (backward pass): matrix-free vector-Jacobian product (Jᵀv) -/
  backward : I → O → I
- Unified Logic: Scalar-CPU logic and Tensor-GPU kernels share the same mathematical abstraction.
- End-to-End Correctness: By "lifting" VerifiedOp instances into the AD tape, Hesper ensures that backpropagation is as formally correct as the forward pass.
- Zero-Copy Fusion: The AD engine can calculate gradients across fused kernels (e.g., MatMul |> ReLU) without writing intermediate tensors to VRAM.
-- AD engine automatically dispatches to hand-optimized GPU kernels
let grad := diff (matmul |> relu |> crossEntropy) input
Key Features:
- Hybrid AD: Seamlessly switch between CPU-scalar AD and GPU-tensor AD.
- Verified Primitives: Every AD node is backed by a verified spec_forward and spec_backward.
- High Performance: Leverages hand-optimized WGSL and Google Highway SIMD.
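To make the interface concrete, here is a hypothetical Differentiable instance for the scalar square function, using the class shape shown above (the instance itself is illustrative, not shipped with Hesper):

```lean
class Differentiable (I O : Type) where
  /-- Primal execution (forward pass) -/
  forward : I → O
  /-- Adjoint computation: matrix-free vector-Jacobian product (Jᵀv) -/
  backward : I → O → I

-- Illustrative instance for f(x) = x².
instance : Differentiable Float Float where
  forward x := x * x
  -- For a scalar function, the VJP collapses to f'(x) · v = 2x · v.
  backward x v := 2.0 * x * v
```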
Train models using state-of-the-art optimizers that integrate with Hesper's verified tensors:
import Hesper.Optimizer.SGD
-- Configure SGD with momentum
let opt := SGDConfig.default
  |>.withLearningRate 0.01
  |>.withMomentum 0.9
-- Perform optimization step
let (newParams, newState) := opt.step params grads state
Build interactive graphics applications with GLFW integration:
import Hesper.GLFW
def main : IO Unit := do
  Hesper.init
  withGLFW do
    let window ← createWindow 800 600 "Hesper Graphics"
    let device ← Hesper.WebGPU.getDevice
    let surface ← createSurface device window
    -- Render loop
    gameLoop window surface
Enumerate and select GPUs in multi-GPU systems:
import Hesper.WebGPU.Device
-- List all available GPUs
Hesper.WebGPU.listAdapters
-- Select specific GPU
let device0 ← getDeviceByIndex 0 -- First GPU
let device1 ← getDeviceByIndex 1 -- Second GPU
-- Get adapter information
let info ← getAdapterInfo 0
IO.println s!"GPU: {info.name} (Backend: {info.backendType})"
A full Tetris implementation using GLFW and WebGPU, demonstrating:
- Dynamic shader generation
- Real-time rendering
- Input handling
- Game state management
lake build tetris
./.lake/build/bin/tetris
Controls: A/D (move), S (drop), Space (rotate), ESC (exit)
High-performance matrix multiplication with subgroup optimizations:
lake build matmul-demo
./.lake/build/bin/matmul-demo
Demonstrates:
- GPU buffer management
- Compute shader execution
- Performance profiling
- Result verification
Multi-precision SIMD operations with hardware acceleration:
# Run multi-precision test (Float64/Float32/Float16)
lake script run buildSimd
lake build multi-precision
./.lake/build/bin/multi-precision
# Run SIMD benchmarks
lake build simd-bench
./.lake/build/bin/simd-bench
Output:
Backend: NEON (ARM64) - F64: 2/op, F32: 4/op, FP16
─── Float64 (8 bytes/element) ───
Result: #[6.0, 8.0, 10.0, 12.0] ✓
─── Float32 (4 bytes/element) ───
Result: Float32[4]: [6.0, 8.0, 10.0, 12.0] ✓
─── Float16 (2 bytes/element) ───
FP16 hardware detected!
Result: Float16[4]: [6.0, 8.0, 10.0, 12.0] ✓
Enumerate GPUs and create devices from specific adapters:
lake build multigpu
./.lake/build/bin/multigpu
Output:
Found 2 GPU adapter(s):
[0] NVIDIA GeForce RTX 3080 (Backend: Vulkan)
[1] Intel UHD Graphics 630 (Backend: Vulkan)
✓ Device created from GPU 0
Automatic differentiation and gradient descent on GPU:
lake build nn-gpu-demo
./.lake/build/bin/nn-gpu-demo
Features:
- Conv2D layers with verified gradients
- Backpropagation on GPU
- Real-time training visualization
Hesper requires building both native C++ dependencies (Google Dawn) and Lean code.
Step 1: Build Native Dependencies
The first build will take 5-15 minutes as it compiles Google Dawn from source:
# Build the native WebGPU bridge (hesper_native library)
lake script run buildNative
This compiles:
- Google Dawn WebGPU implementation
- C++ FFI bridge (native/bridge.cpp)
- SIMD CPU backend (c_src/simd_ops.cpp)
Step 2: Build Lean Code
Once native dependencies are built, compile the Lean libraries and executables:
# Build the core library
lake build Hesper
# Or build a specific executable
lake build simple-write
Clean Build (if you encounter issues):
lake clean
lake script run buildNative
lake build
This test verifies both raw WGSL shaders and DSL-generated shaders execute correctly on your GPU:
lake build simple-write
./.lake/build/bin/simple-write
Expected output:
╔══════════════════════════════════════╗
║ GPU Double Test (DSL + Raw) ║
╚══════════════════════════════════════╝
📝 DSL-generated WGSL:
─────────────────────────────────────
@group(0) @binding(0) var<storage, read_write> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> output: array<f32>;
@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
  let idx = gid.x;
  if (idx < arrayLength(&input)) {
    output[idx] = input[idx] * 2.0;
  }
}
🚀 Initializing WebGPU...
✓ Created input buffer
✓ Wrote input: [1.0, 2.0, 3.0, 4.0]
✓ Created output buffer
🔹 Test 1: Raw WGSL shader
✓ Raw WGSL executed
🔹 Test 2: DSL-generated shader
✓ DSL shader executed
📊 Results:
Input → Expected → Raw WGSL → DSL WGSL
[0] 1.0 → 2.0 → 2.0 ✓ → 2.0 ✓
[1] 2.0 → 4.0 → 4.0 ✓ → 4.0 ✓
[2] 3.0 → 6.0 → 6.0 ✓ → 6.0 ✓
[3] 4.0 → 8.0 → 8.0 ✓ → 8.0 ✓
✅ SUCCESS: Both shaders work correctly!
- Raw WGSL shader: ✓
- DSL-generated shader (ShaderM monad): ✓
- Both produce identical correct results
This test validates:
- ✓ WebGPU initialization and GPU discovery
- ✓ Buffer creation and data transfer (CPU ↔ GPU)
- ✓ Raw WGSL shader compilation and execution
- ✓ DSL shader code generation (ShaderM monad → WGSL)
- ✓ DSL shader execution on GPU
- ✓ Correct data marshalling across the FFI boundary
Test data conversion across the Lean ↔ C++ FFI boundary:
lake build ffi-tests
./.lake/build/bin/ffi-tests
Expected output:
╔══════════════════════════════════════╗
║ FFI Boundary Tests ║
╚══════════════════════════════════════╝
Test 1: Lean writes data, C++ reads
✓ Lean wrote: [1.0, 2.0, 3.0, 4.0]
✓ C++ verified byte-level accuracy
Test 2: C++ writes data, Lean reads
✓ GPU wrote: [10.0, 20.0, 30.0, 40.0]
✓ Lean verified byte-level accuracy
Test 3: Round-trip (Lean → GPU → Lean)
✓ Input: [5.0, 10.0, 15.0, 20.0]
✓ Output: [10.0, 20.0, 30.0, 40.0]
✓ Data integrity preserved
✅ All FFI boundary tests passed!
This validates:
- Lean writes ByteArray → C++ reads correct bytes
- C++ writes bytes → Lean reads correct Float values
- Round-trip data integrity across FFI boundary
Test multi-precision SIMD operations (Float64/Float32/Float16):
lake script run buildSimd
lake build multi-precision
./.lake/build/bin/multi-precision
Expected output (on ARM64 with FP16 support):
Backend: NEON (ARM64) - F64: 2/op, F32: 4/op, FP16
─── Float64 (8 bytes/element) ───
Result: #[6.0, 8.0, 10.0, 12.0] ✓
─── Float32 (4 bytes/element) ───
Result: Float32[4]: [6.0, 8.0, 10.0, 12.0] ✓
─── Float16 (2 bytes/element) ───
FP16 hardware detected!
Result: Float16[4]: [6.0, 8.0, 10.0, 12.0] ✓
When making changes to Hesper, run these tests to ensure you haven't broken anything:
# Test Lean ↔ C++ data conversion
lake build ffi-tests
./.lake/build/bin/ffi-tests
# Test raw WGSL and DSL shader execution
lake build simple-write
./.lake/build/bin/simple-write
# Rebuild SIMD library and run tests
lake script run buildSimd
lake build simd-test
./.lake/build/bin/simd-test
# Run all tests
lake build test-all
./.lake/build/bin/test-all
Solution: Rebuild the native library:
lake clean
lake script run buildNative
lake build
Solution: Ensure you have proper GPU drivers:
- macOS: No action needed (Metal is built-in)
- Linux: Install Vulkan drivers (vulkan-tools, mesa-vulkan-drivers)
- Windows: Install the latest GPU drivers with D3D12/Vulkan support
Solution: Build the SIMD backend:
lake script run buildSimd
Solution: This is expected on older hardware. Float16 requires:
- ARM64: ARMv8.2-A with FP16 extension (Apple M1+, AWS Graviton2+)
- x86_64: F16C extension (Intel Ivy Bridge+ / AMD Bulldozer+)
The library will gracefully fall back to Float32 operations.
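A sketch of that fallback pattern, using only the detection and conversion calls shown in the SIMD section above (error handling and result reporting elided):

```lean
-- Prefer the Float16 path; fall back to Float32 when FP16 hardware is absent.
def simdAddAdaptive (a b : FloatArray) : IO Unit := do
  if (← Float16.hasHardwareSupport) then
    let a16 ← Float16.fromFloatArray a
    let b16 ← Float16.fromFloatArray b
    let _c ← Float16.simdAdd a16 b16
    IO.println "used Float16 (half) path"
  else
    -- Graceful fallback described above: Float32 works everywhere.
    let a32 := Float32.fromFloatArray a
    let b32 := Float32.fromFloatArray b
    let _c := Float32.simdAdd a32 b32
    IO.println "fell back to Float32"
```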
Solution: Dawn's first build can take 10-15 minutes. Subsequent builds are incremental and much faster. To speed up:
# Use more CPU cores (adjust -j value)
lake script run buildNative -- -j 16
┌─────────────────────────────────────────────────────────────┐
│ Lean 4 Code │
│ • Type-safe shader DSL │
│ • Tensor operations │
│ • Formal proofs │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ WGSL Code Generation │
│ Exp (.scalar .f32) → WGSL shader source │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Lean FFI (C++ Bridge) │
│ • lean_hesper_* functions │
│ • Resource management via Lean.External │
└─────────────────────┬───────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Google Dawn (WebGPU Native) │
│ • Metal (macOS) │
│ • Vulkan (Linux/Windows) │
│ • D3D12 (Windows) │
└─────────────────────────────────────────────────────────────┘
- DSL Layer: Type-safe WGSL expression builder with dependent types
- Tensor Layer: High-level operations (matmul, conv2d, pooling)
- Compute Layer: Shader compilation, buffer management, execution
- WebGPU Layer: FFI bindings to Dawn native implementation
- Backend Layer: Platform-specific GPU drivers (Metal/Vulkan/D3D12)
Hesper/
├── Hesper/
│ ├── WGSL/ # Type-safe shader DSL
│ │ ├── Types.lean # WGSL type system
│ │ ├── Exp.lean # Expression AST
│ │ └── DSL.lean # User-facing DSL
│ ├── WebGPU/ # WebGPU bindings
│ │ ├── Device.lean # GPU device management
│ │ ├── Buffer.lean # GPU buffers
│ │ ├── Shader.lean # Shader modules
│ │ ├── Pipeline.lean # Compute/render pipelines
│ │ └── Errors.lean # Comprehensive error handling
│ ├── Tensor/ # Tensor operations
│ │ └── MatMul.lean # Matrix multiplication
│ ├── NN/ # Neural network layers
│ │ └── Conv.lean # Convolution layers
│ ├── GLFW/ # Windowing and graphics
│ │ └── GLFW.lean # GLFW bindings
│ ├── Simd.lean # SIMD Float64 operations
│ ├── Float32.lean # SIMD Float32 operations
│ ├── Float16.lean # SIMD Float16 operations
│ └── Compute.lean # High-level compute API
├── Examples/ # Example programs
│ ├── Tetris.lean # Full game demo
│ ├── MultiGPU.lean # Multi-GPU support
│ ├── DSLBasics.lean # DSL tutorial
│ └── ...
├── native/ # C++ WebGPU bridge
│ ├── bridge.cpp # FFI implementation
│ └── CMakeLists.txt # Build configuration
├── c_src/ # SIMD CPU backend
│ └── simd_ops.cpp # NEON/AVX2 implementations
├── Tests/ # Comprehensive test suite
│ ├── ErrorTests.lean # Error handling tests
│ ├── ShaderTests.lean # Shader monad tests
│ └── ...
└── lakefile.lean # Lake build script
Current Status: Early Development (Alpha)
Completed:
- Multi-precision SIMD CPU backend (Google Highway)
- Architecture detection (NEON/AVX2/F16C)
- Comprehensive error handling with structured error types
- Complete test suite (error handling, shader monad)
- Docker-based CI environment
- Verified Composable Kernels (VerifiedOpFusion)
- BitNet b1.58 inference engine (125 TPS on M4 Max)
- Kernel fusion: fused gate+up+ReLU²×mul, fused KV cache write
- KV cache with grouped-query attention
- PreparedDispatch graph capture (99%+ cache hit rate)
In Progress:
- Comprehensive tensor operation library (GEMM, Conv3D)
- Gemma 3 / Transformer support
- Automatic differentiation on GPU kernels
- Formal proofs of kernel numerical stability
- Integration with Lean's tactic framework
Hesper is part of the Verilean organization's effort to bring verified computing to GPUs.
- Fork the repository and create a feature branch
- Make your changes following the existing code style
- Run the test suite to ensure nothing broke:
# Core FFI boundary tests
lake build ffi-tests
./.lake/build/bin/ffi-tests
# GPU shader tests (raw WGSL + DSL)
lake build simple-write
./.lake/build/bin/simple-write
# SIMD tests (if you modified SIMD code)
lake script run buildSimd
lake build simd-test
./.lake/build/bin/simd-test
- Add tests for new features (see Examples/Tests/ for examples)
- Submit a pull request with a clear description of changes
- FFI changes: Always run ffi-tests to verify Lean ↔ C++ data marshalling
- DSL changes: Run simple-write to verify WGSL code generation
- GPU operations: Test with real GPU hardware, not just compilation
- SIMD changes: Test on both ARM64 (NEON) and x86_64 (AVX2) if possible
- Cross-platform: macOS (Metal), Linux (Vulkan), Windows (D3D12/Vulkan)
Hesper/
├── Hesper/ # Core library
│ ├── WGSL/ # Type-safe shader DSL
│ ├── WebGPU/ # WebGPU bindings (Device, Buffer, Shader, Pipeline)
│ ├── Compute.lean # High-level compute API
│ ├── Simd.lean # SIMD Float64 operations
│ ├── Float32.lean # SIMD Float32 operations
│ └── Float16.lean # SIMD Float16 operations
├── Examples/ # Example programs (organized by category)
│ ├── DSL/ # DSL feature demonstrations
│ ├── Compute/ # GPU compute examples
│ ├── MachineLearning/ # Neural network training
│ ├── Graphics/ # GLFW rendering demos
│ ├── SIMD/ # CPU SIMD benchmarks
│ ├── Tests/ # Integration tests
│ └── Utilities/ # Helper utilities
├── Tests/ # Unit tests
│ ├── FFIBoundaryTests.lean # Lean ↔ C++ data conversion tests
│ └── FusionTest.lean # Operator fusion tests
├── native/ # C++ WebGPU bridge
│ ├── bridge.cpp # FFI implementation (lean_hesper_* functions)
│ └── CMakeLists.txt # Dawn build configuration
├── c_src/ # SIMD CPU backend
│ └── simd_ops.cpp # NEON/AVX2 implementations
└── lakefile.lean # Lake build script
Key files for contributors:
- native/bridge.cpp: FFI boundary - all Lean ↔ C++ data conversion happens here
- Hesper/WGSL/Monad.lean: ShaderM monad for imperative shader construction
- Hesper/WGSL/Execute.lean: Compiles ShaderM → WGSL and executes on GPU
- Examples/Tests/SimpleWrite.lean: Reference test showing raw WGSL vs DSL execution
- Tests/FFIBoundaryTests.lean: Reference test for FFI data conversion
- Report Issues: GitHub Issues
- Discussions: GitHub Discussions
- Sister Project: Sparkle HDL - Verified hardware design in Lean 4
Junji Hashimoto
Twitter/X: @junjihashimoto3
Apache License 2.0 - see LICENSE file for details
- Google Dawn for the WebGPU native implementation
- Lean 4 for the foundation of verified programming
- WebGPU Working Group for the standard
- gpu.cpp (Answer.AI): High-level C++ API wrapper inspiration.
Write GPU code that's not just fast—make it correct by construction.