
Hesper

Write GPU programs in Lean 4. Prove them correct. Run on WebGPU.

Important

This is Alpha Software. The APIs, verification features, and compiler are under active development and subject to breaking changes. While core functionality works, this project is primarily for research and experimentation.

Hesper is a verified GPU programming framework that brings the power of formal verification to GPU computing. Write type-safe shaders, execute tensor operations, and build graphics applications—all in Lean 4.

import Hesper.WGSL.DSL

-- Type-safe shader expressions with compile-time verification
let x : Exp (.scalar .f32) := var "x"
let y : Exp (.scalar .f32) := var "y"
let result := sqrt (x * x + y * y)  -- Generates: sqrt(x * x + y * y)

-- Cannot mix types (compile error!)
-- let wrong := x + (var "i" : Exp (.scalar .i32))  ✗ Type error!

BitNet b1.58 Inference: 125 TPS on M4 Max

Hesper includes a complete BitNet b1.58 2B inference engine running entirely on WebGPU, achieving 125 tokens/second on Apple M4 Max:

$ lake exe bitnet-complete --stats
> Hello, world!
Hello, world! I'm a 20-year-old college student...

Performance: 125.6 TPS (8.0 ms/token)
  Model: BitNet b1.58 2B (30 layers, 2560 dim, i2_s ternary weights)

Key optimizations:

  • Ternary weight kernel (i2_s): 2-bit packed weights, addition-only matmul
  • Kernel fusion: fused gate+up+ReLU²×mul and fused KV cache write (150 fewer dispatches/token)
  • Shared memory F16 matmul for LM head (128K vocab)
  • PreparedDispatch graph capture: ~99% pipeline cache hit rate
  • Command buffer batching: single GPU submit per token
  • KV cache with grouped-query attention (20 heads, 5 KV heads)
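
For intuition, the i2_s scheme in the first bullet can be modeled in a few lines of plain Lean (a conceptual sketch only — the real kernel is a WGSL compute shader, and the bit encoding here is assumed for illustration): each byte packs four 2-bit codes, each decoding to a ternary weight in {-1, 0, +1}, so the dot product needs only additions and subtractions.

```lean
-- Conceptual model of i2_s ternary weights (illustration only; the real
-- kernel runs as WGSL on the GPU). Assumed encoding: 0b01 → +1,
-- 0b10 → -1, anything else → 0.
def decode2 (code : UInt8) : Int :=
  if code == 0b01 then 1 else if code == 0b10 then -1 else 0

-- Multiplication-free dot product: a ternary weight turns x * w into
-- +x, -x, or nothing at all.
def ternaryDot (packed : ByteArray) (xs : Array Int) : Int := Id.run do
  let mut acc : Int := 0
  for i in [0:xs.size] do
    let byte := packed.get! (i / 4)              -- four weights per byte
    let code := (byte >>> UInt8.ofNat (2 * (i % 4))) &&& 0b11
    let w := decode2 code
    if w == 1 then acc := acc + xs[i]!
    else if w == -1 then acc := acc - xs[i]!
  return acc
```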

See bitnet.lean for the full inference pipeline.

Why Hesper?

Modern GPU programming lacks safety guarantees. Hesper provides:

  • Type Safety: Shaders are type-checked at compile time, preventing type mismatches
  • Formal Verification: Prove correctness properties about your GPU programs
  • WebGPU Backend: Cross-platform GPU access via Dawn (Metal, Vulkan, D3D12)
  • Lean Integration: Use Lean's powerful theorem proving alongside GPU computation
  • Multi-GPU Support: Select and coordinate across multiple GPU adapters

Quick Start

Prerequisites

  • Platform: macOS (Metal), Linux (Vulkan), or Windows (D3D12/Vulkan)

🐳 Docker Environment (Recommended for Linux/CI)

For a reproducible build environment, especially on Linux, you can use the provided Docker image:

# Build the image
docker build -t hesper-ci .

# Run build and tests inside container
docker run -it hesper-ci lake test-all

Installation

# Clone the repository
git clone https://github.com/Verilean/hesper.git
cd hesper

# Build native dependencies (this will take a while on first build)
lake script run buildNative

# Build and run a demo
lake build dsl-basics
./.lake/build/bin/dsl-basics

Your First Hesper Program

Create MyFirst.lean:

import Hesper
import Hesper.WebGPU.Device

def main : IO Unit := do
  -- Initialize WebGPU
  Hesper.init

  -- Get a GPU device
  let device ← Hesper.WebGPU.getDevice

  IO.println "✓ GPU ready!"

Build and run:

lake build myfirst
./.lake/build/bin/myfirst

Features

🚀 Portable SIMD CPU Backend (Google Highway)

Hardware-accelerated CPU operations powered by Google Highway, providing high-performance SIMD across x86, ARM, and RISC-V:

import Hesper.Simd
import Hesper.Float32
import Hesper.Float16

-- Float64 (8 bytes): Native Lean Float, NEON 2/op, AVX2 4/op
let a64 := FloatArray.mk #[1.0, 2.0, 3.0, 4.0]
let b64 := FloatArray.mk #[5.0, 6.0, 7.0, 8.0]
let c64 := Hesper.Simd.simdAdd a64 b64

-- Float32 (4 bytes): 2x memory savings, NEON 4/op, AVX2 8/op
let a32 := Float32.fromFloatArray a64
let b32 := Float32.fromFloatArray b64
let c32 := Float32.simdAdd a32 b32

-- Float16 (2 bytes): 4x memory savings, NEON 8/op, AVX2+F16C 8/op
-- Requires ARMv8.2-A FP16 or x86_64 F16C; returns an error if unavailable
let hasFP16 ← Float16.hasHardwareSupport
if hasFP16 then
  let a16 ← Float16.fromFloatArray a64
  let b16 ← Float16.fromFloatArray b64
  let c16 ← Float16.simdAdd a16 b16

Features:

  • Google Highway Integration: Portable SIMD implementation with runtime dispatch
  • Architecture Support: NEON (ARM), AVX2/AVX-512 (x86), optional FP16 vector arithmetic
  • Multi-Precision: Optimized paths for Float64, Float32, and Float16
  • OpenMP Support: Optional multithreading for large tensor operations

Zero-Conversion Architecture: All operations work directly on raw ByteArray with no automatic type conversions. Conversions are explicit only when needed.

⚡️ High-Level Parallel API

Inspired by webgpu-dawn, Hesper provides an easy-to-use API for data-parallelism that handles all GPU boilerplate (buffers, shaders, synchronization) in a single call.

parallelFor

Quickly execute a WGSL shader over a Float array:

import Hesper.Compute

-- Multiply each element by 1000 on the GPU
-- (`device` is a GPU device, `shader` the WGSL source, `inputData` the input array)
let result ← parallelFor device shader inputData

Device.compute

Run a computation with multiple named buffers directly on the Device:

device.compute myKernel [("input", inputBuf), ("output", outputBuf)] config

🎯 Type-Safe Shader DSL

Write WGSL shaders with Lean's type system guaranteeing correctness:

import Hesper.WGSL.DSL

-- Expressions are typed and checked at compile time
let x : Exp (.scalar .f32) := var "x"
let y : Exp (.scalar .f32) := var "y"

-- Arithmetic operators work naturally
let distance := sqrt (x * x + y * y)

-- Built-in functions
let clamped := Exp.clamp x (lit 0.0) (lit 1.0)
let power := Exp.pow x (lit 2.0)

-- Generate WGSL code
IO.println distance.toWGSL  -- Output: sqrt((x * x) + (y * y))

🧩 Verified Composable Kernels (Operator Fusion)

Hesper's VerifiedOpFusion architecture allows you to compose multiple GPU operations into a single kernel pass while maintaining formal correctness:

-- Fuses MatMul and ReLU into a single GPU kernel
-- Correctness is proven by construction
let fusedOp := matmulKernel |> reluKernel

Key Advantages:

  • Zero-Copy Fusion: Eliminate expensive memory roundtrips between kernels.
  • Formal Correctness: Each fused kernel is verified against a high-level CPU specification (spec_forward).
  • Unified Interface: Same code runs on GPU (via WGSL) or CPU (via Google Highway) for easy debugging.
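
One way to picture "correct by construction" is the following Lean sketch (an illustrative model only — Hesper's actual `VerifiedOp`/`VerifiedOpFusion` definitions may differ): each kernel carries its CPU specification and a proof that the two agree, and fusing two kernels composes both the implementations and the proofs.

```lean
-- Illustrative model only; not Hesper's real definitions.
structure VerifiedOp (α β : Type) where
  gpu          : α → β                 -- fused GPU implementation (modeled)
  spec_forward : α → β                 -- high-level CPU specification
  correct      : ∀ x, gpu x = spec_forward x

-- Fusing two verified ops yields a verified op: the correctness proof
-- of the composite is built from the two component proofs.
def VerifiedOp.fuse (f : VerifiedOp α β) (g : VerifiedOp β γ) :
    VerifiedOp α γ where
  gpu          := fun x => g.gpu (f.gpu x)
  spec_forward := fun x => g.spec_forward (f.spec_forward x)
  correct      := fun x => by rw [f.correct, g.correct]
```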

📈 Unified Verified Automatic Differentiation

Hesper's unique architecture unifies formal verification with automatic differentiation via a shared Differentiable interface. This allows the AD engine to treat complex, verified GPU kernels as first-class primitives.

The Differentiable Interface

All operations in Hesper—from simple scalar addition to fused ResNet blocks—implement this common trait:

class Differentiable (I O : Type) where
  /-- Primal execution (forward pass). -/
  forward : I → O
  /-- Adjoint computation (backward pass):
      the matrix-free vector-Jacobian product (Jᵀv). -/
  backward : I → O → I

Why it Matters:

  • Unified Logic: Scalar-CPU logic and Tensor-GPU kernels share the same mathematical abstraction.
  • End-to-End Correctness: By "lifting" VerifiedOp instances into the AD tape, Hesper ensures that backpropagation is as formally correct as the forward pass.
  • Zero-Copy Fusion: The AD engine can calculate gradients across fused kernels (e.g., MatMul |> ReLU) without writing intermediate tensors to VRAM.

-- AD engine automatically dispatches to hand-optimized GPU kernels
let grad := diff (matmul |> relu |> crossEntropy) input
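
To make the interface concrete, here is a minimal Lean sketch of the same forward/backward shape as a plain structure (illustration only, not Hesper's definitions): scalar ReLU's backward pass propagates the upstream gradient only where the input was positive, and composition pulls gradients back in reverse order.

```lean
-- Illustrative model of the forward/backward pairing above.
structure Op (I O : Type) where
  forward  : I → O
  backward : I → O → I  -- input and upstream gradient ↦ input gradient

-- Scalar ReLU: gradient flows only where the input was positive.
def reluOp : Op Float Float where
  forward x    := max x 0.0
  backward x g := if x > 0.0 then g else 0.0

-- Chain rule for a composition: Jᵀv of (g ∘ f) is f's Jᵀ applied
-- to g's Jᵀv, evaluated at the forward intermediates.
def Op.comp (f : Op A B) (g : Op B C) : Op A C where
  forward x    := g.forward (f.forward x)
  backward x v := f.backward x (g.backward (f.forward x) v)
```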

Key Features:

  • Hybrid AD: Seamlessly switch between CPU-scalar AD and GPU-tensor AD.
  • Verified Primitives: Every AD node is backed by a verified spec_forward and spec_backward.
  • High Performance: Leverages hand-optimized WGSL and Google Highway SIMD.

⚙️ High-Level Optimizers

Train models using state-of-the-art optimizers that integrate with Hesper's verified tensors:

import Hesper.Optimizer.SGD

-- Configure SGD with momentum
let opt := SGDConfig.default
  |>.withLearningRate 0.01 
  |>.withMomentum 0.9

-- Perform optimization step
let (newParams, newState) := opt.step params grads state
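
In context, the step threads optimizer state through a training loop. A hypothetical sketch (only the `opt.step params grads state` call shape comes from the snippet above; `Params`, `OptState`, and `computeGrads` are placeholders):

```lean
-- Hypothetical training loop; `Params`, `OptState`, and `computeGrads`
-- are placeholder names, not Hesper API.
def train (opt : SGDConfig) (steps : Nat)
    (params : Params) (state : OptState) : Params :=
  match steps with
  | 0     => params
  | n + 1 =>
    let grads := computeGrads params          -- e.g. via Hesper's verified AD
    let (params', state') := opt.step params grads state
    train opt n params' state'
```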

🎮 Graphics & Windowing

Build interactive graphics applications with GLFW integration:

import Hesper.GLFW

def main : IO Unit := do
  Hesper.init

  withGLFW do
    let window ← createWindow 800 600 "Hesper Graphics"
    let device ← Hesper.WebGPU.getDevice
    let surface ← createSurface device window

    -- Render loop
    gameLoop window surface

🔌 Multi-GPU Support

Enumerate and select GPUs in multi-GPU systems:

import Hesper.WebGPU.Device

-- List all available GPUs
Hesper.WebGPU.listAdapters

-- Select specific GPU
let device0 ← getDeviceByIndex 0  -- First GPU
let device1 ← getDeviceByIndex 1  -- Second GPU

-- Get adapter information
let info ← getAdapterInfo 0
IO.println s!"GPU: {info.name} (Backend: {info.backendType})"

Examples

WebGPU Tetris

A full Tetris implementation using GLFW and WebGPU, demonstrating:

  • Dynamic shader generation
  • Real-time rendering
  • Input handling
  • Game state management

lake build tetris
./.lake/build/bin/tetris

Controls: A/D (move), S (drop), Space (rotate), ESC (exit)

Matrix Multiplication

High-performance matrix multiplication with subgroup optimizations:

lake build matmul-demo
./.lake/build/bin/matmul-demo

Demonstrates:

  • GPU buffer management
  • Compute shader execution
  • Performance profiling
  • Result verification

SIMD CPU Backend

Multi-precision SIMD operations with hardware acceleration:

# Run multi-precision test (Float64/Float32/Float16)
lake script run buildSimd
lake build multi-precision
./.lake/build/bin/multi-precision

# Run SIMD benchmarks
lake build simd-bench
./.lake/build/bin/simd-bench

Output:

Backend: NEON (ARM64) - F64: 2/op, F32: 4/op, FP16

─── Float64 (8 bytes/element) ───
Result: #[6.0, 8.0, 10.0, 12.0] ✓

─── Float32 (4 bytes/element) ───
Result: Float32[4]: [6.0, 8.0, 10.0, 12.0] ✓

─── Float16 (2 bytes/element) ───
FP16 hardware detected!
Result: Float16[4]: [6.0, 8.0, 10.0, 12.0] ✓

Multi-GPU Demo

Enumerate GPUs and create devices from specific adapters:

lake build multigpu
./.lake/build/bin/multigpu

Output:

Found 2 GPU adapter(s):
  [0] NVIDIA GeForce RTX 3080 (Backend: Vulkan)
  [1] Intel UHD Graphics 630 (Backend: Vulkan)
✓ Device created from GPU 0

Neural Network Training

Automatic differentiation and gradient descent on GPU:

lake build nn-gpu-demo
./.lake/build/bin/nn-gpu-demo

Features:

  • Conv2D layers with verified gradients
  • Backpropagation on GPU
  • Real-time training visualization

Building and Testing

Building the Project

Hesper requires building both native C++ dependencies (Google Dawn) and Lean code.

Step 1: Build Native Dependencies

The first build will take 5-15 minutes as it compiles Google Dawn from source:

# Build the native WebGPU bridge (hesper_native library)
lake script run buildNative

This compiles:

  • Google Dawn WebGPU implementation
  • C++ FFI bridge (native/bridge.cpp)
  • SIMD CPU backend (c_src/simd_ops.cpp)

Step 2: Build Lean Code

Once native dependencies are built, compile the Lean libraries and executables:

# Build the core library
lake build Hesper

# Or build a specific executable
lake build simple-write

Clean Build (if you encounter issues):

lake clean
lake script run buildNative
lake build

Testing the Installation

1. Simple GPU Test (Raw WGSL + DSL)

This test verifies both raw WGSL shaders and DSL-generated shaders execute correctly on your GPU:

lake build simple-write
./.lake/build/bin/simple-write

Expected output:

╔══════════════════════════════════════╗
║   GPU Double Test (DSL + Raw)        ║
╚══════════════════════════════════════╝

📝 DSL-generated WGSL:
─────────────────────────────────────
@group(0) @binding(0) var<storage, read_write> input: array<f32>;
@group(0) @binding(1) var<storage, read_write> output: array<f32>;

@compute @workgroup_size(1)
fn main(@builtin(global_invocation_id) gid: vec3<u32>) {
    let idx = gid.x;
    if (idx < arrayLength(&input)) {
        output[idx] = input[idx] * 2.0;
    }
}

🚀 Initializing WebGPU...
  ✓ Created input buffer
  ✓ Wrote input: [1.0, 2.0, 3.0, 4.0]
  ✓ Created output buffer

  🔹 Test 1: Raw WGSL shader
  ✓ Raw WGSL executed

  🔹 Test 2: DSL-generated shader
  ✓ DSL shader executed

📊 Results:
  Input → Expected → Raw WGSL → DSL WGSL
  [0] 1.0 → 2.0 → 2.0 ✓ → 2.0 ✓
  [1] 2.0 → 4.0 → 4.0 ✓ → 4.0 ✓
  [2] 3.0 → 6.0 → 6.0 ✓ → 6.0 ✓
  [3] 4.0 → 8.0 → 8.0 ✓ → 8.0 ✓

✅ SUCCESS: Both shaders work correctly!
   - Raw WGSL shader: ✓
   - DSL-generated shader (ShaderM monad): ✓
   - Both produce identical correct results

This test validates:

  • ✓ WebGPU initialization and GPU discovery
  • ✓ Buffer creation and data transfer (CPU ↔ GPU)
  • ✓ Raw WGSL shader compilation and execution
  • ✓ DSL shader code generation (ShaderM monad → WGSL)
  • ✓ DSL shader execution on GPU
  • ✓ Correct data marshalling across the FFI boundary

2. FFI Boundary Tests

Test data conversion across the Lean ↔ C++ FFI boundary:

lake build ffi-tests
./.lake/build/bin/ffi-tests

Expected output:

╔══════════════════════════════════════╗
║   FFI Boundary Tests                 ║
╚══════════════════════════════════════╝

Test 1: Lean writes data, C++ reads
  ✓ Lean wrote: [1.0, 2.0, 3.0, 4.0]
  ✓ C++ verified byte-level accuracy

Test 2: C++ writes data, Lean reads
  ✓ GPU wrote: [10.0, 20.0, 30.0, 40.0]
  ✓ Lean verified byte-level accuracy

Test 3: Round-trip (Lean → GPU → Lean)
  ✓ Input: [5.0, 10.0, 15.0, 20.0]
  ✓ Output: [10.0, 20.0, 30.0, 40.0]
  ✓ Data integrity preserved

✅ All FFI boundary tests passed!

This validates:

  • Lean writes ByteArray → C++ reads correct bytes
  • C++ writes bytes → Lean reads correct Float values
  • Round-trip data integrity across FFI boundary

3. SIMD CPU Backend Test

Test multi-precision SIMD operations (Float64/Float32/Float16):

lake script run buildSimd
lake build multi-precision
./.lake/build/bin/multi-precision

Expected output (on ARM64 with FP16 support):

Backend: NEON (ARM64) - F64: 2/op, F32: 4/op, FP16

─── Float64 (8 bytes/element) ───
Result: #[6.0, 8.0, 10.0, 12.0] ✓

─── Float32 (4 bytes/element) ───
Result: Float32[4]: [6.0, 8.0, 10.0, 12.0] ✓

─── Float16 (2 bytes/element) ───
FP16 hardware detected!
Result: Float16[4]: [6.0, 8.0, 10.0, 12.0] ✓

For Contributors: Testing Your Changes

When making changes to Hesper, run these tests to ensure you haven't broken anything:

1. Core FFI Tests

# Test Lean ↔ C++ data conversion
lake build ffi-tests
./.lake/build/bin/ffi-tests

2. GPU Shader Tests

# Test raw WGSL and DSL shader execution
lake build simple-write
./.lake/build/bin/simple-write

3. SIMD Tests

# Rebuild SIMD library and run tests
lake script run buildSimd
lake build simd-test
./.lake/build/bin/simd-test

4. Full Test Suite

# Run all tests
lake build test-all
./.lake/build/bin/test-all

Troubleshooting

Issue: "Build failed: native library not found"

Solution: Rebuild the native library:

lake clean
lake script run buildNative
lake build

Issue: "No GPU adapters found"

Solution: Ensure you have proper GPU drivers:

  • macOS: No action needed (Metal is built-in)
  • Linux: Install Vulkan drivers (vulkan-tools, mesa-vulkan-drivers)
  • Windows: Install latest GPU drivers with D3D12/Vulkan support

Issue: "SIMD library not found"

Solution: Build the SIMD backend:

lake script run buildSimd

Issue: "FP16 not supported"

Solution: This is expected on older hardware. Float16 requires:

  • ARM64: ARMv8.2-A with FP16 extension (Apple M1+, AWS Graviton2+)
  • x86_64: F16C extension (Intel Ivy Bridge+ / AMD Bulldozer+)

The library will gracefully fall back to Float32 operations.
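
That fallback can be expressed with the detection API shown in the SIMD section (a sketch; the `toFloatArray` conversion helpers are assumed here for illustration and may not match the real API):

```lean
-- Sketch of the graceful-fallback pattern: use the FP16 SIMD path when
-- the CPU supports it, otherwise fall back to Float32.
def addFast (a b : FloatArray) : IO FloatArray := do
  let hasFP16 ← Float16.hasHardwareSupport
  if hasFP16 then
    let a16 ← Float16.fromFloatArray a
    let b16 ← Float16.fromFloatArray b
    let c16 ← Float16.simdAdd a16 b16
    Float16.toFloatArray c16              -- assumed conversion helper
  else
    let a32 := Float32.fromFloatArray a
    let b32 := Float32.fromFloatArray b
    let c32 := Float32.simdAdd a32 b32
    pure (Float32.toFloatArray c32)       -- assumed conversion helper
```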

Issue: Dawn build takes too long

Solution: Dawn's first build can take 5-15 minutes. Subsequent builds are incremental and much faster. To speed up:

# Use more CPU cores (adjust -j value)
lake script run buildNative -- -j 16

How It Works

┌─────────────────────────────────────────────────────────────┐
│                    Lean 4 Code                               │
│  • Type-safe shader DSL                                      │
│  • Tensor operations                                         │
│  • Formal proofs                                             │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│              WGSL Code Generation                            │
│  Exp (.scalar .f32) → WGSL shader source                    │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│              Lean FFI (C++ Bridge)                           │
│  • lean_hesper_* functions                                   │
│  • Resource management via Lean.External                     │
└─────────────────────┬───────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────────────────────┐
│              Google Dawn (WebGPU Native)                     │
│  • Metal (macOS)                                             │
│  • Vulkan (Linux/Windows)                                    │
│  • D3D12 (Windows)                                           │
└─────────────────────────────────────────────────────────────┘

Architecture Layers

  1. DSL Layer: Type-safe WGSL expression builder with dependent types
  2. Tensor Layer: High-level operations (matmul, conv2d, pooling)
  3. Compute Layer: Shader compilation, buffer management, execution
  4. WebGPU Layer: FFI bindings to Dawn native implementation
  5. Backend Layer: Platform-specific GPU drivers (Metal/Vulkan/D3D12)

Project Structure

Hesper/
├── Hesper/
│   ├── WGSL/          # Type-safe shader DSL
│   │   ├── Types.lean      # WGSL type system
│   │   ├── Exp.lean        # Expression AST
│   │   └── DSL.lean        # User-facing DSL
│   ├── WebGPU/        # WebGPU bindings
│   │   ├── Device.lean     # GPU device management
│   │   ├── Buffer.lean     # GPU buffers
│   │   ├── Shader.lean     # Shader modules
│   │   ├── Pipeline.lean   # Compute/render pipelines
│   │   └── Errors.lean     # Comprehensive error handling
│   ├── Tensor/        # Tensor operations
│   │   └── MatMul.lean     # Matrix multiplication
│   ├── NN/            # Neural network layers
│   │   └── Conv.lean       # Convolution layers
│   ├── GLFW/          # Windowing and graphics
│   │   └── GLFW.lean       # GLFW bindings
│   ├── Simd.lean      # SIMD Float64 operations
│   ├── Float32.lean   # SIMD Float32 operations
│   ├── Float16.lean   # SIMD Float16 operations
│   └── Compute.lean   # High-level compute API
├── Examples/          # Example programs
│   ├── Tetris.lean         # Full game demo
│   ├── MultiGPU.lean       # Multi-GPU support
│   ├── DSLBasics.lean      # DSL tutorial
│   └── ...
├── native/            # C++ WebGPU bridge
│   ├── bridge.cpp          # FFI implementation
│   └── CMakeLists.txt      # Build configuration
├── c_src/             # SIMD CPU backend
│   └── simd_ops.cpp        # NEON/AVX2 implementations
├── Tests/             # Comprehensive test suite
│   ├── ErrorTests.lean     # Error handling tests
│   ├── ShaderTests.lean    # Shader monad tests
│   └── ...
└── lakefile.lean      # Lake build script

Roadmap

Current Status: Early Development (Alpha)

Completed:

  • Multi-precision SIMD CPU backend (Google Highway)
  • Architecture detection (NEON/AVX2/F16C)
  • Comprehensive error handling with structured error types
  • Complete test suite (error handling, shader monad)
  • Docker-based CI environment
  • Verified composable kernels (VerifiedOpFusion)
  • BitNet b1.58 inference engine (125 TPS on M4 Max)
  • Kernel fusion: fused gate+up+ReLU²×mul, fused KV cache write
  • KV cache with grouped-query attention
  • PreparedDispatch graph capture (99%+ cache hit rate)

In Progress:

  • Comprehensive tensor operation library (GEMM, Conv3D)
  • Gemma 3 / Transformer support
  • Automatic differentiation on GPU kernels
  • Formal proofs of kernel numerical stability
  • Integration with Lean's tactic framework

Contributing

Hesper is part of the Verilean organization's effort to bring verified computing to GPUs.

How to Contribute

  1. Fork the repository and create a feature branch
  2. Make your changes following the existing code style
  3. Run the test suite to ensure nothing broke:
    # Core FFI boundary tests
    lake build ffi-tests
    ./.lake/build/bin/ffi-tests
    
    # GPU shader tests (raw WGSL + DSL)
    lake build simple-write
    ./.lake/build/bin/simple-write
    
    # SIMD tests (if you modified SIMD code)
    lake script run buildSimd
    lake build simd-test
    ./.lake/build/bin/simd-test
  4. Add tests for new features (see Examples/Tests/ for examples)
  5. Submit a pull request with a clear description of changes

Testing Guidelines

  • FFI changes: Always run ffi-tests to verify Lean ↔ C++ data marshalling
  • DSL changes: Run simple-write to verify WGSL code generation
  • GPU operations: Test with real GPU hardware, not just compilation
  • SIMD changes: Test on both ARM64 (NEON) and x86_64 (AVX2) if possible
  • Cross-platform: macOS (Metal), Linux (Vulkan), Windows (D3D12/Vulkan)

Code Organization for Contributors

Hesper/
├── Hesper/               # Core library
│   ├── WGSL/            # Type-safe shader DSL
│   ├── WebGPU/          # WebGPU bindings (Device, Buffer, Shader, Pipeline)
│   ├── Compute.lean     # High-level compute API
│   ├── Simd.lean        # SIMD Float64 operations
│   ├── Float32.lean     # SIMD Float32 operations
│   └── Float16.lean     # SIMD Float16 operations
├── Examples/             # Example programs (organized by category)
│   ├── DSL/             # DSL feature demonstrations
│   ├── Compute/         # GPU compute examples
│   ├── MachineLearning/ # Neural network training
│   ├── Graphics/        # GLFW rendering demos
│   ├── SIMD/            # CPU SIMD benchmarks
│   ├── Tests/           # Integration tests
│   └── Utilities/       # Helper utilities
├── Tests/                # Unit tests
│   ├── FFIBoundaryTests.lean  # Lean ↔ C++ data conversion tests
│   └── FusionTest.lean        # Operator fusion tests
├── native/               # C++ WebGPU bridge
│   ├── bridge.cpp       # FFI implementation (lean_hesper_* functions)
│   └── CMakeLists.txt   # Dawn build configuration
├── c_src/                # SIMD CPU backend
│   └── simd_ops.cpp     # NEON/AVX2 implementations
└── lakefile.lean         # Lake build script

Key files for contributors:

  • native/bridge.cpp: FFI boundary - all Lean ↔ C++ data conversion happens here
  • Hesper/WGSL/Monad.lean: ShaderM monad for imperative shader construction
  • Hesper/WGSL/Execute.lean: Compiles ShaderM → WGSL and executes on GPU
  • Examples/Tests/SimpleWrite.lean: Reference test showing raw WGSL vs DSL execution
  • Tests/FFIBoundaryTests.lean: Reference test for FFI data conversion

Author

Junji Hashimoto

Twitter/X: @junjihashimoto3

License

Apache License 2.0 - see LICENSE file for details

Acknowledgments

  • Google Dawn for the WebGPU native implementation
  • Lean 4 for the foundation of verified programming
  • WebGPU Working Group for the standard
  • gpu.cpp (Answer.AI): High-level C++ API wrapper inspiration.

Write GPU code that's not just fast—make it correct by construction.
