|
1 | | -# lux-gpu |
| 1 | +# Lux GPU - High-Performance Cryptographic GPU Library |
2 | 2 |
|
3 | | -GPU acceleration foundation for the Lux crypto stack. |
| 3 | +> **Cross-platform GPU acceleration for cryptography, FHE, and zero-knowledge proofs** |
| 4 | +
|
| 5 | +[](LICENSE) |
| 6 | +[](https://isocpp.org/std/the-standard) |
4 | 7 |
|
5 | 8 | ## Overview |
6 | 9 |
|
7 | | -This library provides high-performance array operations accelerated by Metal (Apple Silicon) and CUDA (NVIDIA). It serves as the foundation layer for all GPU-accelerated cryptographic operations in the Lux ecosystem. |
| 10 | +Lux GPU is a high-performance GPU compute library optimized for cryptographic operations. It provides portable, production-ready implementations of: |
8 | 11 |
|
9 | | -Based on [MLX](https://github.com/ml-explore/mlx) from Apple machine learning research, with extensions for cryptographic workloads. |
| 12 | +- **Number Theoretic Transform (NTT)** - Foundation for polynomial multiplication |
| 13 | +- **Fast Fourier Transform (FFT)** - Complex signal processing |
| 14 | +- **Elliptic Curve Operations** - BLS12-381, BN254 curve arithmetic |
| 15 | +- **Multi-Scalar Multiplication (MSM)** - Batched elliptic curve operations |
| 16 | +- **Cryptographic Hashing** - Poseidon, Blake3 |
| 17 | +- **Fully Homomorphic Encryption (FHE)** - TFHE blind rotation, CKKS |
10 | 18 |
|
11 | | -## Features |
| 19 | +## Backend Support |
12 | 20 |
|
13 | | -- **Unified Memory** - Arrays live in shared memory, accessible from CPU and GPU |
14 | | -- **Lazy Evaluation** - Computations deferred until results needed |
15 | | -- **Metal Backend** - Native Apple Silicon GPU acceleration |
16 | | -- **CUDA Backend** - NVIDIA GPU support (planned) |
17 | | -- **FFT/NTT** - Optimized transforms for polynomial arithmetic |
18 | | -- **Batch Operations** - Parallel processing of independent operations |
| 21 | +| Backend | Platform | Status | |
| 22 | +|---------|----------|--------| |
| 23 | +| **Metal** | macOS/iOS (Apple Silicon) | ✅ Full Support | |
| 24 | +| **WebGPU** | Cross-platform (via Dawn/wgpu) | ✅ Full Support | |
| 25 | +| **CPU** | All platforms (SIMD) | ✅ Fallback | |
| 26 | +| **CUDA** | NVIDIA GPUs | 🔒 Private (contact us) | |
19 | 27 |
|
20 | | -## Dependencies |
| 28 | +## Quick Start |
21 | 29 |
|
22 | | -Built on top of: |
23 | | -- **lux-gpu** (this library) - Base array operations |
| 30 | +### Prerequisites |
24 | 31 |
|
25 | | -Used by: |
26 | | -- **lux-lattice** - NTT acceleration for lattice cryptography |
27 | | -- **lux-crypto** - BLS pairing acceleration |
| 32 | +- CMake 3.20+ |
| 33 | +- C++17 compiler |
| 34 | +- For Metal: Xcode 12+ on macOS |
| 35 | +- For WebGPU: Dawn or wgpu-native |
28 | 36 |
|
29 | | -## Installation |
| 37 | +### Building |
30 | 38 |
|
31 | 39 | ```bash |
32 | | -cmake -B build -DCMAKE_INSTALL_PREFIX=/usr/local |
33 | | -cmake --build build -j |
34 | | -cmake --install build |
| 40 | +# Clone the repository |
| 41 | +git clone https://github.com/luxfi/gpu.git |
| 42 | +cd gpu |
| 43 | + |
| 44 | +# Create build directory |
| 45 | +mkdir build && cd build |
| 46 | + |
| 47 | +# Configure with desired backends |
| 48 | +cmake .. \ |
| 49 | + -DLUX_BUILD_METAL=ON \ |
| 50 | + -DLUX_BUILD_WEBGPU=OFF \ |
| 51 | + -DCMAKE_BUILD_TYPE=Release |
| 52 | + |
| 53 | +# Build |
| 54 | +make -j$(nproc) |
| 55 | + |
| 56 | +# Install |
| 57 | +sudo make install |
| 58 | +``` |
| 59 | + |
| 60 | +### CMake Integration |
| 61 | + |
| 62 | +```cmake |
| 63 | +find_package(lux-gpu REQUIRED) |
| 64 | +target_link_libraries(your_target PRIVATE lux::gpu) |
35 | 65 | ``` |
36 | 66 |
|
37 | | -## Usage |
| 67 | +## Usage Examples |
| 68 | + |
| 69 | +### NTT (Number Theoretic Transform) |
38 | 70 |
|
39 | 71 | ```cpp |
40 | | -#include <lux/gpu/array.h> |
41 | | -#include <lux/gpu/ops.h> |
| 72 | +#include <lux/gpu/ntt.h> |
42 | 73 |
|
43 | | -// Create arrays |
44 | | -auto a = lux::gpu::array({1.0f, 2.0f, 3.0f, 4.0f}); |
45 | | -auto b = lux::gpu::array({5.0f, 6.0f, 7.0f, 8.0f}); |
| 74 | +// Initialize NTT context |
| 75 | +auto ctx = lux::gpu::NttContext::create(1024); // N = 1024 |
46 | 76 |
|
47 | | -// GPU-accelerated operations |
48 | | -auto c = lux::gpu::add(a, b); |
49 | | -auto d = lux::gpu::matmul(a.reshape({2, 2}), b.reshape({2, 2})); |
| 77 | +// Forward NTT |
| 78 | +std::vector<uint64_t> poly(1024); |
| 79 | +ctx->forward(poly.data(), poly.size()); |
50 | 80 |
|
51 | | -// FFT for signal processing |
52 | | -auto spectrum = lux::gpu::fft::fft(a); |
| 81 | +// Inverse NTT |
| 82 | +ctx->inverse(poly.data(), poly.size()); |
53 | 83 | ``` |
54 | 84 |
|
55 | | -## CMake Integration |
| 85 | +### BLS12-381 Operations |
56 | 86 |
|
57 | | -```cmake |
58 | | -find_package(lux-gpu REQUIRED) |
59 | | -target_link_libraries(myapp PRIVATE lux::gpu) |
| 87 | +```cpp |
| 88 | +#include <lux/gpu/bls12_381.h> |
| 89 | +
|
| 90 | +using namespace lux::gpu; |
| 91 | +
|
| 92 | +// Point multiplication |
| 93 | +auto G1 = bls12::G1Affine::generator(); |
| 94 | +auto scalar = bls12::Scalar::from_bytes(data); |
| 95 | +auto result = bls12::g1_mul(G1, scalar); |
| 96 | +
|
| 97 | +// Batch MSM (Multi-Scalar Multiplication) |
| 98 | +std::vector<bls12::G1Affine> points = {...}; |
| 99 | +std::vector<bls12::Scalar> scalars = {...}; |
| 100 | +auto msm_result = bls12::msm(points, scalars); |
60 | 101 | ``` |
61 | 102 |
|
62 | | -## Go Bindings |
| 103 | +### Poseidon Hash |
63 | 104 |
|
64 | | -See [luxfi/crypto](https://github.com/luxfi/crypto) for Go bindings that wrap this library. |
| 105 | +```cpp |
| 106 | +#include <lux/gpu/poseidon.h> |
| 107 | + |
| 108 | +// Hash two field elements |
| 109 | +auto a = lux::gpu::Fe::from_u64(42); |
| 110 | +auto b = lux::gpu::Fe::from_u64(123); |
| 111 | +auto hash = lux::gpu::poseidon_hash_2(a, b); |
| 112 | +``` |
65 | 113 |
|
66 | 114 | ## Architecture |
67 | 115 |
|
68 | 116 | ``` |
69 | | -lux-gpu (this) ← Foundation (Metal/CUDA) |
70 | | - ▲ |
71 | | -lux-lattice ← NTT acceleration |
72 | | - ▲ |
73 | | -lux-fhe ← TFHE/CKKS/BGV |
| 117 | +lux-gpu/ |
| 118 | +├── mlx/ # Core library |
| 119 | +│ ├── backend/ |
| 120 | +│ │ ├── metal/ # Apple Metal shaders (.metal) |
| 121 | +│ │ └── webgpu/ # Portable WGSL shaders (.wgsl) |
| 122 | +│ └── kernels/ # Kernel registry and dispatch |
| 123 | +├── include/ # Public headers |
| 124 | +├── benchmarks/ # Performance tests |
| 125 | +└── tests/ # Unit tests |
74 | 126 | ``` |
75 | 127 |
|
76 | | -## Documentation |
| 128 | +## Performance |
| 129 | + |
| 130 | +Benchmarked on Apple M1 Max: |
| 131 | + |
| 132 | +| Operation | Lux GPU | Reference | Speedup | |
| 133 | +|-----------|---------|-----------|---------| |
| 134 | +| NTT (N=2^20) | 2.1ms | 12ms (CPU) | 5.7x | |
| 135 | +| MSM (2^16 points) | 48ms | 320ms (CPU) | 6.7x | |
| 136 | +| Poseidon (batch 10K) | 0.8ms | 8ms (CPU) | 10x | |
| 137 | +| Blind Rotate (TFHE) | 1.2ms | 15ms (CPU) | 12.5x | |
| 138 | + |
| 139 | +## CUDA Support |
| 140 | + |
| 141 | +High-performance CUDA kernels are available for NVIDIA GPUs through a separate commercial license. These provide: |
77 | 142 |
|
78 | | -- [GPU Acceleration Guide](https://luxfi.github.io/crypto/docs/gpu-acceleration) |
79 | | -- [C++ Libraries Overview](https://luxfi.github.io/crypto/docs/cpp-libraries) |
| 143 | +- 2-3x faster MSM than open-source alternatives |
| 144 | +- Optimized memory access patterns |
| 145 | +- Multi-GPU support |
| 146 | +- Production-ready for blockchain validators |
| 147 | + |
| 148 | +**Contact**: cuda@lux.industries |
80 | 149 |
|
81 | 150 | ## License |
82 | 151 |
|
83 | | -MIT License - see [LICENSE](LICENSE) |
| 152 | +BSD 3-Clause License - Ecosystem Edition (BSD-3-Clause-Eco) |
| 153 | + |
| 154 | +``` |
| 155 | +Copyright (c) 2024-2026 Lux Industries Inc. |
| 156 | +
|
| 157 | +Commercial use of this software is permitted provided that the software |
| 158 | +operates as part of, or in connection with, the Lux Network of blockchains. |
| 159 | +
|
| 160 | +For external commercial licensing, contact: license@lux.industries |
| 161 | +``` |
| 162 | + |
| 163 | +See [LICENSE](LICENSE) for full terms. |
| 164 | + |
| 165 | +## Contributing |
| 166 | + |
| 167 | +We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. |
| 168 | + |
| 169 | +## Related Projects |
| 170 | + |
| 171 | +- [lux/node](https://github.com/luxfi/node) - Lux blockchain node |
| 172 | +- [lux/coreth](https://github.com/luxfi/coreth) - EVM implementation |
| 173 | +- [lux/fhe](https://github.com/luxfi/fhe) - FHE library using lux-gpu |
| 174 | +- [lux/crypto](https://github.com/luxfi/crypto) - Cryptographic primitives |
| 175 | + |
| 176 | +## Support |
| 177 | + |
| 178 | +- **Documentation**: https://docs.lux.industries/gpu |
| 179 | +- **Issues**: https://github.com/luxfi/gpu/issues |
| 180 | +- **Discord**: https://discord.gg/lux |
84 | 181 |
|
85 | | -## Links |
| 182 | +--- |
86 | 183 |
|
87 | | -- [lux-lattice](https://github.com/luxcpp/lattice) - Lattice cryptography |
88 | | -- [lux-fhe](https://github.com/luxcpp/fhe) - Fully Homomorphic Encryption |
89 | | -- [lux-crypto](https://github.com/luxcpp/crypto) - Core cryptography |
90 | | -- [luxfi/crypto](https://github.com/luxfi/crypto) - Go bindings |
| 184 | +Built with ❤️ by [Lux Industries Inc.](https://lux.industries) | [Hanzo AI](https://hanzo.ai) |
0 commit comments