Algebraic enhancements for GEMM & AI accelerators
Updated Feb 28, 2025 - Python
Open-source AI Accelerator Stack integrating compute, memory, and software — from RTL to PyTorch.
Minimal TPU implementation with 8x8 systolic array and PyTorch integration
SystemVerilog Implementations of CUDA/TensorCore/TPU GEMM Operations
AS501 AI Semiconductor Design Basics & Practice Final Project
High-performance systolic array computing framework with AI agents and medical compliance.
Parametric Verilog systolic implementation of Cannon's Matrix Multiplication on an M×M torus mesh.
Weight Stationary Systolic Array GEMM accelerator
4×4 7-bit matrix multiplication hardware accelerator using a systolic array, with a Python driver for the Basys 3 FPGA and a systolic array UVC using UVM.
Technical Showcase: 22B True-MoE Engine running on 6GB VRAM (GTX 1060). Demonstrates "Surgical" NF4 quantization, dynamic expert swapping, and the custom "Grace Hopper" pipeline.
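One of the entries above maps Cannon's matrix multiplication onto an M×M torus mesh. As a rough software illustration of that data movement, here is a minimal Python sketch. It is not taken from any listed repository; the function name `cannon_matmul` is hypothetical, and the torus is simulated with plain nested lists rather than hardware processing elements.

```python
def cannon_matmul(A, B):
    """Multiply two M x M matrices with Cannon's algorithm on a simulated torus.

    Hypothetical helper for illustration; each (i, j) list slot stands in for
    one processing element holding a single element of A and of B.
    """
    m = len(A)
    # Initial skew: row i of A rotates left by i, column j of B rotates up by j.
    a = [[A[i][(j + i) % m] for j in range(m)] for i in range(m)]
    b = [[B[(i + j) % m][j] for j in range(m)] for i in range(m)]
    c = [[0] * m for _ in range(m)]
    for _ in range(m):
        # Every processing element multiplies its local pair and accumulates.
        for i in range(m):
            for j in range(m):
                c[i][j] += a[i][j] * b[i][j]
        # Rotate A one step left and B one step up, wrapping around the torus.
        a = [[a[i][(j + 1) % m] for j in range(m)] for i in range(m)]
        b = [[b[(i + 1) % m][j] for j in range(m)] for i in range(m)]
    return c


if __name__ == "__main__":
    print(cannon_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
    # prints [[19, 22], [43, 50]]
```

After M multiply-and-rotate steps every element pair A[i][s], B[s][j] has met exactly once at position (i, j), so each processing element only ever talks to its left and upper neighbors.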