Mini-Inference Engine

CUDA GEMM optimization tutorial and mini inference runtime
Compact C++17/CUDA codebase with a 7-stage kernel path that conservatively reaches about 85% of the cuBLAS reference on the RTX 3080 1024×1024 benchmark

Simplified Chinese · Docs · Getting Started

What this repository contains

Mini-Inference Engine keeps the scope narrow:

Progressive GEMM kernels: naive, tiled, coalesced, double-buffered, register-blocked, fused, and vectorized CUDA implementations.
Minimal runtime pieces: Tensor, InferenceEngine, MemoryPool, StreamManager, AutoTuner, and Profiler.
Benchmarks and tests: buildable examples plus a host/GPU test split.
Bilingual documentation: practical guides for building, profiling, and understanding the code.
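To make the step from the naive stage to the tiled stage concrete, here is a minimal host-side C++ sketch of the tiling idea. It is a CPU analogue, not the repository's CUDA kernels: the names `gemm_naive`, `gemm_tiled`, and `TILE` are hypothetical, the matrices are assumed to be row-major, and the tile size is chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; none of these names come from the repository.
// All matrices are row-major: A is m x k, B is k x n, C is m x n.

// Naive GEMM: one dot product per output element.
std::vector<float> gemm_naive(const std::vector<float>& a, const std::vector<float>& b,
                              std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}

// Illustrative tile edge; a real kernel would tune this per device.
constexpr std::size_t TILE = 4;

// Tiled GEMM: walk the output in TILE x TILE blocks and accumulate partial
// products one K-slice at a time, so each block of A and B is reused while hot.
std::vector<float> gemm_tiled(const std::vector<float>& a, const std::vector<float>& b,
                              std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i0 = 0; i0 < m; i0 += TILE) {
        for (std::size_t j0 = 0; j0 < n; j0 += TILE) {
            for (std::size_t p0 = 0; p0 < k; p0 += TILE) {
                // Clamp the tile bounds so sizes need not be multiples of TILE.
                const std::size_t i_end = std::min(i0 + TILE, m);
                const std::size_t j_end = std::min(j0 + TILE, n);
                const std::size_t p_end = std::min(p0 + TILE, k);
                for (std::size_t i = i0; i < i_end; ++i) {
                    for (std::size_t p = p0; p < p_end; ++p) {
                        const float a_ip = a[i * k + p];
                        for (std::size_t j = j0; j < j_end; ++j) {
                            c[i * n + j] += a_ip * b[p * n + j];
                        }
                    }
                }
            }
        }
    }
    return c;
}
```

On a GPU the same blocking is typically expressed by staging each tile of A and B in shared memory before the inner accumulation; the loop structure above only shows why tiles increase data reuse.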

Build and test

The stable local CUDA path uses the system GCC 12 / G++ 12 toolchain:

cmake --preset gcc-cuda
cmake --build --preset gcc-cuda
ctest --preset gcc-cuda

cmake --preset release-gcc-cuda
cmake --build --preset release-gcc-cuda
./build-release-gcc-cuda/benchmark

If your shell already uses a clean system compiler, the default and release presets remain available. tests_host covers utilities that do not need a GPU device. tests_gpu covers CUDA runtime and kernel behavior; its tests may be skipped when no NVIDIA GPU is available, but configuration and compilation still require the CUDA Toolkit.

Repository layout

Area	Purpose
`src/`	CUDA kernels and runtime implementation
`include/`	Public headers for kernels, runtime, and utilities
`benchmarks/`	Benchmark and demo entry points
`tests/`	Host tests and GPU-backed behavior tests
`docs/`	GitHub Pages source and long-form documentation
`CHANGELOG.md`	Single change log for the whole project

Documentation

Topic	English	Chinese
Getting started	docs/en/guides/getting-started.md	docs/zh/guides/getting-started.md
Architecture	docs/en/architecture.md	docs/zh/architecture.md
GEMM deep dive	docs/en/deep-dive/gemm-optimization.md	docs/zh/deep-dive/gemm-optimization.md
Performance tuning	docs/en/performance-tuning.md	docs/zh/performance-tuning.md
API reference	docs/en/api-reference.md	docs/zh/api-reference.md
Contributing	docs/en/contributing.md	docs/zh/contributing.md

Project rules

Use .clang-format; functions and variables use snake_case, classes use PascalCase, and constants/template parameters use UPPER_SNAKE_CASE.
Wrap CUDA API calls with CUDA_CHECK() and cuBLAS calls with CUBLAS_CHECK().
Prefer DeviceMemory or PooledMemory over raw GPU allocation lifetimes.
Add new source files explicitly to CMakeLists.txt; do not rely on recursive globbing.
Keep GitHub Pages focused on documentation and keep all release history in the root CHANGELOG.md.
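
The rule about wrapping API calls in a checking macro can be illustrated with a host-only sketch. This is not the project's CUDA_CHECK() implementation: `MockStatus`, `check_status`, `MOCK_CHECK`, and `mock_alloc` are hypothetical stand-ins so that the example compiles without the CUDA Toolkit, and the choice to throw an exception on failure is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Stand-in for a status type such as cudaError_t (hypothetical, host-only).
enum class MockStatus { SUCCESS = 0, INVALID_VALUE = 1 };

// Hypothetical helper: converts a failing status into an exception that
// carries the call text and the call site.
inline void check_status(MockStatus status, const char* expr, const char* file, int line) {
    assert(expr != nullptr && file != nullptr);
    if (status == MockStatus::SUCCESS) {
        return;
    }
    std::ostringstream msg;
    msg << file << ":" << line << ": '" << expr << "' failed with status "
        << static_cast<int>(status);
    throw std::runtime_error(msg.str());
}

// Hypothetical macro, deliberately not named CUDA_CHECK to avoid implying
// that this is the repository's definition.
#define MOCK_CHECK(call) check_status((call), #call, __FILE__, __LINE__)

// Stand-in for an API call; it reports an error for a zero-byte request.
inline MockStatus mock_alloc(std::size_t bytes) {
    return bytes == 0 ? MockStatus::INVALID_VALUE : MockStatus::SUCCESS;
}
```

With this shape, `MOCK_CHECK(mock_alloc(16));` passes silently, while `MOCK_CHECK(mock_alloc(0));` throws a `std::runtime_error` whose message names the failing call and its source location. Capturing the call text and location in one place is the usual reason to route every API call through such a macro.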
