Le Kien Cuong lecuong1502

Hi there, I'm Le Kien Cuong 👋

Software & AI Engineer | GPU Computing, Machine Learning and Deep Learning

I am passionate about stripping away the "black box" of modern AI frameworks and building high-performance systems from the ground up. I specialize in writing raw CUDA kernels, optimizing deep learning models, and exploring the limits of GPU architectures.

👨‍💻 About Me

🔭 I’m currently building high-performance, lightweight LLM inference engines and AI/NLP applications.
🌱 I’m currently diving deep into NVIDIA GPU architecture, machine learning, deep learning, and optimization.
👯 I’m looking to collaborate on open-source AI infrastructure and low-level systems.
⚡ Fun fact: I believe the best way to understand how things work is to build them entirely from scratch.

🛠️ Tech Stack & Tools

Languages & Core Tools

Frameworks & Expertise

🔥 Featured Project

🚀 NanoInfer

A lightweight, from-scratch CUDA inference engine for transformer models. Built to deeply understand GPU computing fundamentals.

No PyTorch, no TensorRT. Just raw, hand-optimized CUDA C++.
Features a custom GEMM (tiled implementation using shared memory).
Implements FlashAttention v1/v2 and online softmax.
Achieves parity with TensorRT FP16 using INT8 quantization (dp4a).
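
To illustrate the online softmax idea listed above, here is a minimal CPU reference sketch in plain C++. It is not taken from NanoInfer: the function name `online_softmax` is hypothetical, and the code assumes finite `float` inputs. It shows only the core trick: a single pass that tracks a running maximum and rescales the running sum whenever the maximum grows, which is what lets attention kernels process a row block by block.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical CPU reference for online softmax (assumes finite inputs).
// Pass 1 keeps a running maximum m and a running normalizer d, where d is
// always the sum of exp(x_i - m) over the elements seen so far.
std::vector<float> online_softmax(const std::vector<float>& x) {
    float m = -std::numeric_limits<float>::infinity();  // running maximum
    float d = 0.0f;                                      // running normalizer

    for (float v : x) {
        const float m_new = std::max(m, v);
        // Rescale the old sum to the new maximum, then add the new term.
        d = d * std::exp(m - m_new) + std::exp(v - m_new);
        m = m_new;
    }

    // Pass 2 normalizes each element with the final maximum and sum.
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = std::exp(x[i] - m) / d;
    }
    return y;
}
```

Calling `online_softmax({1.0f, 2.0f, 3.0f})` returns three probabilities that sum to 1, and large inputs such as `{1000.0f, 1000.0f}` do not overflow because every exponent is taken relative to the running maximum. A GPU version would typically apply the same update per tile and merge partial `(m, d)` pairs across threads.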
