lecuong1502/README.md

Hi there, I'm Le Kien Cuong 👋

Software & AI Engineer | GPU Computing, Machine Learning and Deep Learning

I am passionate about stripping away the "black box" of modern AI frameworks and building high-performance systems from the ground up. I specialize in writing raw CUDA kernels, optimizing deep learning models, and exploring the limits of GPU architectures.

Email LinkedIn


👨‍💻 About Me

  • 🔭 I’m currently building high-performance, lightweight LLM inference engines and AI/NLP applications.
  • 🌱 I’m currently diving deep into NVIDIA GPU architecture, machine learning, deep learning, and optimization.
  • 👯 I’m looking to collaborate on open-source AI infrastructure and low-level systems.
  • ⚡ Fun fact: I believe the best way to understand how things work is to build them entirely from scratch.

🛠️ Tech Stack & Tools

Languages & Core Tools

C++ CUDA Python JavaScript CMake Linux

Frameworks & Expertise

PyTorch TensorFlow AI Machine Learning Deep Learning


🔥 Featured Project

A lightweight, from-scratch CUDA inference engine for transformer models. Built to deeply understand GPU computing fundamentals.

  • No PyTorch, No TensorRT. Just raw, hand-optimized CUDA C++.
  • Features a custom GEMM (tiled implementation using shared memory).
  • Implements Flash Attention v1/v2 and Online Softmax.
  • Achieves parity with TensorRT FP16 using INT8 Quantization (dp4a).
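To illustrate the online softmax idea mentioned in the list above, here is a minimal reference sketch in plain Python. It is not the project's CUDA code: the function names `online_softmax` and `reference_softmax` and the `block_size` parameter are hypothetical, chosen only for this example. The sketch shows the core trick, which is to keep a running maximum and a running sum while reading the scores block by block, rescaling the sum whenever the maximum changes.

```python
import math


def reference_softmax(scores):
    """Standard numerically stable softmax: find the max first, then normalize."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def online_softmax(scores, block_size=4):
    """Softmax whose max and normalizer come from one blockwise pass.

    Hypothetical helper for illustration only. `block_size` stands in for
    the tile a GPU kernel would load at a time.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        new_max = max(running_max, max(block))
        # Rescale the sum accumulated so far to the new maximum,
        # then add this block's contribution.
        running_sum = running_sum * math.exp(running_max - new_max)
        running_sum += sum(math.exp(x - new_max) for x in block)
        running_max = new_max
    # Final normalization with the completed statistics.
    return [math.exp(x - running_max) / running_sum for x in scores]


if __name__ == "__main__":
    scores = [1000.0, 1001.0, 999.0, -5.0, 3.0, 1000.5]
    online = online_softmax(scores, block_size=2)
    reference = reference_softmax(scores)
    print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(online, reference)))
```

The blockwise statistics never require the full row to be in memory at once, which is the property attention kernels in the Flash Attention style rely on. A real kernel would also fold the output accumulation into the same loop, which this sketch leaves out.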

Popular repositories

  1. NanoInfer Public

    A lightweight, from-scratch CUDA inference engine for Transformers. No PyTorch/TensorRT. Deep dive into Custom GEMM, Flash Attention, Fused Kernels & INT8.

    Cuda 5

  2. NanoOCR Public

    NanoOCR — Internal document OCR system powered by Qwen3-VL-4B-Instruct, with a FastAPI backend, PostgreSQL database, and a React frontend that displays recognized text as structured Markdown. Suppo…

    Python 2

  3. deep-learning-project Public

    Jupyter Notebook 1

  4. Project-III-Dormitory-Management Public

    A website for managing tasks and services at a university dormitory.

    JavaScript 1

  5. CUDA-Inference-Engine Public

    A minimal transformer inference engine built from scratch in CUDA C++, targeting GPT-2 scale models. No high-level ML frameworks — just raw CUDA kernels, explicit memory management, and a clean Pyt…

    Cuda 1

  6. shopping-web-backend Public

    JavaScript