A full-stack Deep Learning framework designed for building, testing, and deploying state-of-the-art digit recognition models.
- Project Overview
- Project Metadata
- Key Features
- Architecture & Methodology
- Performance & Results
- Installation & Usage
- The Mission
- About the Author
Hybrid-ModernCNN is not just a classification model; it is a comprehensive testbed for Computer Vision experimentation. It combines a high-performance backend (PyTorch/FastAPI) with an interactive React frontend, allowing users to draw digits in real-time and see immediate predictions.
The core utilizes a Modern Hybrid Architecture, merging the residual learning of ResNet with the design philosophy of ConvNeXt. It is optimized for real-world handwriting, handling irregular strokes, varying thickness, and noise through a robust preprocessing pipeline.
The interactive drawing board featuring real-time inference and ensemble predictions.
| Attribute | Details |
|---|---|
| Author | Sobhan Nasiri |
| Major | Computer Engineering |
| Current Version | 1.0.0-Stable |
| Last Update | February 2026 |
| Frameworks | PyTorch, FastAPI, React.js, OpenCV |
| License | MIT |
- Modern Architecture: Custom block design utilizing Depthwise Convolutions, GELU activation, and LayerNorm2d for stable training.
- Ensemble-Ready: Includes a dedicated `Inference_Handler` class capable of combining multiple models (Soft Voting) to maximize accuracy.
- Large Kernel Design: Uses 7x7 kernels to expand the receptive field, mimicking the global attention mechanisms found in Vision Transformers.
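The soft-voting idea behind the ensemble support can be sketched as follows. This is an illustrative class, not the project's actual `Inference_Handler`; the interface and names are assumptions:

```python
import torch
import torch.nn as nn

class EnsemblePredictor:
    """Hypothetical sketch of soft-voting ensemble inference."""

    def __init__(self, models):
        # Switch every member model to evaluation mode
        self.models = [m.eval() for m in models]

    @torch.no_grad()
    def predict(self, x):
        # Soft voting: average the per-class probabilities of all
        # models, then take the argmax of the averaged distribution.
        probs = torch.stack(
            [torch.softmax(m(x), dim=1) for m in self.models]
        )
        return probs.mean(dim=0).argmax(dim=1)
```

Averaging probabilities (soft voting) rather than majority-voting hard labels lets a confident model outweigh two uncertain ones, which is typically why it edges out hard voting on accuracy.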
Real-world data is messy. This project uses OpenCV to standardize user inputs:
- Auto-Centering: Centers the digit based on its center of mass.
- Padding & Resizing: Maintains aspect ratio while resizing to 28x28.
- Data Augmentation: Salt & Pepper noise, Gaussian Blur, and Soft Dilation to simulate ink spread.
Final quality check: Visualizing correct orientation and realistic noise simulation.
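The auto-centering step can be illustrated with a NumPy-only sketch. The project itself uses OpenCV for its pipeline, but the center-of-mass math is the same; this function is a hypothetical stand-in:

```python
import numpy as np

def center_by_mass(img):
    """Shift a grayscale digit so its intensity center of mass
    lands at the geometric center of the image."""
    h, w = img.shape
    total = img.sum()
    if total == 0:
        return img  # blank canvas: nothing to center
    ys, xs = np.indices(img.shape)
    cy = (ys * img).sum() / total
    cx = (xs * img).sum() / total
    shift_y = int(round(h / 2 - cy))
    shift_x = int(round(w / 2 - cx))
    # np.roll wraps around the edges; for a sparse digit on a black
    # background this closely approximates a plain translation.
    return np.roll(np.roll(img, shift_y, axis=0), shift_x, axis=1)
```

Centering before resizing matters because MNIST digits are themselves centered; an off-center user drawing would otherwise sit in a region of the input the model rarely saw during training.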
The model consists of Modern Blocks designed to overcome the limitations of standard CNNs:
- Residual Connections: Prevent vanishing gradients, allowing for deeper networks.
- Inverted Bottleneck: Expands channels by 4x inside the block to increase feature extraction capacity before projecting back.
- Modern Components: Replaces standard BatchNorm/ReLU with LayerNorm and GELU, aligning with state-of-the-art ConvNeXt designs.
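A minimal PyTorch sketch of such a Modern Block, combining the pieces listed above. Class names, layer ordering, and dimensions here are assumptions for illustration, not the project's exact code:

```python
import torch
import torch.nn as nn

class LayerNorm2d(nn.LayerNorm):
    """LayerNorm applied over the channel dim of an NCHW tensor."""
    def forward(self, x):
        x = x.permute(0, 2, 3, 1)      # NCHW -> NHWC
        x = super().forward(x)
        return x.permute(0, 3, 1, 2)   # back to NCHW

class ModernBlock(nn.Module):
    """Illustrative ConvNeXt-style residual block."""
    def __init__(self, dim):
        super().__init__()
        # 7x7 depthwise convolution (groups=dim) for a large receptive field
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = LayerNorm2d(dim)
        # Inverted bottleneck: expand channels 4x, then project back
        self.pw1 = nn.Conv2d(dim, 4 * dim, kernel_size=1)
        self.act = nn.GELU()
        self.pw2 = nn.Conv2d(4 * dim, dim, kernel_size=1)

    def forward(self, x):
        residual = x
        x = self.pw2(self.act(self.pw1(self.norm(self.dwconv(x)))))
        return x + residual  # residual connection
```

Because the 1x1 convolutions do the channel mixing and the depthwise 7x7 does only spatial mixing, the block gets a large receptive field at a fraction of the parameter cost of a dense 7x7 convolution.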
The model achieves exceptional accuracy on the validation set, demonstrating the effectiveness of the hybrid architecture.
- Peak Accuracy: 99.50% - 99.54%
- Training Efficiency: Optimized DataLoaders with `num_workers=8` and `pin_memory=True`.
Training logs showing convergence to >99.5% accuracy.
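The DataLoader settings mentioned above look roughly like this in PyTorch; the dataset and batch size are placeholder values for illustration:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for MNIST-shaped tensors
dataset = TensorDataset(
    torch.randn(256, 1, 28, 28),
    torch.randint(0, 10, (256,)),
)

loader = DataLoader(
    dataset,
    batch_size=64,
    shuffle=True,
    num_workers=8,    # parallel worker processes for data loading
    pin_memory=True,  # page-locked host memory for faster GPU transfer
)
```

`num_workers` keeps the GPU fed by preparing batches in parallel processes, and `pin_memory` allows asynchronous host-to-GPU copies; together they typically remove data loading from the critical path of the training loop.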
This project is split into a Python backend and a React frontend. Follow the steps below to get the system running.
1. Backend Setup (Python/FastAPI)
Navigate to the backend directory and start the Uvicorn server:

```bash
cd Digits_backend

# Install dependencies (if not already done)
pip install -r requirements.txt

# Run the server
uvicorn main:app --reload
```
2. Frontend Setup (React)
Open a new terminal, navigate to the frontend directory, and start the development server:

```bash
cd digits_frontend

# Install dependencies
npm install

# Run the application
npm run dev
```
Note: Once both servers are running, open the localhost link provided by the frontend terminal (usually http://localhost:5173) to access the application.
🎯 The Mission
Why was this built?
The primary goal of this project is to provide a modular framework for students and researchers interested in the MNIST dataset and digit recognition.
Instead of building a UI and preprocessing pipeline from scratch, you can use this project as a foundation. It provides:
- Ready-to-use Training Loops: Just define your model.
- Optimized DataLoaders: Pre-configured for performance.
- Interactive Testing: A draw-and-predict interface to test your model against real human handwriting immediately.
You focus on the architecture; this framework handles the rest.
👨‍💻 About the Author
Sobhan Nasiri
Computer Engineering Student (Term 4)
Passionate about Computer Vision and Deep Learning. This project represents a deep dive into modern neural network architectures, moving beyond basic tutorials to implement research-grade concepts like Inverted Bottlenecks and Ensemble Inference.
© 2026 Sobhan Nasiri. Licensed under MIT.