This project applies deep learning to automatically classify indoor room images by interior design style. Instead of sorting or labeling design aesthetics by hand, the system learns visual patterns from 13,163 labeled room images across 17 style categories and predicts the style of a new image.
Placed 4th among college participants in a Kaggle competition for indoor scene style classification.
- Extract meaningful visual features from indoor scenes
- Classify rooms into predefined design style categories
- Evaluate and compare multiple state-of-the-art CNN and Transformer models
- Improve generalization using advanced data augmentation
- Total images: 13,163
- Number of styles (classes): 17
- Styles include: Modern, Minimalist, Scandinavian, Industrial, Victorian, Boho, Shabby Chic, Contemporary, Tropical, Coastal, Farmhouse, and more
- Data Split:
  - Training: 80% (10,530 images)
  - Validation: 20% (2,633 images)
- Stratified split to preserve class balance
- Class weighting applied to handle dataset imbalance
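The stratified split and class weighting described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the toy label list, and the seed are assumptions, and the weighting uses the common "balanced" formula `n_samples / (n_classes * class_count)`.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx) with every class split at the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Toy, made-up labels: three styles with uneven counts (not the real dataset).
labels = ["Modern"] * 50 + ["Boho"] * 30 + ["Coastal"] * 20
train_idx, val_idx = stratified_split(labels)
weights = balanced_class_weights([labels[i] for i in train_idx])
```

Rarer classes receive larger weights, so their errors count more in the loss. In Keras, a dictionary like this (keyed by integer class index rather than by name) can be passed to `model.fit` through its `class_weight` argument.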
A dynamic augmentation pipeline was built using TensorFlow/Keras to improve robustness and reduce overfitting.
- Horizontal flipping
- Random translation (up to 10%)
- Rotation (±10°)
- Zoom (up to 20%)
- Brightness & saturation variation (±20%)
- Planckian Jitter (custom layer) to simulate realistic warm/cool lighting changes
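One way to assemble the listed transforms from Keras preprocessing layers is sketched below. It assumes float images scaled to [0, 1]. The project's custom Planckian Jitter layer is not shown in this README, so `WarmCoolJitter` is a hypothetical, simplified stand-in that shifts the red and blue gains in opposite directions rather than following a black-body illuminant curve. `RandomSaturationJitter`, `build_augmenter`, and the `strength` value are likewise illustrative names and settings.

```python
import tensorflow as tf
from tensorflow.keras import layers


class RandomSaturationJitter(layers.Layer):
    """Random saturation change of up to +/- delta (hypothetical helper)."""

    def __init__(self, delta=0.2, **kwargs):
        super().__init__(**kwargs)
        self.delta = delta

    def call(self, images, training=None):
        if not training:
            return images
        jittered = tf.image.random_saturation(
            images, 1.0 - self.delta, 1.0 + self.delta
        )
        return tf.clip_by_value(jittered, 0.0, 1.0)


class WarmCoolJitter(layers.Layer):
    """Simplified stand-in for Planckian Jitter (assumption, not the real layer)."""

    def __init__(self, strength=0.1, **kwargs):
        super().__init__(**kwargs)
        self.strength = strength

    def call(self, images, training=None):
        if not training:
            return images
        batch = tf.shape(images)[0]
        # One warm/cool shift per image: positive boosts red, negative boosts blue.
        shift = tf.random.uniform([batch, 1, 1, 1], -self.strength, self.strength)
        gains = tf.concat([1.0 + shift, tf.ones_like(shift), 1.0 - shift], axis=-1)
        return tf.clip_by_value(images * gains, 0.0, 1.0)


def build_augmenter():
    """Augmentation stack matching the list above; inputs are floats in [0, 1]."""
    return tf.keras.Sequential(
        [
            layers.RandomFlip("horizontal"),
            layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
            layers.RandomRotation(factor=10 / 360),  # fraction of a full turn = +/-10 degrees
            layers.RandomZoom(height_factor=0.2),
            layers.RandomBrightness(factor=0.2, value_range=(0.0, 1.0)),
            RandomSaturationJitter(delta=0.2),
            WarmCoolJitter(strength=0.1),
        ],
        name="augmenter",
    )
```

Calling `build_augmenter()(images, training=True)` applies the random transforms; with `training=False` the layers pass images through unchanged, which is the behavior wanted at validation time.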
- ViT-Base (Vision Transformer):
  - Treats images as token sequences
  - Uses self-attention instead of convolution
  - Parameters: ~86M
  - Test Accuracy: 40%
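To make "images as token sequences" concrete, the NumPy sketch below cuts an image into non-overlapping patches and runs one single-head self-attention step over them. The 224×224 input, 16×16 patch size, 64-dimensional projections, and random weights are assumptions for illustration. A real ViT also adds a learned patch embedding, a class token, position embeddings, and multiple heads.

```python
import numpy as np


def image_to_patch_tokens(image, patch_size=16):
    """Flatten an (H, W, C) image into a (num_patches, patch_size*patch_size*C) array."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    rows, cols = h // patch_size, w // patch_size
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (rows, cols, ph, pw, C)
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v, attn


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
tokens = image_to_patch_tokens(image)  # 14 x 14 = 196 tokens of length 768
w_q, w_k, w_v = (rng.normal(scale=0.02, size=(768, 64)) for _ in range(3))
attended, attn = self_attention(tokens, w_q, w_k, w_v)
```

Every token attends to every other token in a single step, which is why this design has no built-in locality bias of the kind convolution provides.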
- EfficientNet-B3:
  - Optimized CNN using MBConv + Squeeze-and-Excitation
  - Pretrained on ImageNet
  - Parameters: ~34M
  - Test Accuracy: 28%
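The Squeeze-and-Excitation step named above can be sketched in a few lines of NumPy. This is a generic illustration with random weights and an assumed reduction ratio of 4, not EfficientNet's exact block (which, for instance, uses a swish activation where this sketch uses ReLU).

```python
import numpy as np


def squeeze_excite(features, w_reduce, w_expand):
    """Rescale each channel of an (H, W, C) feature map by a learned gate in (0, 1)."""
    squeezed = features.mean(axis=(0, 1))                # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w_reduce, 0.0)        # bottleneck with ReLU
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w_expand)))   # excitation: sigmoid gates -> (C,)
    return features * gates, gates


rng = np.random.default_rng(0)
channels, reduction = 32, 4  # illustrative sizes
features = rng.random((7, 7, channels))
w_reduce = rng.normal(size=(channels, channels // reduction))
w_expand = rng.normal(size=(channels // reduction, channels))
recalibrated, gates = squeeze_excite(features, w_reduce, w_expand)
```

The gates let the network emphasize or suppress whole channels based on global context, at the cost of two small dense layers per block.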
- ResNet50:
  - Residual CNN baseline for hierarchical feature extraction
  - Parameters: ~24.7M
  - Validation Accuracy: 79%
- ConvNeXt-Tiny:
  - Modern CNN inspired by Vision Transformers
  - Efficient and lightweight (~28M params)
  - Test Accuracy: 38%
- InceptionV3:
  - Multi-scale convolution architecture
  - Strong at capturing complex scene structures
  - Validation Accuracy: 80%
| Model | Parameters | Epochs | Training Accuracy | Validation Accuracy |
|---|---|---|---|---|
| ViT-Base | ~86M | 10 | 0.50 | 0.45 |
| ConvNeXt-Tiny | ~28M | 10 | 0.79 | 0.47 |
| ResNet50 | ~22M | 60 | 0.93 | 0.79 |
| EfficientNet-B3 | ~34M | 25 | 0.40 | 0.36 |
| InceptionV3 | ~25M | 60 | 0.92 | 0.80 |
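A quick way to read the table is to compute the gap between training and validation accuracy for each model. The snippet below does that arithmetic on the table's own numbers; the variable names are illustrative.

```python
# (training accuracy, validation accuracy) copied from the table above
results = {
    "ViT-Base": (0.50, 0.45),
    "ConvNeXt-Tiny": (0.79, 0.47),
    "ResNet50": (0.93, 0.79),
    "EfficientNet-B3": (0.40, 0.36),
    "InceptionV3": (0.92, 0.80),
}

# Generalization gap: how far training accuracy runs ahead of validation accuracy.
gaps = {name: round(train - val, 2) for name, (train, val) in results.items()}
best_model = max(results, key=lambda name: results[name][1])

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:16s} val={results[name][1]:.2f} gap={gap:.2f}")
```

ConvNeXt-Tiny shows the largest gap (0.32), which suggests overfitting within its 10 epochs, while ViT-Base and EfficientNet-B3 have small gaps but low accuracy on both splits, which points more toward underfitting. InceptionV3 and ResNet50, trained for 60 epochs, reach the highest validation accuracy with moderate gaps.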
- Transfer Learning (custom classification heads)
- Adam Optimizer (best performance among tested optimizers)
- Early Stopping to prevent overfitting
- ReduceLROnPlateau for adaptive learning rate tuning
- Model Checkpointing to save best validation model
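The training setup listed above might be wired together in Keras roughly as follows. This is a sketch under assumptions: the InceptionV3 backbone is chosen only because it has the best validation accuracy in the table, and the dropout rate, learning rate, patience values, input size, and checkpoint filename are placeholders rather than the project's actual settings. Input preprocessing (for example `tf.keras.applications.inception_v3.preprocess_input`) is left out for brevity.

```python
import tensorflow as tf

NUM_CLASSES = 17  # number of styles in the dataset


def build_classifier(weights="imagenet", input_shape=(299, 299, 3)):
    """Frozen pretrained backbone plus a small custom classification head."""
    base = tf.keras.applications.InceptionV3(
        include_top=False, weights=weights, input_shape=input_shape, pooling="avg"
    )
    base.trainable = False  # train only the new head at first

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.Dropout(0.3)(x)  # assumed rate
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed rate
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def build_callbacks(checkpoint_path="best_model.keras"):
    """Early stopping, adaptive learning rate, and best-model checkpointing."""
    return [
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=8, restore_best_weights=True
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6
        ),
        tf.keras.callbacks.ModelCheckpoint(
            checkpoint_path, monitor="val_accuracy", save_best_only=True
        ),
    ]
```

A training call would then look like `model.fit(train_ds, validation_data=val_ds, epochs=60, class_weight=class_weights, callbacks=build_callbacks())`, where `train_ds`, `val_ds`, and `class_weights` are hypothetical names for the prepared datasets and the class-weight dictionary.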