Bonus-Hunters/Scene-Style-Classification

Deep Learning for Indoor Scene Style Classification

📌 Project Overview

This project applies deep learning to classify indoor room images by interior design style. Instead of relying on manual sorting or labeling, the models learn visual patterns from 13,163 labeled room images across 17 style classes; the best model (InceptionV3) reaches 80% validation accuracy.

Placed 4th among college participants in a Kaggle competition for indoor scene style classification.


Core Objective

  • Extract meaningful visual features from indoor scenes
  • Classify rooms into predefined design style categories
  • Evaluate and compare multiple state-of-the-art CNN and Transformer models
  • Improve generalization using advanced data augmentation

Dataset Summary

  • Total images: 13,163

  • Number of styles (classes): 17

  • Styles include: Modern, Minimalist, Scandinavian, Industrial, Victorian, Boho, Shabby Chic, Contemporary, Tropical, Coastal, Farmhouse, and more

  • Data Split:

    • Training: 80% (10,530 images)
    • Validation: 20% (2,633 images)
  • Stratified split, so each style keeps the same proportion in the training and validation sets

  • Class weighting applied to handle dataset imbalance
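
The split and weighting described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the toy labels, and the "balanced" weighting formula `n_samples / (n_classes * class_count)` are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_idx, val_idx), splitting every class in the same ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency (assumed heuristic)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Toy imbalanced dataset standing in for the real style labels.
labels = ["modern"] * 50 + ["boho"] * 30 + ["coastal"] * 20
train_idx, val_idx = stratified_split(labels)
weights = balanced_class_weights([labels[i] for i in train_idx])
```

Rare classes receive weights above 1 and frequent classes below 1, so each class contributes comparably to the loss. When passing such weights to Keras via `model.fit(class_weight=...)`, the dictionary keys must be integer class indices rather than style names.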


Data Augmentation Strategy

A dynamic augmentation pipeline was built using TensorFlow/Keras to improve robustness and reduce overfitting.

📐 Geometric Augmentations

  • Horizontal flipping
  • Random translation (up to 10%)
  • Rotation (±10°)
  • Zoom (up to 20%)
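
One way to express the four geometric augmentations with standard Keras preprocessing layers is sketched below. The layer choices, the variable names, and the 224×224 input size are assumptions; the project's actual pipeline may be structured differently.

```python
import tensorflow as tf

# Assumed mapping of the listed augmentations onto Keras preprocessing layers.
geometric_augmentation = tf.keras.Sequential(
    [
        tf.keras.layers.RandomFlip("horizontal"),
        # Shift by up to 10% of the image height and width.
        tf.keras.layers.RandomTranslation(height_factor=0.1, width_factor=0.1),
        # The factor is a fraction of a full turn, so 10/360 gives +/-10 degrees.
        tf.keras.layers.RandomRotation(factor=10 / 360),
        # Zoom in or out by up to 20%, preserving the aspect ratio.
        tf.keras.layers.RandomZoom(height_factor=0.2),
    ],
    name="geometric_augmentation",
)

# Random batch standing in for real room images.
images = tf.random.uniform((2, 224, 224, 3))
augmented = geometric_augmentation(images, training=True)
```

These layers only apply random transformations when called with `training=True`; at inference time they pass images through unchanged, so the same model can be used for evaluation without removing them.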

Photometric Augmentations

  • Brightness & saturation variation (±20%)
  • Planckian Jitter (custom layer) to simulate realistic warm/cool lighting changes
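
The idea behind a warm/cool lighting augmentation can be illustrated with a deliberately simplified NumPy sketch. It is not the project's custom Planckian Jitter layer and does not model black-body illuminants: it assumes a linear tint that scales the red and blue channels in opposite directions, and the function name and `max_shift` parameter are hypothetical.

```python
import numpy as np


def color_temperature_jitter(image, rng, max_shift=0.2):
    """Apply a random warm/cool tint to a float RGB image with values in [0, 1].

    Simplified stand-in: a positive shift boosts red and reduces blue (warmer),
    a negative shift does the opposite (cooler). Green is left unchanged.
    """
    shift = rng.uniform(-max_shift, max_shift)
    gains = np.array([1.0 + shift, 1.0, 1.0 - shift], dtype=image.dtype)
    return np.clip(image * gains, 0.0, 1.0)


rng = np.random.default_rng(0)
image = np.full((4, 4, 3), 0.5, dtype=np.float32)  # flat grey test image
tinted = color_temperature_jitter(image, rng)
```

A physically motivated version would sample an illuminant temperature and derive the channel gains from it; the linear tint above only shows where such a transform sits in the pipeline.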

Models Implemented & Evaluated

Vision Transformer (ViT-Base-Patch16-224)

  • Treats images as token sequences
  • Uses self-attention instead of convolution
  • Parameters: ~86M
  • Test Accuracy: 40%

EfficientNet-B3

  • Optimized CNN using MBConv + Squeeze-and-Excitation
  • Pretrained on ImageNet
  • Parameters: ~34M
  • Test Accuracy: 28%

ResNet50

  • Residual CNN baseline for hierarchical feature extraction
  • Parameters: ~24.7M
  • Validation Accuracy: 79%

ConvNeXt-Tiny

  • Modern CNN inspired by Vision Transformers
  • Efficient and lightweight (~28M params)
  • Test Accuracy: 38%

InceptionV3

  • Multi-scale convolution architecture
  • Strong at capturing complex scene structures
  • Validation Accuracy: 80%

Performance Summary

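| Model           | Parameters | Epochs | Training Accuracy | Validation Accuracy |
|-----------------|------------|--------|-------------------|---------------------|
| ViT-Base        | ~86M       | 10     | 0.50              | 0.45                |
| ConvNeXt-Tiny   | ~28M       | 10     | 0.79              | 0.47                |
| ResNet50        | ~22M       | 60     | 0.93              | 0.79                |
| EfficientNet-B3 | ~34M       | 25     | 0.40              | 0.36                |
| InceptionV3     | ~25M       | 60     | 0.92              | 0.80                |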
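
A quick way to read the summary is to look at the gap between training and validation accuracy, a rough indicator of overfitting. The short script below computes that gap from the figures reported above; the variable names are illustrative.

```python
# (training accuracy, validation accuracy) as reported in the summary.
results = {
    "ViT-Base": (0.50, 0.45),
    "ConvNeXt-Tiny": (0.79, 0.47),
    "ResNet50": (0.93, 0.79),
    "EfficientNet-B3": (0.40, 0.36),
    "InceptionV3": (0.92, 0.80),
}

# Generalization gap = training accuracy minus validation accuracy.
gaps = {name: round(train - val, 2) for name, (train, val) in results.items()}

for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} gap = {gap:.2f}")
```

ConvNeXt-Tiny shows the largest gap (0.32), while ResNet50 and InceptionV3 combine the highest validation accuracy with moderate gaps (0.14 and 0.12).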

Training Techniques Used

  • Transfer Learning (custom classification heads)
  • Adam Optimizer (best performance among tested optimizers)
  • Early Stopping to prevent overfitting
  • ReduceLROnPlateau for adaptive learning rate tuning
  • Model Checkpointing to save best validation model
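
The techniques listed above fit together roughly as in the following Keras sketch. It assumes a frozen ResNet50 backbone with a small custom head; the dropout rate, learning rate, patience values, and checkpoint filename are illustrative choices, not the settings used in the project. Input preprocessing is omitted for brevity.

```python
import tensorflow as tf

NUM_CLASSES = 17  # number of styles, from the dataset summary


def build_model(weights="imagenet", input_shape=(224, 224, 3)):
    """Frozen pretrained backbone plus a custom classification head."""
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # transfer learning: train only the new head

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.3)(x)  # assumed rate
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed rate
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


callbacks = [
    # Stop when validation loss stops improving and keep the best weights.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=8, restore_best_weights=True
    ),
    # Halve the learning rate after a plateau in validation loss.
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6
    ),
    # Save only the model with the best validation accuracy so far.
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.keras", monitor="val_accuracy", mode="max", save_best_only=True
    ),
]

# Typical usage, with train_ds and val_ds as hypothetical tf.data datasets:
# model = build_model()
# model.fit(train_ds, validation_data=val_ds, epochs=60, callbacks=callbacks)
```

Freezing the backbone keeps the pretrained features intact while the head adapts to the 17 style classes; unfreezing some of the top layers later with a lower learning rate is a common follow-up step.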

About

A repo for the Neural Network Project and Competition
