This project focuses on semantic segmentation of aerial drone imagery using both CNN-Transformer hybrid models and pre-trained DeepLabV3 backbones.
The pipeline is designed for high-resolution drone images. On a public dataset of 400 labeled aerial images, the best model (the CNN-Transformer hybrid) reaches 84% pixel accuracy, 60% mIoU, and a 74% F1-score.
- ✨ Key Highlights
- 🖼️ Example Outputs
- ⚙️ Methodology
- 📊 Results
- 📦 Tech Stack
- 🔮 Future Improvements
- 👤 Author
- 🧠 Computer Vision Model: Developed a semantic segmentation framework that covers both a pre-trained DeepLabV3 baseline and a custom CNN-Transformer hybrid.
- 📈 Performance: The CNN-Transformer hybrid achieved 84% pixel accuracy, 60% mIoU, and a 74% F1-score on a public dataset of 400 labeled aerial images.
- 🔄 Pipeline Engineering: Built a complete PyTorch pipeline for data preprocessing, training, and evaluation.
- 🛠️ Preprocessing: Implemented image tiling and mask remapping for efficient training on high-resolution drone imagery.
- 📊 Evaluation Metrics: Used mIoU, pixel accuracy, and F1-score for comprehensive model evaluation.
- High-resolution aerial drone images split into tiles for GPU-efficient training.
- Remapped segmentation masks into consistent class labels.
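
The two preprocessing steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's actual code: the function names `tile_image` and `remap_mask`, the 512-pixel tile size, the ignore index of 255, and the example label mapping are all assumptions made for the example.

```python
import numpy as np


def tile_image(image, tile_size=512, fill=0):
    """Split an H x W (x C) array into non-overlapping square tiles.

    The bottom and right edges are padded with `fill` so that every tile
    has the same shape. Tile size and padding strategy are assumptions.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % tile_size  # rows needed to reach a multiple of tile_size
    pad_w = (-w) % tile_size
    pad = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant", constant_values=fill)

    tiles = []
    for top in range(0, padded.shape[0], tile_size):
        for left in range(0, padded.shape[1], tile_size):
            tiles.append(padded[top:top + tile_size, left:left + tile_size])
    return np.stack(tiles)


def remap_mask(mask, mapping, ignore_index=255):
    """Map raw uint8 label ids to contiguous class ids via a lookup table.

    Raw ids missing from `mapping` become `ignore_index`, so a loss
    function can skip them.
    """
    lut = np.full(256, ignore_index, dtype=np.uint8)
    for raw_id, class_id in mapping.items():
        lut[raw_id] = class_id
    return lut[mask]


# Hypothetical usage: a 1000 x 1500 RGB image and a tiny raw mask.
image = np.zeros((1000, 1500, 3), dtype=np.uint8)
tiles = tile_image(image, tile_size=512)

raw_mask = np.array([[0, 7], [12, 3]], dtype=np.uint8)
clean_mask = remap_mask(raw_mask, {0: 0, 7: 1, 12: 2})
```

When tiling masks, passing `fill=255` (the same ignore index) keeps padded pixels out of the loss and the metrics.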
- DeepLabV3: Used pre-trained backbones for semantic segmentation.
- Hybrid CNN-Transformer: Designed a custom architecture combining convolutional layers for local feature extraction and Transformer blocks for global context.
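
As a rough illustration of the two model families, the sketch below shows one way to adapt a pre-trained torchvision DeepLabV3 to a new label set, and a toy CNN-Transformer hybrid. Everything specific here is an assumption: the ResNet-50 variant, the names `build_deeplabv3_baseline` and `HybridSegmenter`, and all layer sizes. The project's real architecture may differ substantially.

```python
import torch
import torch.nn.functional as F
from torch import nn


def build_deeplabv3_baseline(num_classes):
    """Load a pre-trained DeepLabV3 and swap its final layers (assumed variant)."""
    # Imported lazily so the hybrid sketch below runs without torchvision.
    from torchvision.models.segmentation import (
        DeepLabV3_ResNet50_Weights,
        deeplabv3_resnet50,
    )

    # Downloads the pre-trained weights on first use.
    model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT)
    # Replace the last 1x1 convolutions so the output matches our classes.
    model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=1)
    model.aux_classifier[4] = nn.Conv2d(256, num_classes, kernel_size=1)
    return model


class HybridSegmenter(nn.Module):
    """Toy hybrid: CNN stem for local features, Transformer for global context."""

    def __init__(self, num_classes, embed_dim=64, num_heads=4, depth=2):
        super().__init__()
        # Two stride-2 convolutions reduce the input to 1/4 resolution.
        self.stem = nn.Sequential(
            nn.Conv2d(3, embed_dim // 2, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim // 2, embed_dim, 3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(inplace=True),
        )
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=embed_dim * 2,
            batch_first=True,
        )
        # Positional encodings are omitted here for brevity.
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, x):
        height, width = x.shape[-2:]
        feats = self.stem(x)                        # (B, C, H/4, W/4)
        b, c, fh, fw = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)   # (B, fh*fw, C)
        tokens = self.encoder(tokens)               # self-attention over all positions
        feats = tokens.transpose(1, 2).reshape(b, c, fh, fw)
        logits = self.head(feats)
        # Upsample the per-class logits back to the input resolution.
        return F.interpolate(
            logits, size=(height, width), mode="bilinear", align_corners=False
        )


# Hypothetical usage on a batch of two 64 x 64 tiles with 5 classes.
model = HybridSegmenter(num_classes=5).eval()
with torch.no_grad():
    logits = model(torch.randn(2, 3, 64, 64))
```

Attention cost grows quadratically with the number of tokens, which is one reason a design like this downsamples with convolutions first and trains on tiles rather than on full-resolution images.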
- Metrics: Mean Intersection over Union (mIoU), Pixel Accuracy, F1-score.
- Evaluated on a public dataset of 400 labeled aerial drone images.
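
All three metrics can be derived from a single confusion matrix. The sketch below assumes macro-averaged (per-class mean) IoU and F1 and an ignore index of 255. The source does not state which averaging was used, and the function names are illustrative.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (true class, predicted class) pair."""
    keep = target != ignore_index
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Pixel accuracy, mean IoU, and macro F1 from a confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    fp = cm.sum(axis=0) - tp  # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp  # belongs to the class, but missed
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn)
    return {
        "pixel_accuracy": tp.sum() / cm.sum(),
        # nanmean skips classes absent from both prediction and target.
        "miou": np.nanmean(iou),
        "f1": np.nanmean(f1),
    }


# Hypothetical 2 x 2 example with two classes and one misclassified pixel.
target = np.array([[0, 0], [1, 1]], dtype=np.uint8)
pred = np.array([[0, 1], [1, 1]], dtype=np.uint8)
cm = confusion_matrix(pred, target, num_classes=2)
metrics = segmentation_metrics(cm)
```

For a dataset-level score, sum the confusion matrices over all tiles before calling `segmentation_metrics`, rather than averaging per-tile scores.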
| Model | Pixel Accuracy | mIoU | F1 Score |
|---|---|---|---|
| DeepLabV3 (pre-trained) | 81% | 55% | 70% |
| CNN-Transformer (Hybrid) | 84% | 60% | 74% |
✅ The hybrid CNN-Transformer outperformed the DeepLabV3 baseline on every metric: +3 points pixel accuracy, +5 points mIoU, and +4 points F1.
- Language: Python 3
- Frameworks/Libraries: PyTorch, Torchvision, NumPy, OpenCV, Matplotlib
- Deep Learning: DeepLabV3, Transformer layers
- Tools: Jupyter Notebook, CUDA-enabled GPU
- ✅ Expand dataset with more diverse aerial imagery.
- ✅ Add support for multi-class segmentation beyond current labels.
- ✅ Deploy as a web-based visualization tool with FastAPI/Streamlit.
- ✅ Experiment with Vision Transformers (ViT) and Swin Transformers.
Jacob Almon
Svetya Koppisetty
Connor MacDonald