Cloud Type Recognition using CNNs

A deep learning project for semantic segmentation of cloud types from satellite imagery using Convolutional Neural Networks.

📌 Overview

Weather forecasting has been crucial to human civilization for millennia. While traditional physics-based simulations have dominated meteorological analysis, machine learning has recently emerged as a powerful complementary approach. This project focuses on one specific application: classifying cloud types from satellite imagery using semantic segmentation.

Since weather patterns are inherently spatial, Convolutional Neural Networks (CNNs) are particularly well-suited for this task. This project serves as a learning playground for:

Dataset creation – Building image and mask datasets from raw data
Image augmentation – Applying transformations to improve model robustness
CNN architectures – Experimenting with different network structures for semantic segmentation

The primary goal is to develop a complete data pipeline and explore various CNN architectures for cloud type classification.
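One detail of augmentation for segmentation is worth illustrating: any geometric transform has to be applied identically to the image and its mask, otherwise labels no longer line up with pixels. The project uses Albumentations for this; the sketch below uses only NumPy to show the principle. The function name `random_flip` and the toy data are illustrative assumptions, not code from the notebook.

```python
import numpy as np


def random_flip(image, mask, rng):
    """Flip image and mask together so pixels and labels stay aligned."""
    if rng.random() < 0.5:  # horizontal flip: reverse the width axis
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip: reverse the height axis
        image, mask = image[::-1], mask[::-1]
    # Slicing with negative strides returns views; copy to contiguous arrays.
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)  # H x W x C
mask = (image[..., 0] > 127).astype(np.uint8)                 # H x W toy label

aug_image, aug_mask = random_flip(image, mask, rng)
```

Because the toy mask is derived pixel by pixel from the image, recomputing it from `aug_image` gives exactly `aug_mask`, which is a quick way to check that the pair stayed aligned.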

🗂️ Dataset

This project uses the Understanding Cloud Organization dataset from a 2019 Kaggle competition.

Dataset Details:

Satellite images of clouds with hand-labeled segmentation masks
Four cloud types: Flower, Gravel, Fish, and Sugar
Task: Predict segmentation masks for each cloud type given an input image
Evaluation metric: Dice coefficient (measures the pixel-wise overlap between predicted and ground-truth masks; 0 means no overlap, 1 means a perfect match)
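For reference, the Dice coefficient of a predicted mask P and a ground-truth mask T is 2·|P ∩ T| / (|P| + |T|). The following is a minimal NumPy sketch for a single pair of binary masks. It is not the notebook's implementation, and returning 1.0 when both masks are empty is a convention assumed here.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / total)


pred = np.array([[1, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 0, 0],
                   [1, 0, 0]])
score = dice_coefficient(pred, target)
print(score)  # 0.5: one shared pixel, two pixels in each mask
```

With four cloud types, a per-image score would typically be the mean of this value over the four class masks.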

⚙️ Installation

Prerequisites

Python 3.7+
Kaggle account and API credentials
(Optional) Access to a GPU cluster for training

Setup Instructions

1. Clone the repository

git clone git@github.com:SATheinen/cloud-type-recognition.git
cd cloud-type-recognition

2. Create and activate a virtual environment, then install the dependencies

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

3. Configure Kaggle API

First, create a Kaggle account and generate API credentials from your account settings.

pip install kaggle
mkdir -p ~/.kaggle
touch ~/.kaggle/kaggle.json

Edit ~/.kaggle/kaggle.json and paste your API credentials:

{
  "username": "your_username",
  "key": "your_api_key"
}

Set proper permissions:

chmod 600 ~/.kaggle/kaggle.json
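If the download step later fails with an authentication error, the credentials file is the first thing to check. The helper below is a small sketch for doing that from Python. The function name `check_kaggle_credentials` is hypothetical and not part of the Kaggle package; it only inspects the file described above.

```python
import json
import os
import stat


def check_kaggle_credentials(path):
    """Return a list of problems found with a kaggle.json file (empty if none)."""
    if not os.path.isfile(path):
        return [f"{path} does not exist"]
    try:
        with open(path, encoding="utf-8") as fh:
            creds = json.load(fh)
    except json.JSONDecodeError as exc:
        # A file created with `touch` and never edited ends up here.
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(creds, dict):
        return [f"{path} must contain a JSON object"]

    problems = []
    for field in ("username", "key"):
        if not creds.get(field):
            problems.append(f"missing or empty field: {field}")
    # On POSIX systems, flag files accessible to group or others.
    if os.name == "posix":
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            problems.append(f"permissions are {oct(mode)}, expected 0o600")
    return problems


for problem in check_kaggle_credentials(os.path.expanduser("~/.kaggle/kaggle.json")):
    print("kaggle.json:", problem)
```

An empty result means the file exists, parses as JSON, contains both fields, and is private to the current user.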

4. Download the dataset

kaggle competitions download -c understanding_cloud_organization
unzip understanding_cloud_organization.zip

🚀 Usage

Local Execution

Option 1: Jupyter Notebook

jupyter notebook main.ipynb

Option 2: Visual Studio Code

code main.ipynb

Cluster Execution

For training on a compute cluster (configurations may need adjustment based on your environment):

1. Connect to cluster and submit job

ssh your_cluster
cd cloud-type-recognition
sbatch job.slurm

2. Create SSH tunnel (from your local machine)

ssh -L 8889:localhost:8889 -J user_name@login user_name@node_address

3. Access notebook

Open http://localhost:8889 in your browser to access the Jupyter notebook running on cluster resources.

Configuration

All hyperparameters and settings can be configured in the second cell of main.ipynb. To run a full training session:

Set your desired parameters
Click Kernel → Restart Kernel and Run All Cells
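One way to keep such a configuration cell tidy is to collect the settings in a single validated object. The sketch below is purely illustrative: every field name and default value is a hypothetical placeholder and does not reflect the actual contents of the second cell in main.ipynb.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    # All field names and defaults are hypothetical placeholders.
    data_dir: str = "train_images"
    labels_csv: str = "train.csv"
    num_classes: int = 4  # Fish, Flower, Gravel, Sugar
    batch_size: int = 8
    learning_rate: float = 1e-3
    epochs: int = 10
    seed: int = 42

    def __post_init__(self):
        # Fail early on obviously invalid settings.
        if self.batch_size <= 0 or self.epochs <= 0:
            raise ValueError("batch_size and epochs must be positive")
        if not 0.0 < self.learning_rate < 1.0:
            raise ValueError("learning_rate must lie in (0, 1)")


config = TrainConfig(batch_size=16, epochs=10)
print(asdict(config))
```

Making the dataclass frozen prevents later cells from silently changing a setting halfway through a run, which fits the "restart kernel and run all cells" workflow.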

Example Output

Epoch: 9
Train loss: 0.7080
Val loss: 0.6830
Dice coefficient: 0.3156

⭐ Features

🎭 Complete data pipeline – End-to-end image and mask loading
🔁 Heavy augmentation – Extensive image transformations for robustness
💬 Dynamic loss functions – Flexible loss calculation for different scenarios
🧱 Debugging tools – Built-in utilities for development and troubleshooting

📁 Project Structure

cloud-type-recognition/
├── job.slurm                  # SLURM batch script for cluster execution
├── main.ipynb                 # Main training notebook
├── README.md                  # This file
├── requirements.txt           # Python dependencies
├── train.csv                  # Training labels and metadata
├── sample_submission.csv      # Example submission format
├── train_images/              # Training images directory
├── train_images_broken/       # Corrupted/invalid images
├── train_images.txt           # List of training image files
└── test_images/               # Test images directory
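The labels in train.csv are not stored as images. To the best of my knowledge, the Kaggle competition ships them as run-length-encoded strings in an `EncodedPixels` column, with one row per image and cloud type; treat that column name and the encoding details as assumptions to verify against the downloaded file. The sketch below decodes such a string into a binary mask, assuming 1-based pixel positions counted column by column. The function name `rle_decode` is illustrative.

```python
import numpy as np


def rle_decode(rle, height, width):
    """Decode a 'start length start length ...' string into a binary mask.

    Assumes 1-based pixel indices counted column by column (top to bottom,
    then left to right). Missing labels (empty string, None, NaN) yield an
    all-zero mask.
    """
    flat = np.zeros(height * width, dtype=np.uint8)
    if isinstance(rle, str) and rle.strip():
        values = [int(v) for v in rle.split()]
        starts, lengths = values[0::2], values[1::2]
        for start, length in zip(starts, lengths):
            flat[start - 1:start - 1 + length] = 1
    # Column-major pixel order corresponds to Fortran-style reshaping.
    return flat.reshape((height, width), order="F")


mask = rle_decode("1 2 6 1", height=3, width=3)
print(mask)
# [[1 0 0]
#  [1 0 0]
#  [0 1 0]]
```

Here the run "1 2" fills the first two pixels of column 0 and the run "6 1" fills the last pixel of column 1.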

🔧 Technologies Used

Python – Primary programming language
PyTorch – Deep learning framework
Segmentation Models PyTorch – Pre-built architectures (GitHub)
Albumentations – Image augmentation library
Jupyter Notebook – Interactive development environment

📊 Results

The model learns to segment the four cloud types from satellite imagery, although its accuracy is still limited (see Model Performance). The examples below compare the original images, ground truth masks, and model predictions:

Original Image	Ground Truth Mask	Model Prediction

Raw satellite imagery	Hand-labeled segmentation	CNN output

Model Performance

Metric	Value
Final Train Loss	0.7080
Final Val Loss	0.6830
Dice Coefficient	0.3156

With a Dice coefficient of 0.3156, the model captures part of the cloud structure but leaves considerable room for improvement through hyperparameter tuning and architecture optimization.

🤝 Contributing

This is primarily a personal learning project, but suggestions and improvements are welcome! Feel free to open an issue or submit a pull request.

📄 License

This project is available under the MIT License. See the LICENSE file for details.

🙏 Acknowledgments

Kaggle Understanding Cloud Organization Competition for providing the dataset
Segmentation Models PyTorch for pre-built architectures (MIT License)

👤 Author

Silas Theinen

🔗 GitHub: @SATheinen
💼 LinkedIn: Silas Theinen
