DMSA-CloudNet

Dilated Multi-Scale Attention CloudNet for Ground-Based Cloud Segmentation


Md Shihab Reza¹ · Dr. Muhammad Hasibur Rashid Chayon²
¹ North South University (NSU), Dhaka, Bangladesh
² American International University-Bangladesh (AIUB), Dhaka, Bangladesh


Abstract

We present DMSA-CloudNet, an ultra-lightweight encoder-decoder architecture for ground-based cloud segmentation. Despite containing only 1,727 parameters (6.75 KB), smaller than a typical JPEG image, DMSA-CloudNet achieves competitive performance against significantly larger baselines across three public benchmarks: SWIMSeg, SWINSeg, and SWINYSeg. The model combines depthwise-separable convolutions, a Dilated Multi-Scale (DMS) bottleneck with four parallel branches at dilation rates d = 1, 2, 4, 8, and Squeeze-Excitation Skip Attention (SESA) to capture multi-scale cloud structures efficiently. A Hybrid Dice-BCE loss further stabilises training on imbalanced cloud/sky masks.


Qualitative Results

Figure 1. Qualitative comparison of DMSA-CloudNet predictions vs. ground truth across SWIMSeg, SWINSeg, and SWINYSeg. Each row represents one dataset. Columns: Original Image | Ground Truth | DMSA-CloudNet Prediction.


Quantitative Results

Test-Set Performance (80/20 deterministic split, seed=42)

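| Dataset  | Precision | Recall | F1 Score | Dice Score | Error Rate | Model Size |
|----------|-----------|--------|----------|------------|------------|------------|
| SWIMSeg  | 0.7458    | 0.8835 | 0.8088   | 0.8088     | 0.2226     | 6.75 KB    |
| SWINSeg  | 0.8184    | 0.7690 | 0.7930   | 0.7930     | 0.1957     | 6.75 KB    |
| SWINYSeg | 0.7543    | 0.9111 | 0.8254   | 0.8254     | 0.2045     | 6.75 KB    |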

All metrics are computed on the held-out 20% test split, never seen during training.
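
The F1 and Dice columns are identical because, for a binary mask, both reduce to 2·TP / (2·TP + FP + FN). The sketch below computes the five reported metrics from pixel-level confusion counts. It is an illustration rather than the code in utils/metrics.py: the function name is hypothetical, and defining Error Rate as the fraction of misclassified pixels is an assumption.

```python
def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary segmentation metrics from confusion counts (cloud = positive).

    Assumes at least one predicted-positive and one true-positive pixel,
    so no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    dice = 2 * tp / (2 * tp + fp + fn)
    # Assumption: error rate = share of pixels assigned the wrong class.
    error_rate = (fp + fn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "dice": dice,
        "error_rate": error_rate,
    }


metrics = segmentation_metrics(tp=80, fp=20, fn=10, tn=90)
print(metrics)
```

Plugging the SWIMSeg precision and recall from the table into the F1 formula gives 2 × 0.7458 × 0.8835 / (0.7458 + 0.8835) ≈ 0.8088, which matches the reported value.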


Architecture

                    ┌──────────────────────────────────┐
   Input            │         DMSA-CloudNet            │
  (3×H×W) ─────────►                                  │
                    │  Encoder (6-ch DS-Conv × 3)      │
                    │    ├─ s1 ──► SESA ──────────┐    │
                    │    ├─ s2 ──► SESA ───────┐  │    │
                    │    └─ s3 ──► SESA ────┐  │  │    │
                    │                       │  │  │    │
                    │  Adapt  (6 → 8 ch)    │  │  │    │
                    │                       │  │  │    │
                    │  DMS Bottleneck       │  │  │    │
                    │   ├─ d=1             │  │  │    │
                    │   ├─ d=2             │  │  │    │
                    │   ├─ d=4             │  │  │    │
                    │   └─ d=8             │  │  │    │
                    │   └──cat──►merge     │  │  │    │
                    │                       │  │  │    │
                    │  Decoder (6-ch DS)    │  │  │    │
                    │   ├─ up3 ◄─────────cat◄─┘  │    │
                    │   ├─ up2 ◄──────────cat◄───┘    │
                    │   └─ up1 ◄───────────cat◄────────┘
                    │                                  │
                    │  Output conv → Sigmoid           │
                    └─────────────────┬────────────────┘
                                      │
                              Mask (1×H×W)
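
The two less familiar blocks in the diagram, the DMS bottleneck and SESA, can be sketched in PyTorch as follows. This is a minimal illustration and not the implementation in models/dmsa_cloudnet.py: the class names, the use of depthwise convolutions in the dilated branches, the SE reduction ratio, and the absence of normalisation layers are all assumptions, so the parameter count will not necessarily match 1,727.

```python
import torch
import torch.nn as nn


class DMSBottleneck(nn.Module):
    """Four parallel dilated 3x3 branches, concatenated, then merged by a 1x1 conv."""

    def __init__(self, channels: int = 8, dilations=(1, 2, 4, 8)):
        super().__init__()
        # padding = dilation keeps H and W unchanged for a 3x3 kernel.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d,
                      dilation=d, groups=channels, bias=False)  # depthwise (assumed)
            for d in dilations
        ])
        self.merge = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.merge(multi_scale)


class SESkipAttention(nn.Module):
    """SE-style channel gate applied to a skip-connection feature map."""

    def __init__(self, channels: int = 6, reduction: int = 2):  # reduction is assumed
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # squeeze: one value per channel
            nn.Conv2d(channels, hidden, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1),
            nn.Sigmoid(),                     # excitation: weights in (0, 1)
        )

    def forward(self, skip: torch.Tensor) -> torch.Tensor:
        return skip * self.gate(skip)


features = torch.randn(1, 8, 38, 38)
print(DMSBottleneck()(features).shape)   # spatial size is preserved
```

With a 3×3 kernel, dilation d covers a (2d + 1) × (2d + 1) window, so the four branches see 3, 5, 9, and 17 pixel neighbourhoods at the same parameter cost per branch.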

Key Design Decisions

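| Component                  | Description                                         | Benefit                                                     |
|----------------------------|-----------------------------------------------------|-------------------------------------------------------------|
| Depthwise-Separable Conv   | Factorises 3×3 conv into depthwise + pointwise      | ~8× fewer parameters for wide layers (less at 6 channels)   |
| DMS Bottleneck             | 4 parallel dilated branches (d=1,2,4,8)             | Multi-scale cloud context                                   |
| SESA Attention             | SE-style channel recalibration on skip connections  | Sharpens boundary details                                   |
| HybridDiceBCE Loss         | 0.5 × Dice + 0.5 × BCE                              | Stable on class-imbalanced masks                            |
| Narrow channels (E=6, B=8) | Ultra-low channel widths throughout                 | Entire model < 7 KB                                         |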
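
The parameter savings and the model size in the table follow from simple arithmetic, sketched below. The helper names are hypothetical, biases are ignored, and the size calculation assumes weights stored as 32-bit floats (4 bytes each).

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def ds_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def model_size_kib(num_params: int, bytes_per_param: int = 4) -> float:
    """Storage for the weights alone, in KiB (assumes float32)."""
    return num_params * bytes_per_param / 1024


for channels in (6, 64):
    std = standard_conv_params(channels, channels)
    ds = ds_conv_params(channels, channels)
    print(f"{channels} channels: {std} vs {ds} weights ({std / ds:.1f}x fewer)")

print(f"1,727 parameters = {model_size_kib(1727):.2f} KiB")
```

The reduction factor is 9·C / (9 + C) for C output channels, so it approaches 9× only as layers get wide. At 64 channels it is about 7.9×, while at the 6-channel width used here it is 3.6×; the small footprint comes mostly from the narrow channels themselves.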

Datasets

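| Dataset  | Split                    | Images | Setting     | GT Format  |
|----------|--------------------------|--------|-------------|------------|
| SWIMSeg  | 811 train / 202 test     | 1,013  | Daytime     | PNG binary |
| SWINSeg  | 92 train / 23 test       | 115    | Nighttime   | JPG binary |
| SWINYSeg | 5,414 train / 1,354 test | 6,768  | Day + Night | PNG binary |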

Download and arrange under data/:

data/
├── swimseg/
│   ├── images/          # *.png
│   └── GTmaps/          # *_GT.png
├── swinseg/
│   ├── images/          # *.jpg
│   └── GTmaps/          # *_GT.jpg
└── swinyseg/
    ├── images/          # *.jpg
    └── GTmaps/          # *.png  (same stem, no suffix)
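
Because the three datasets name their ground-truth files differently, a loader has to map each image to its mask. The following is a small sketch of that mapping based only on the layout above; the function name, the rule table, and the example file names are hypothetical and are not taken from utils/data_loader.py.

```python
from pathlib import PurePosixPath

# (suffix appended to the image stem, mask extension), read off the data/ layout.
GT_RULES = {
    "swimseg": ("_GT", ".png"),
    "swinseg": ("_GT", ".jpg"),
    "swinyseg": ("", ".png"),   # same stem, no suffix
}


def mask_path_for(image_path: str, dataset: str, data_root: str = "data") -> PurePosixPath:
    """Return the expected ground-truth mask path for an image file."""
    suffix, extension = GT_RULES[dataset]
    stem = PurePosixPath(image_path).stem
    return PurePosixPath(data_root) / dataset / "GTmaps" / f"{stem}{suffix}{extension}"


# Example file names are made up for illustration.
print(mask_path_for("data/swimseg/images/0001.png", "swimseg"))
print(mask_path_for("data/swinyseg/images/d0001.jpg", "swinyseg"))
```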

Repository Structure

DMSA-CloudNet/
│
├── models/
│   ├── __init__.py              # Model registry + get_model()
│   └── dmsa_cloudnet.py         # Full architecture implementation
│
├── utils/
│   ├── data_loader.py           # CloudDataset + deterministic splits
│   ├── losses.py                # DiceLoss + HybridDiceBCELoss
│   ├── metrics.py               # F1, Dice, Precision, Recall, Error Rate
│   └── visualization.py         # Training curves, comparison grids
│
├── assets/
│   └── comparison_figure.png    # Qualitative results figure
│
├── checkpoints/                 # Trained weights (see Pretrained Models)
├── data/                        # Datasets (download separately)
├── results/                     # Auto-generated evaluation outputs
├── logs/                        # Auto-generated training logs
│
├── config.py                    # All hyperparameters and paths
├── train.py                     # Training entry point
├── evaluate.py                  # Evaluation entry point
├── predict.py                   # Inference on arbitrary images
├── DMSA_CloudNet_Demo.ipynb     # End-to-end interactive notebook
│
├── requirements.txt
├── .gitignore
└── README.md

Quickstart

1. Clone & Install

git clone https://github.com/ShihabRezaAdit/DMSA-CloudNet.git
cd DMSA-CloudNet
pip install -r requirements.txt

2. Download Datasets

Download SWIMSeg, SWINSeg, and SWINYSeg and place them under data/ as shown above.

3. Train

# Single dataset
python train.py --dataset swimseg

# All three datasets sequentially
python train.py --dataset all

# Force re-train (override existing checkpoint)
python train.py --dataset swimseg --force

# Custom epoch count
python train.py --dataset swinyseg --epochs 30

4. Evaluate

# Single dataset — prints metrics table
python evaluate.py --dataset swimseg

# All datasets — also saves results/all_datasets_summary.csv
python evaluate.py --dataset all

5. Predict on New Images

# Single image → saves prediction mask
python predict.py --input sky.jpg --dataset swimseg

# Entire folder
python predict.py --input ./my_images/ --dataset swimseg --output ./predictions/

6. Interactive Notebook

jupyter notebook DMSA_CloudNet_Demo.ipynb

The notebook covers: setup → dataset exploration → training → evaluation → visual comparison → inference demo.


Reproducibility

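| Setting            | Value                                     |
|--------------------|-------------------------------------------|
| Random seed        | 42                                        |
| Train / Test split | 80% / 20% (deterministic torch.Generator) |
| Image size         | 304 × 304                                 |
| Optimiser          | Adam, lr=1e-3                             |
| LR schedule        | Exponential decay γ=0.95                  |
| Loss               | HybridDiceBCE (α=0.5)                     |
| Max epochs         | 50                                        |
| Early stopping     | patience=10                               |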
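
A seeded torch.Generator makes the 80/20 split repeatable across runs. The sketch below shows one way to do this with torch.utils.data.random_split. The helper name is hypothetical, and the rounding rule (floor for the training set, remainder to the test set) is an assumption, so counts may differ by one image from those produced by utils/data_loader.py.

```python
import torch
from torch.utils.data import TensorDataset, random_split


def deterministic_split(dataset, train_fraction: float = 0.8, seed: int = 42):
    """Split a dataset into train/test subsets that are identical on every call."""
    n_train = int(train_fraction * len(dataset))   # assumed rounding: floor
    n_test = len(dataset) - n_train
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_test], generator=generator)


# Stand-in dataset with as many items as SWINSeg has images.
dummy = TensorDataset(torch.arange(115))
train_a, test_a = deterministic_split(dummy)
train_b, test_b = deterministic_split(dummy)

print(len(train_a), len(test_a))
print(train_a.indices == train_b.indices)   # same seed, same assignment
```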

To fully reproduce all reported results from scratch:

python train.py --dataset all
python evaluate.py --dataset all

All hyperparameters are centralised in config.py — no magic numbers elsewhere.
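
To give a feel for how the schedule and stopping rule listed in this section interact, here is a plain-Python sketch. It assumes the learning rate decays once per epoch and that early stopping monitors a validation loss; neither detail is stated above, and the names are hypothetical.

```python
def exponential_lr(epoch: int, base_lr: float = 1e-3, gamma: float = 0.95) -> float:
    """Learning rate after `epoch` decay steps: base_lr * gamma ** epoch."""
    return base_lr * gamma ** epoch


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


for epoch in (0, 10, 50):
    print(f"epoch {epoch:2d}: lr = {exponential_lr(epoch):.2e}")
```

Under these assumptions the learning rate falls by roughly a factor of 13 over the full 50 epochs, from 1e-3 to about 7.7e-5.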


Configuration

Edit config.py to customise any hyperparameter:

class Config:
    IMG_SIZE       = 304      # Input resolution
    TRAIN_SPLIT    = 0.8      # 80/20 train-test split
    EPOCHS         = 50       # Max training epochs
    PATIENCE       = 10       # Early stopping patience
    LR             = 1e-3     # Initial learning rate
    LR_DECAY_GAMMA = 0.95     # Exponential LR decay
    ALPHA_HDBL     = 0.5      # Dice weight in HybridDiceBCELoss
    RANDOM_SEED    = 42       # Global random seed
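
As an illustration of what ALPHA_HDBL controls, the sketch below evaluates a Dice + BCE blend on flat lists of pixel probabilities. It is written in plain Python rather than PyTorch so the arithmetic is visible, and it is not the code in utils/losses.py: the (1 - alpha) weight on BCE and the smoothing constant eps are assumptions.

```python
import math


def hybrid_dice_bce(probs, targets, alpha: float = 0.5, eps: float = 1e-7) -> float:
    """alpha * Dice loss + (1 - alpha) * binary cross-entropy.

    probs:   predicted cloud probabilities in [0, 1], one per pixel
    targets: ground-truth labels (0 = sky, 1 = cloud), same length
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice_coeff = (2 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    # Clamp probabilities away from 0 and 1 so log() stays finite.
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / len(probs)
    return alpha * (1 - dice_coeff) + (1 - alpha) * bce


targets = [1, 0, 1, 0]
print(hybrid_dice_bce([0.9, 0.1, 0.8, 0.2], targets))   # confident and correct: small loss
print(hybrid_dice_bce([0.6, 0.4, 0.6, 0.4], targets))   # hesitant: larger loss
```

The Dice term depends on overlap with the cloud region rather than on per-pixel counts, which is why it helps when one class dominates the mask; the BCE term supplies a per-pixel gradient.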

Requirements

  • Python ≥ 3.9
  • PyTorch ≥ 2.0 (CUDA recommended)
  • See requirements.txt for full list

pip install -r requirements.txt

Citation

This paper is currently unpublished. If you use this code or build upon this work, please cite:

@misc{reza2026dmsacloudnet,
  title        = {DMSA-CloudNet: Dilated Multi-Scale Attention for Ultra-Lightweight Ground-Based Cloud Segmentation},
  author       = {Reza, Md Shihab and Chayon, Muhammad Hasibur Rashid},
  year         = {2026},
  note         = {Code available at https://github.com/ShihabRezaAdit/DMSA-CloudNet},
  institution  = {North South University and American International University-Bangladesh (AIUB), Dhaka, Bangladesh}
}

License

This project is licensed under the MIT License — see LICENSE for details.

Dataset licenses apply independently:

  • SWIMSeg / SWINSeg / SWINYSeg: please refer to each dataset's official page for usage terms.

North South University · American International University-Bangladesh (AIUB) · Dhaka, Bangladesh

If you find this work useful, please consider giving it a ⭐
