charlesgchen/HABModel

🛰️ HAB-Sentinel

Distributed Harmful Algal Bloom Detection Pipeline for Lake Erie using Hyperspectral Imaging (HSI) and ground-truth water quality data.

Architecture

┌─────────────────────────────────────────────────────────────┐
│  Docker Compose (→ Kubernetes)                              │
│                                                             │
│  ┌──────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│  │ MinIO    │  │ Spark Master │  │ Spark Worker(s)       │  │
│  │ (S3)     │◄─┤ + Driver     │──│ NumPy / TF / Sedona   │  │
│  │ :9000/01 │  │ :8080        │  │                       │  │
│  └──────────┘  └──────────────┘  └───────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
| Layer | Technology | Purpose |
|---|---|---|
| Storage | MinIO | S3-compatible object store for HSI and water-quality data |
| Compute | PySpark 3.5 | Distributed BIP ingestion, calibration, spatial joins |
| Spatial | H3 (hexagonal grid) | Fast raster↔point geospatial fusion |
| ML | TensorFlow | 3D U-Net for hyperspectral HAB classification |
| Interchange | Parquet / TFRecord | Spark ↔ TensorFlow data bridge |
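The raster↔point fusion in the Spatial layer amounts to a join on a shared cell ID: every HSI pixel and every water-quality sample is mapped to a grid cell, pixel values are aggregated per cell, and each sample is matched to the cell it falls in. The sketch below is a dependency-free illustration, not the project's `src/fusion` code. To keep it runnable anywhere, a square lat/lon bin (the hypothetical `cell_id`) stands in for H3; the function and field names are assumptions.

```python
from collections import defaultdict

def cell_id(lat: float, lon: float, size_deg: float = 0.01) -> tuple[int, int]:
    """Hypothetical stand-in for an H3 index: a square lat/lon bin, not a hexagon."""
    return (int(lat // size_deg), int(lon // size_deg))

def fuse(pixels, samples, size_deg: float = 0.01):
    """Join ground-truth samples to the mean HSI pixel value of the cell they fall in.

    pixels:  iterable of (lat, lon, value) taken from the hyperspectral raster
    samples: iterable of (lat, lon, measurement) from water-quality sampling
    Returns (cell, mean_pixel_value, measurement) for every sample whose cell has pixels.
    """
    # Raster side: aggregate pixel values per cell.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in pixels:
        cell = cell_id(lat, lon, size_deg)
        sums[cell] += value
        counts[cell] += 1

    # Point side: look up the cell each sample falls in (an inner join on cell ID).
    fused = []
    for lat, lon, measurement in samples:
        cell = cell_id(lat, lon, size_deg)
        if cell in counts:
            fused.append((cell, sums[cell] / counts[cell], measurement))
    return fused

if __name__ == "__main__":
    pixels = [(41.5051, -83.0051, 10.0), (41.5059, -83.0059, 20.0), (41.60, -83.10, 99.0)]
    samples = [(41.5055, -83.0055, 3.2), (42.0, -82.0, 1.0)]
    print(fuse(pixels, samples))
```

In Spark the same idea becomes a `groupBy` on a cell column followed by a join. Replacing `cell_id` with `h3.latlng_to_cell(lat, lon, res)` from h3-py v4 would give real H3 hexagons instead of square bins.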

Prerequisites

  • Docker Desktop ≥ 4.25 with Docker Compose v2
  • 8 GB+ RAM allocated to Docker (Settings → Resources)
  • (Optional) Python 3.11+ for local development

Quick Start

1. Clone & configure

git clone <repo-url> HABModel
cd HABModel
copy .env.template .env
# (macOS/Linux: cp .env.template .env)
# Edit .env if you want custom MinIO credentials

2. Build & start the stack

docker compose up -d --build

3. Verify services

| Service | URL | Expected |
|---|---|---|
| MinIO Console | http://localhost:9001 | Login page |
| MinIO API | http://localhost:9000 | Health endpoint |
| Spark Master UI | http://localhost:8080 | 1 worker registered |
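The checks in step 3 can be scripted with the standard library alone. This is a minimal sketch assuming the default ports listed above; `/minio/health/live` is MinIO's liveness probe. It only confirms that each endpoint answers, so the "1 worker registered" check still needs a look at the Spark UI.

```python
import urllib.request

# Default local endpoints; /minio/health/live is MinIO's liveness probe.
SERVICES = {
    "MinIO Console": "http://localhost:9001",
    "MinIO API": "http://localhost:9000/minio/health/live",
    "Spark Master UI": "http://localhost:8080",
}

def is_up(url: str, timeout: float = 3.0) -> bool:
    """True if the URL answers with a 2xx/3xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:  # covers URLError, HTTPError (4xx/5xx), refused connections, timeouts
        return False

if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name:<16} {'UP' if is_up(url) else 'DOWN':<5} {url}")
```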

4. Upload data to MinIO

python scripts/upload_to_minio.py
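`scripts/upload_to_minio.py` itself is not reproduced in this README. As a rough picture of what such an upload involves, here is a minimal sketch using the MinIO Python SDK (`pip install minio`). The bucket name `hab-raw`, the `data/` source folder, and the `MINIO_ROOT_USER` / `MINIO_ROOT_PASSWORD` variable names are assumptions; check `.env.template` for the names the project actually uses. `minioadmin` is MinIO's stock default credential.

```python
import os
from pathlib import Path

def plan_uploads(data_dir: str, prefix: str = "") -> list[tuple[str, str]]:
    """Pair every file under data_dir with an object key mirroring its relative path."""
    root = Path(data_dir)
    return [
        (str(path), prefix + path.relative_to(root).as_posix())
        for path in sorted(root.rglob("*"))
        if path.is_file()
    ]

def upload(data_dir: str = "data", bucket: str = "hab-raw") -> None:
    # Imported lazily so plan_uploads() is usable without the SDK installed.
    from minio import Minio

    client = Minio(
        endpoint="localhost:9000",
        access_key=os.environ.get("MINIO_ROOT_USER", "minioadmin"),      # assumed variable name
        secret_key=os.environ.get("MINIO_ROOT_PASSWORD", "minioadmin"),  # assumed variable name
        secure=False,  # the local stack is served over plain HTTP
    )
    if not client.bucket_exists(bucket_name=bucket):
        client.make_bucket(bucket_name=bucket)
    for file_path, key in plan_uploads(data_dir):
        client.fput_object(bucket_name=bucket, object_name=key, file_path=file_path)
        print(f"uploaded {file_path} -> s3://{bucket}/{key}")

if __name__ == "__main__":
    upload()
```

Docker Compose reads `.env` on its own, but a plain Python process does not, so export the credentials in your shell first or rely on the defaults.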

Project Structure

HABModel/
├── docker-compose.yml          # Infrastructure definition
├── docker/spark/
│   ├── Dockerfile              # Spark + deps image
│   └── spark-defaults.conf     # S3A / MinIO config
├── src/
│   ├── ingestion/              # Phase 2: BIP parsing
│   ├── fusion/                 # Phase 3: H3 spatial join
│   ├── exploration/            # Phase 1.5: Visualisation
│   └── ml/                     # Phase 4: TF pipeline
├── scripts/                    # CLI utilities
├── data/                       # Local data (gitignored)
├── requirements.txt
└── .env.template
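`docker/spark/spark-defaults.conf` is described only as "S3A / MinIO config". The Hadoop S3A keys such a file typically sets when MinIO replaces AWS S3 are sketched below in Python, so the same values can also be passed through `SparkSession.builder.config`. This is not copied from the repository: the endpoint `http://minio:9000` assumes the Compose service is named `minio`, the credential variable names are assumptions, and the `hadoop-aws` connector matching Spark's bundled Hadoop version must be on the classpath.

```python
import os

def s3a_conf(endpoint: str = "http://minio:9000") -> dict[str, str]:
    """Hadoop S3A settings that point Spark at MinIO instead of AWS S3."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": os.environ.get("MINIO_ROOT_USER", "minioadmin"),
        "spark.hadoop.fs.s3a.secret.key": os.environ.get("MINIO_ROOT_PASSWORD", "minioadmin"),
        # MinIO addresses buckets as host/bucket/key by default, not as DNS subdomains.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }

def build_session(app_name: str = "hab-sentinel"):
    # Requires pyspark plus the hadoop-aws jar; imported lazily so s3a_conf() works without them.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in s3a_conf().items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

In `spark-defaults.conf` each pair becomes one whitespace-separated line, e.g. `spark.hadoop.fs.s3a.path.style.access true`.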

Data Sources

  • HSI: NASA Pika GigE airborne hyperspectral imagery (BIP, 240 bands, 398–892 nm)
  • Water Quality: NCEI Lake Erie sampling data (microcystins, chlorophyll, nutrients)
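BIP (band-interleaved-by-pixel) stores every band of pixel (0, 0), then every band of pixel (0, 1), and so on, so the sample for (row, col, band) sits at byte offset ((row × n_cols + col) × n_bands + band) × bytes_per_sample. Below is a minimal stdlib sketch assuming raw little-endian uint16 samples with no header offset. If the cubes come with an ENVI-style `.hdr` sidecar, its `data type`, `byte order`, `header offset`, and `wavelength` fields should override these defaults. `nominal_wavelength` is a hypothetical helper that assumes evenly spaced bands across 398–892 nm.

```python
import struct

def bip_offset(row, col, band, n_cols, n_bands, bytes_per_sample=2):
    """Byte offset of one sample in a band-interleaved-by-pixel (BIP) buffer."""
    return ((row * n_cols + col) * n_bands + band) * bytes_per_sample

def read_spectrum(buf, row, col, n_cols, n_bands, byte_order="<", sample="H"):
    """Return all n_bands samples for pixel (row, col); defaults to little-endian uint16."""
    size = struct.calcsize(byte_order + sample)
    start = bip_offset(row, col, 0, n_cols, n_bands, size)
    # A pixel's bands are contiguous in BIP, so one unpack reads the whole spectrum.
    return list(struct.unpack_from(f"{byte_order}{n_bands}{sample}", buf, start))

def nominal_wavelength(band, n_bands=240, start_nm=398.0, end_nm=892.0):
    """Assumes evenly spaced bands; real sensors list exact band centres in the header."""
    return start_nm + band * (end_nm - start_nm) / (n_bands - 1)
```

Because each image row is also contiguous in BIP, the same arithmetic lets a Spark task read just its own row range of a large cube.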

License

TBD

About

ML Pipeline used to train models on hyperspectral data and predict HAB levels
