Distributed Harmful Algal Bloom Detection Pipeline for Lake Erie using Hyperspectral Imaging (HSI) and ground-truth water quality data.

┌────────────────────────────────────────────────────────────┐
│  Docker Compose (→ Kubernetes)                             │
│                                                            │
│ ┌──────────┐  ┌──────────────┐  ┌───────────────────────┐  │
│ │  MinIO   │  │ Spark Master │  │ Spark Worker(s)       │  │
│ │   (S3)   │◄─┤ + Driver     │──│ NumPy / TF / Sedona   │  │
│ │ :9000/01 │  │    :8080     │  │                       │  │
│ └──────────┘  └──────────────┘  └───────────────────────┘  │
└────────────────────────────────────────────────────────────┘

| Layer | Technology | Purpose |
|---|---|---|
| Storage | MinIO | S3-compatible object store for HSI + water quality data |
| Compute | PySpark 3.5 | Distributed BIP ingestion, calibration, spatial joins |
| Spatial | H3 Hexagonal | Performant raster↔point geospatial fusion |
| ML | TensorFlow | 3D-U-Net for hyperspectral HAB classification |
| Interchange | Parquet / TFRecord | Spark ↔ TensorFlow data bridge |
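
The raster↔point fusion in the table above amounts to a join on a shared cell key. The following is a minimal sketch of that pattern only. It uses a plain fixed-size latitude/longitude grid as a stand-in for H3 indexing (an H3 cell ID would take the place of `grid_cell`), and every name, field, and cell size in it is hypothetical rather than taken from this repository.

```python
from collections import defaultdict
from math import floor
from statistics import mean

CELL_DEG = 0.01  # stand-in cell size in degrees (assumption; H3 uses resolution levels instead)


def grid_cell(lat, lon, size=CELL_DEG):
    """Stand-in for an H3 cell ID: index of the square grid cell containing the point."""
    return (floor(lat / size), floor(lon / size))


def fuse(pixels, samples):
    """Join raster pixels to water samples that fall in the same cell.

    pixels:  iterable of (lat, lon, reflectance) tuples
    samples: iterable of (lat, lon, microcystin) tuples
    Returns one row per sample whose cell contains at least one pixel.
    """
    # Group pixel values by the cell they fall in.
    by_cell = defaultdict(list)
    for lat, lon, value in pixels:
        by_cell[grid_cell(lat, lon)].append(value)

    # Attach the per-cell pixel average to each sample in a populated cell.
    fused = []
    for lat, lon, toxin in samples:
        cell = grid_cell(lat, lon)
        if cell in by_cell:
            fused.append({
                "cell": cell,
                "mean_reflectance": mean(by_cell[cell]),
                "microcystin": toxin,
            })
    return fused
```

In Spark the same idea would likely be expressed as a `groupBy` on the cell column followed by a join, so that neither side needs a pairwise distance computation.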

Prerequisites:

- Docker Desktop ≥ 4.25 with Docker Compose v2
- 8 GB+ RAM allocated to Docker (Settings → Resources)
- (Optional) Python 3.11+ for local development

git clone <repo-url> HABModel
cd HABModel
copy .env.template .env
# Edit .env if you want custom MinIO credentials
docker compose up -d --build

| Service | URL | Expected |
|---|---|---|
| MinIO Console | http://localhost:9001 | Login page |
| MinIO API | http://localhost:9000 | Health endpoint |
| Spark Master UI | http://localhost:8080 | 1 worker registered |
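
The checks in the table above can also be scripted. This is a minimal sketch, not part of the repository: it assumes MinIO's liveness path `/minio/health/live` and the Spark standalone master's `/json/` status page with an `aliveworkers` field, and the function names are hypothetical. The `fetch` argument is injectable so the logic can be exercised without running containers.

```python
import json
from urllib.request import urlopen


def http_get(url, timeout=5):
    """Return (status_code, body_bytes) for a GET request."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def check_services(fetch=http_get, host="localhost"):
    """Return {service_name: bool} for the three services listed above."""
    results = {}
    checks = {
        "minio_console": f"http://{host}:9001",
        # Assumed MinIO liveness endpoint.
        "minio_api": f"http://{host}:9000/minio/health/live",
    }
    for name, url in checks.items():
        try:
            status, _ = fetch(url)
            results[name] = status == 200
        except OSError:  # connection refused, timeout, HTTP error
            results[name] = False

    try:
        # Assumed Spark master status page; expects at least one live worker.
        status, body = fetch(f"http://{host}:8080/json/")
        workers = json.loads(body).get("aliveworkers", 0)
        results["spark_master"] = status == 200 and workers >= 1
    except (OSError, ValueError):
        results["spark_master"] = False
    return results
```

Running `print(check_services())` after `docker compose up -d --build` should report `True` for each service once the containers have finished starting.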
python scripts/upload_to_minio.py

HABModel/
├── docker-compose.yml # Infrastructure definition
├── docker/spark/
│ ├── Dockerfile # Spark + deps image
│ └── spark-defaults.conf # S3A / MinIO config
├── src/
│ ├── ingestion/ # Phase 2: BIP parsing
│ ├── fusion/ # Phase 3: H3 spatial join
│ ├── exploration/ # Phase 1.5: Visualisation
│ └── ml/ # Phase 4: TF pipeline
├── scripts/ # CLI utilities
├── data/ # Local data (gitignored)
├── requirements.txt
└── .env.template
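
For orientation, the S3A / MinIO settings that `spark-defaults.conf` carries can equally be applied from Python. The sketch below is an illustration under assumptions, not the repository's configuration: the property names are standard Hadoop S3A options, while the endpoint `http://minio:9000`, the environment variable names `MINIO_ROOT_USER` and `MINIO_ROOT_PASSWORD`, and the helper name `s3a_settings` are assumed for the example.

```python
import os


def s3a_settings(env=os.environ, endpoint="http://minio:9000"):
    """Build Hadoop S3A options for a MinIO endpoint from environment variables."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": env.get("MINIO_ROOT_USER", ""),
        "spark.hadoop.fs.s3a.secret.key": env.get("MINIO_ROOT_PASSWORD", ""),
        # MinIO is normally addressed as host/bucket rather than bucket.host.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Plain HTTP inside the Compose network (assumption).
        "spark.hadoop.fs.s3a.connection.ssl.enabled": "false",
    }


# Usage with PySpark (left commented so the sketch runs without Spark installed):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("hab")
# for key, value in s3a_settings().items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Keeping these values in `spark-defaults.conf` has the advantage that every job submitted to the cluster picks them up without code changes; the programmatic form is mainly useful for local development.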

Data sources:

- HSI: NASA Pika Gige airborne hyperspectral (BIP, 240 bands, 398–892 nm)
- Water Quality: NCEI Lake Erie sampling data (microcystins, chlorophyll, nutrients)

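
To make the BIP (band-interleaved-by-pixel) layout concrete, here is a minimal NumPy sketch of how a raw buffer maps to an image cube. It is an illustration under assumptions, not the ingestion code in `src/ingestion/`: the 16-bit little-endian sample type, the absence of a header offset, the evenly spaced band centres, and the function names are all assumed. In practice these values come from the sensor's header file.

```python
import numpy as np

BANDS = 240  # band count from the HSI description above


def read_bip(raw, lines, samples, bands=BANDS, dtype="<u2"):
    """Interpret a flat BIP buffer as a (lines, samples, bands) cube.

    In BIP order all bands of one pixel are stored contiguously, so the
    band axis varies fastest and a plain C-order reshape recovers the cube.
    """
    values = np.frombuffer(raw, dtype=dtype)
    expected = lines * samples * bands
    if values.size != expected:
        raise ValueError(f"expected {expected} values, got {values.size}")
    return values.reshape(lines, samples, bands)


def nominal_wavelengths(bands=BANDS, start_nm=398.0, stop_nm=892.0):
    """Evenly spaced band centres in nm (approximation; real centres come from the header)."""
    return np.linspace(start_nm, stop_nm, bands)
```

With this layout, `cube[row, col, :]` is the full spectrum of one pixel, which is the natural unit both for per-pixel calibration and for building patches for a 3D network.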
TBD