Add SGN-styled classifier #129

Open
vuhoangnamdoan wants to merge 1 commit into InnovAIte-Deakin:main from vuhoangnamdoan:feat/sng-classifier

Conversation

@vuhoangnamdoan
Collaborator

Summary

  • Simplify SGN classifier training pipeline to focus on classification only.
  • Remove unused/redundant components and keep the FireFusion env + fire label merge path.
  • Standardize checkpoint outputs for model weights and preprocessing metadata.

Why

The previous training path mixed multiple responsibilities and optional branches, which made debugging and iteration slow.
This PR narrows scope to the actual architecture used for bushfire risk classification and reduces maintenance overhead.

Changes

  • Refactor src/training/ts_classifier_train.py to a classifier-only flow.
  • Keep/clean FireFusion label construction:
    • env CSV spine
    • satellite fire spatial join
    • left-merge fire labels onto the env spine, with is_burning defaulting to 0 for unmatched rows
  • Simplify training configuration and runtime arguments.
  • Save model + preprocessing artifacts for consistent inference.
  • Fix import/runtime issues encountered during training setup (e.g. SGN/layers import paths).
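
For reviewers, the label construction described above (env spine, left-merge, is_burning fallback to 0) can be sketched roughly as follows. This is a stdlib-only illustration, not the code in ts_classifier_train.py: the join key `(grid_id, date)`, the `temp_c` column, and the function name `merge_fire_labels` are assumptions made for the example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical join key: (grid_id, date). The real pipeline may key differently.
Key = Tuple[str, str]


def merge_fire_labels(env_rows: Iterable[Dict], fire_keys: Iterable[Key]) -> List[Dict]:
    """Left-merge: keep every env row; is_burning is 1 on a match, else 0."""
    fire = set(fire_keys)
    merged = []
    for row in env_rows:
        key = (row["grid_id"], row["date"])
        # Rows with no matching fire detection fall back to is_burning = 0.
        merged.append({**row, "is_burning": 1 if key in fire else 0})
    return merged


# Toy data with assumed column names.
env_rows = [
    {"grid_id": "g1", "date": "2020-01-01", "temp_c": 35.0},
    {"grid_id": "g2", "date": "2020-01-01", "temp_c": 28.5},
]
fire_keys = [("g1", "2020-01-01")]
labelled = merge_fire_labels(env_rows, fire_keys)
```

Because the env rows are the spine, the merged row count always equals the env row count; only the label column changes.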

Files touched

  • ai-modelling/src/training/ts_classifier_train.py
  • ai-modelling/src/models/bushfire/risk_classifier.py (if applicable)
  • ai-modelling/src/models/bushfire/advanced_classifier/SGN.py (if applicable)
  • ai-modelling/src/models/bushfire/layers/Embed.py (if applicable)
  • ai-modelling/src/models/bushfire/fire_risk_pipeline.py (if applicable)

Test plan

  • Run classifier training smoke test:
    • python -m src.training.ts_classifier_train --max-rows 50000 --epochs 1 --e-layers 1 --d-model 16 --num-groups 2
  • Run full training command with default CSV paths.
  • Confirm epoch logs print and checkpoint files are created.
  • Confirm saved artifacts can be loaded in inference path.
  • Spot-check merged label stats (row count, positive rate).
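
The label spot-check in the last step could look something like the sketch below. The helper name `label_stats` and the toy label list are assumptions; in practice the input would be the is_burning column of the merged frame.

```python
from typing import Dict, Iterable


def label_stats(labels: Iterable[int]) -> Dict[str, float]:
    """Return row count, positive count, and positive rate for binary labels."""
    labels = list(labels)
    rows = len(labels)
    positives = sum(1 for y in labels if y == 1)
    # Guard against an empty input rather than dividing by zero.
    rate = positives / rows if rows else 0.0
    return {"rows": rows, "positives": positives, "positive_rate": rate}


stats = label_stats([0, 0, 1, 0])
print(stats)
```

Comparing `rows` against the env CSV row count is a quick way to confirm the left-merge did not duplicate or drop spine rows.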

Risks / Notes

  • Current labels are highly imbalanced; model quality depends on follow-up imbalance handling.
  • SGN depth/sequence settings can fail for short sequence lengths; ensure compatible config (e.g. e_layers=1 for forecast-only sequence length).
  • Spatial join drops unmatched fire detections; monitor this count in logs.
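
One way to monitor the dropped-detection count is sketched below. It assumes a regular lon/lat grid defined by an origin, a cell size, and a cell count, which may not match how the actual spatial join is implemented; `assign_to_grid` and all of its parameters are hypothetical.

```python
import logging
import math
from typing import Iterable, List, Tuple

logger = logging.getLogger("label_join")


def assign_to_grid(
    points: Iterable[Tuple[float, float]],
    lon0: float,
    lat0: float,
    cell: float,
    ncols: int,
    nrows: int,
) -> Tuple[List[Tuple[int, int]], int]:
    """Map (lon, lat) points to (row, col) cells; count points outside the grid."""
    matched: List[Tuple[int, int]] = []
    dropped = 0
    for lon, lat in points:
        col = math.floor((lon - lon0) / cell)
        row = math.floor((lat - lat0) / cell)
        if 0 <= col < ncols and 0 <= row < nrows:
            matched.append((row, col))
        else:
            dropped += 1  # detection falls outside every grid cell
    logger.info("spatial join: matched=%d dropped=%d", len(matched), dropped)
    return matched, dropped


matched, dropped = assign_to_grid(
    [(144.1, -37.9), (150.0, -30.0)],
    lon0=144.0, lat0=-38.0, cell=0.5, ncols=4, nrows=4,
)
```

Returning the count alongside the matches lets the training script both log it and assert on it in a smoke test.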

Follow-ups (optional)

  • Add class-weighted/focal loss for imbalance.
  • Persist explicit grid-id to (i, j) mapping for deterministic spatial layout.
  • Add per-batch progress logging and graceful interrupt handling.
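
As a starting point for the imbalance follow-up, here is a minimal per-sample binary focal loss in plain Python. It follows the standard formulation FL = -alpha_t * (1 - p_t)^gamma * log(p_t); the default `gamma` and `alpha` values are common choices, not values tuned for this dataset, and a real implementation would be vectorised in the training framework.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0,
                      alpha: float = 0.25, eps: float = 1e-7) -> float:
    """Focal loss for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)           # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)^gamma down-weights samples the model already gets right.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
```

With `gamma=0` and `alpha=1` the positive-class term reduces to ordinary cross-entropy, which makes it easy to sanity-check against the current loss.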

