WQSurrogateModels

WQSurrogateModels is a FastAPI backend and reproducibility repository for WQI5-based current-state water quality assessment.

Scope: current-state assessment only. This repository does not perform temporal forecasting because the committed dataset does not contain timestamps.

It provides:

a direct_wqi5 baseline
surrogate regression models
/api/v2/* endpoints for WaterMirror and other HTTP clients
reproducibility scripts and experiment documentation

Relationship with the Companion Repository

This project is part of a two-repository system:

WaterMirror: cross-platform mobile frontend for data entry, CSV upload, and result visualization
WQSurrogateModels: FastAPI backend and model/reproducibility repository for WQI5-based current-state water quality assessment

WaterMirror depends on the API contract exposed by this repository. WQSurrogateModels can also be used independently through curl, Postman, or custom scripts.

What This Repository Does

serves a FastAPI backend for WQI5 assessment
supports a direct_wqi5 formula baseline
supports surrogate regression models: lr, mpr, svm, rf, xgboost, lightgbm
provides reproducibility scripts and experiment configuration
keeps compatibility with legacy endpoints while treating /api/v2/* as the primary contract
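
As a rough illustration of how a client or service might validate a requested model before dispatching to it, the sketch below checks a name against the identifiers listed above and falls back to direct_wqi5 when none is given. The helper name resolve_model_type and the fallback behavior are assumptions made for this example; the repository's actual selection logic may differ.

```python
# Model identifiers taken from the list above.
SUPPORTED_MODELS = ("direct_wqi5", "lr", "mpr", "svm", "rf", "xgboost", "lightgbm")


def resolve_model_type(requested=None, default="direct_wqi5"):
    """Return a validated model identifier (hypothetical helper, not the repo's API)."""
    # Assumption: a missing or empty model_type falls back to the default model.
    name = (requested or default).strip().lower()
    if name not in SUPPORTED_MODELS:
        raise ValueError(
            f"unknown model_type {name!r}; expected one of {SUPPORTED_MODELS}"
        )
    return name
```

Rejecting unknown names up front keeps a typo such as "lightgmb" from silently selecting a different model.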

Architecture

flowchart LR
    A[WaterMirror user input or CSV upload] --> B[WaterMirror frontend]
    B --> C[POST /api/v2/assessment or /api/v2/assessment/csv/summary]
    C --> D[WQSurrogateModels FastAPI service]
    D --> E[Input validation and assessment warnings]
    E --> F{Model selection}
    F --> G[direct_wqi5 baseline]
    F --> H[Surrogate regressors: lr mpr svm rf xgboost lightgbm]
    G --> I[WQI5 score category rating range]
    H --> I
    I --> J[Result payload]
    J --> B

Environment

Copy .env.example to .env and adjust values if needed.

cp .env.example .env

Key variables:

MODEL_DIR=models
DEFAULT_MODEL=direct_wqi5
API_HOST=0.0.0.0
API_PORT=8001
AUTO_PORT=false
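
For readers writing their own tooling around these variables, here is a minimal sketch of reading them from the process environment in Python. The Settings class and load_settings function are hypothetical names, the defaults simply mirror the values listed above, and the sketch does not parse the .env file itself; the service's real configuration loader may work differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    model_dir: str
    default_model: str
    api_host: str
    api_port: int
    auto_port: bool


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        model_dir=env.get("MODEL_DIR", "models"),
        default_model=env.get("DEFAULT_MODEL", "direct_wqi5"),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", "8001")),
        # Assumption: "1", "true", and "yes" (any case) all enable the flag.
        auto_port=env.get("AUTO_PORT", "false").strip().lower() in {"1", "true", "yes"},
    )
```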

Install

pip install .

For development and tests:

pip install -e ".[dev]"

The committed scikit-learn surrogate artifacts in models/ were serialized with scikit-learn 1.5.2. Use that same version when loading them, or retrain and re-export the artifacts in your target version.
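
One way to guard against a version mismatch before loading the artifacts is to compare the installed scikit-learn version with the expected one. The sketch below uses only the standard library; the function name sklearn_version_matches is an assumption for this example and is not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Version stated above for the committed artifacts.
EXPECTED_SKLEARN = "1.5.2"


def sklearn_version_matches(installed=None, expected=EXPECTED_SKLEARN):
    """Return True when the installed scikit-learn version equals `expected`."""
    if installed is None:
        try:
            installed = version("scikit-learn")
        except PackageNotFoundError:
            # scikit-learn is not installed at all.
            return False
    return installed == expected
```

A caller could check this at startup and print a retraining hint when it returns False.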

To also enable the full set of surrogate models (xgboost, lightgbm):

pip install -e ".[dev,models]"

Run

python main.py

If API_PORT is already occupied, the server fails fast with a clear error message by default. For local development, you can opt in to automatic fallback ports:

AUTO_PORT=true

With AUTO_PORT=true, the server tries API_PORT first and then scans upward (8002, 8003, ...) until it finds a free port.
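
The behavior described above can be approximated with a short port probe. This is a sketch under stated assumptions, not the repository's implementation: find_free_port is a hypothetical name, and the max_tries cap is added here so the loop always terminates.

```python
import socket


def find_free_port(host, start_port, auto_port=False, max_tries=50):
    """Return the first bindable port at or above start_port.

    With auto_port=False only start_port is tried (fail-fast behavior).
    """
    attempts = max_tries if auto_port else 1
    for offset in range(attempts):
        port = start_port + offset
        if port > 65535:
            break  # ran out of valid TCP port numbers
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is occupied, try the next one
            return port
    raise RuntimeError(
        f"no free port found starting at {start_port}; "
        "free the port or set AUTO_PORT=true"
    )
```

The probe socket is closed before the port is returned, so another process could still claim the port in the meantime; a real server would typically handle that by retrying its own bind.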

API

Primary endpoints live under /api/v2/*.

Quick example

POST /api/v2/assessment

{ "DO": 7.2, "BOD": 2.1, "NH3N": 0.3, "EC": 450, "SS": 12, "model_type": "lightgbm" }
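
The request above could be sent from a Python script roughly as follows. The base URL http://localhost:8001 is an assumption derived from the API_PORT value shown earlier, the helper names are hypothetical, and the response is returned as parsed JSON without assuming any particular fields.

```python
import json
from urllib import request


def build_assessment_request(payload, base_url="http://localhost:8001"):
    """Build a POST request for /api/v2/assessment (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url=f"{base_url}/api/v2/assessment",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_assessment(payload, base_url="http://localhost:8001", timeout=10.0):
    """Send the request and return the decoded JSON response."""
    req = build_assessment_request(payload, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the service running locally, post_assessment({"DO": 7.2, "BOD": 2.1, "NH3N": 0.3, "EC": 450, "SS": 12, "model_type": "lightgbm"}) would submit the same payload as the example.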

Legacy compatibility endpoints such as POST /predict, POST /score/total/, and GET /status are retained but deprecated.

Documentation

Reproducibility

Run:

pip install -e ".[dev]"
python scripts/reproduce_results.py --config configs/experiment_config.yaml --output-dir results_verification

If you use the local WQI conda environment and want to run the full experiment (all models including xgboost/lightgbm):

conda activate WQI
pip install -e ".[models]"
python scripts/reproduce_results.py --config configs/experiment_config.yaml --output-dir results_verification

To protect archived manuscript outputs, the script refuses to overwrite an existing results directory unless --overwrite is passed explicitly.
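
A guard of this kind can be sketched in a few lines. The function prepare_output_dir is a hypothetical name, and the sketch only decides whether writing may proceed; whether the real script deletes or merges existing files when --overwrite is passed is not stated here.

```python
from pathlib import Path


def prepare_output_dir(output_dir, overwrite=False):
    """Create output_dir, refusing to reuse an existing path unless overwrite=True."""
    path = Path(output_dir)
    if path.exists() and not overwrite:
        raise FileExistsError(
            f"{path} already exists; pass --overwrite to reuse it"
        )
    # Assumption: with overwrite=True the directory is reused as-is.
    path.mkdir(parents=True, exist_ok=True)
    return path
```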

Reproducibility Hyperparameters

The table below describes the revised reproducibility workflow. Archived exploratory scripts may use GridSearchCV and library defaults; see docs/original-benchmark-protocol.md.

| Model | Library | Preprocessing | Key Hyperparameters |
| --- | --- | --- | --- |
| `direct_wqi5` | formula baseline | none | direct WQI5 equation |
| `lr` | scikit-learn | mean imputation + standard scaling | default `LinearRegression()` |
| `mpr` | scikit-learn | mean imputation + polynomial features + standard scaling | `degree=2`, `include_bias=False` |
| `svm` | scikit-learn | mean imputation + standard scaling | `kernel=rbf`, `C=10.0`, `epsilon=0.1` |
| `rf` | scikit-learn | mean imputation | `n_estimators=300`, `random_state=0`, `n_jobs=-1` |
| `xgboost` | xgboost | mean imputation | `n_estimators=300`, `max_depth=6`, `learning_rate=0.05`, `subsample=0.9`, `colsample_bytree=0.9`, `random_state=0` |
| `lightgbm` | lightgbm | mean imputation | `n_estimators=300`, `learning_rate=0.05`, `random_state=0` |

Repeated validation uses stratified random splits over WQI5 categories with seeds 0, 1, 2, 3, 4.
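
To make the idea of repeated stratified splits concrete, the sketch below shuffles the indices within each category using a seeded random generator and holds out the same fraction from every category. It is a standard-library stand-in written for illustration: the function name, the 0.2 test fraction, and the category labels are assumptions, and the repository's scripts may rely on scikit-learn utilities instead.

```python
import random
from collections import defaultdict

# Seeds stated above for the repeated validation runs.
SEEDS = (0, 1, 2, 3, 4)


def stratified_split(categories, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) with per-category proportions preserved."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for index, category in enumerate(categories):
        by_category[category].append(index)

    train, test = [], []
    for category in sorted(by_category):
        indices = by_category[category]
        rng.shuffle(indices)
        # Hold out at least one sample from every category.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Calling stratified_split(labels, seed=s) for each s in SEEDS yields five different but reproducible splits over the same data.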

Project Structure

data/: processed datasets and subsets
models/: persisted surrogate model artifacts
src/: API and reusable backend logic
scripts/: reproducibility runners
configs/: experiment settings
tests/: pytest suite

License

Apache License 2.0. See LICENSE.

