GitHub - ptdr1516/Analytics-Dashboard-Data-Forcasting: Predictive Analytics Dashboard for backend throughput forecasting using FastAPI (Python ML service) + Node.js/Express API + React dashboard, supporting Linear/Ridge regression + ARIMA/SARIMAX, confidence bands, anomaly flags, caching, logging, and scheduled retraining.

Predictive Analytics Dashboard – Backend Throughput Forecasting

This project is a small internal analytics tool for engineers to inspect historical backend metrics and forecast future throughput trends using regression and ARIMA/SARIMAX models.

It is composed of:

Node.js API (server.js) – exposes /metrics and /predict for the dashboard and schedules retraining.
Python ML service (ml_service/main.py) – handles data ingestion, cleaning, model training and forecasting via FastAPI.
React dashboard (dashboard/) – visualizes historical vs predicted values using line charts.

Data & Assumptions

The data/ folder contains several example time-series datasets. For throughput forecasting the project uses:

Electric_Production.csv as a proxy for backend throughput (continuous production / volume signal).

Additional datasets are exposed as alternative metrics:

daily-minimum-temperatures-in-me.csv → temperature
monthly-beer-production-in-austr.csv → beer
sales-of-shampoo-over-a-three-ye.csv → shampoo

All metrics go through the same pipeline:

Missing values: forward/backward fill after casting to float.
Outliers: clipped using an extended IQR rule ([Q1 − 3×IQR, Q3 + 3×IQR]).
Timestamp normalization: parsed to a pandas DatetimeIndex, sorted, with inferred frequency where possible.
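
The three cleaning steps above can be sketched in a few lines. This is a minimal, standard-library-only illustration rather than the code in ml_service/main.py, which presumably relies on pandas. The function names (fill_missing, clip_outliers, clean_series) and the input shape (pairs of ISO timestamp string and raw value) are assumptions made for the example.

```python
import math
import statistics
from datetime import datetime


def fill_missing(values):
    """Forward-fill, then backward-fill, NaN gaps in a list of floats."""
    filled = list(values)
    last = math.nan
    for i, v in enumerate(filled):            # forward pass
        if math.isnan(v):
            filled[i] = last
        else:
            last = v
    nxt = math.nan
    for i in range(len(filled) - 1, -1, -1):  # backward pass covers leading gaps
        if math.isnan(filled[i]):
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def clip_outliers(values, k=3.0):
    """Clip to [Q1 - k*IQR, Q3 + k*IQR]; k=3 matches the rule in the README."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs at least 2 points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def clean_series(rows):
    """rows: iterable of (iso_timestamp, raw_value); hypothetical input shape."""
    parsed = sorted(
        ((datetime.fromisoformat(ts), float(raw) if raw not in ("", None) else math.nan)
         for ts, raw in rows),
        key=lambda pair: pair[0],  # sort chronologically
    )
    timestamps = [ts for ts, _ in parsed]
    values = clip_outliers(fill_missing([v for _, v in parsed]))
    return timestamps, values
```

Filling before clipping keeps NaN values out of the quartile computation. The exact quartile method may differ from the one used in the project, so the clip bounds here are indicative only.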

ML Modelling

Models are trained per-metric and stored under models/ using joblib:

Linear Regression (model=linear) – baseline regression with time-aware features.
Ridge Regression + Polynomial features (model=ridge) – captures non-linear trends with regularisation.
ARIMA / SARIMAX (model=arima) – classical time-series model that explicitly models autocorrelation and (simple) seasonality.

For the regression models we do time-aware feature engineering:

raw time index t
lag features: y(t-1), y(t-5), y(t-15)
rolling stats: 5-step rolling mean and rolling std
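
As a rough sketch of how such a feature matrix could be built, the function below turns a cleaned series into rows of [t, y(t-1), y(t-5), y(t-15), rolling mean, rolling std]. The name build_features is hypothetical, and two details are assumptions not stated above: the rolling window covers only the five values before t (so the target never leaks into its own features), and the standard deviation is the sample standard deviation.

```python
import statistics

LAGS = (1, 5, 15)   # lag offsets listed in the README
WINDOW = 5          # rolling window length listed in the README


def build_features(series):
    """Return (X, y) where each row of X describes one time step t."""
    X, y = [], []
    start = max(max(LAGS), WINDOW)  # first t with every lag available
    for t in range(start, len(series)):
        window = series[t - WINDOW:t]  # past values only (assumption)
        row = [float(t)]
        row += [series[t - lag] for lag in LAGS]
        row.append(statistics.fmean(window))
        row.append(statistics.stdev(window))
        X.append(row)
        y.append(series[t])
    return X, y
```

The first 15 observations produce no rows because y(t-15) is undefined for them, which is worth remembering for the short datasets such as shampoo.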

During training:

Data is split chronologically into train/test (80/20).
Metrics computed and returned:
- MAE (Mean Absolute Error)
- RMSE (Root Mean Squared Error)
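
For reference, the chronological split and the two error metrics can be written out directly. This is a plain-Python sketch with hypothetical function names; the project itself may use scikit-learn helpers for the same calculations.

```python
import math


def chronological_split(values, train_fraction=0.8):
    """Split without shuffling: the earliest 80% trains, the latest 20% tests."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def mae(actual, predicted):
    """Mean Absolute Error, in the native units of the metric."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error; squaring weights large misses more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))
```

Keeping the split chronological matters for time series: a random split would let the model train on observations that come after the ones it is tested on.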

For forecasting:

The trained regression model predicts the next N time steps (horizon), propagating lag/rolling features from the latest history.
The ARIMA model uses SARIMAX with basic seasonal structure to account for autocorrelation and trend/seasonality.
A confidence band is generated:
- regression models: prediction ± 1.96 × RMSE (an approximate 95% band, assuming roughly normal test errors).
- ARIMA: uses the model’s analytical prediction intervals.
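
The regression forecast loop and its band can be sketched as follows. Each predicted value is appended to the history so that later steps can use it as a lag, which is one common reading of "propagating lag/rolling features". Here predict_one is a hypothetical callable standing in for the trained model's single-row prediction, and the feature layout is assumed to be [t, y(t-1), y(t-5), y(t-15), rolling mean, rolling std].

```python
import statistics

LAGS = (1, 5, 15)
WINDOW = 5


def latest_features(history):
    """Feature row for the next, not yet observed, time step."""
    t = len(history)
    window = history[-WINDOW:]
    return ([float(t)]
            + [history[-lag] for lag in LAGS]
            + [statistics.fmean(window), statistics.stdev(window)])


def recursive_forecast(history, predict_one, horizon):
    """Predict `horizon` steps, feeding each prediction back in as history."""
    extended = list(history)  # needs at least max(LAGS) points
    forecasts = []
    for _ in range(horizon):
        y_hat = predict_one(latest_features(extended))
        forecasts.append(y_hat)
        extended.append(y_hat)
    return forecasts


def confidence_band(forecasts, rmse, z=1.96):
    """Symmetric band: prediction +/- z * RMSE, constant width at every step."""
    margin = z * rmse
    return [(y - margin, y + margin) for y in forecasts]
```

A band built this way has the same width at every step, so it does not widen with the horizon the way the ARIMA intervals typically do. That is worth keeping in mind when comparing the two on the dashboard.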

APIs

Python ML Service (FastAPI, default `http://127.0.0.1:8001`)

GET /health – simple health check.
GET /metrics?metric=throughput&limit=500
- Returns cleaned historical series for the given metric.
GET /predict?metric=throughput&model=ridge&horizon=24
- Trains (if needed) and returns:
  - historical timestamps/values
  - anomaly flags on history (z-score based)
  - future timestamps, predictions and confidence intervals
  - evaluation metrics (MAE, RMSE)
  - simple capacity-risk summary (see below).
POST /train
- JSON body: { "metric": "throughput", "model": "ridge" }
- Forces retraining and persists the model and metrics.
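
The z-score based anomaly flags returned by /predict could be computed along these lines. This is an illustrative sketch only: the function name zscore_flags and the threshold of 3 standard deviations are assumptions, since the README does not state the cutoff the service uses.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)  # constant series: nothing to flag
    return [abs(v - mean) / std > threshold for v in values]
```

For example, `zscore_flags([10.0] * 20 + [100.0])` flags only the final point.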

Node.js API (Express, default `http://localhost:4000`)

GET /metrics
- Proxies to the ML service /metrics.
GET /predict
- Proxies to the ML service /predict.
- Adds:
  - logging of latency and model version
  - in-memory caching of predictions (short TTL)
  - graceful degradation (serve cached prediction if ML service is temporarily unavailable).
Scheduled retraining:
- Using node-cron, an hourly job invokes POST /train on the ML service for metric=throughput, model=ridge.
- All runs are logged under logs/server.log.
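
The caching and graceful-degradation behaviour lives in server.js (JavaScript); the sketch below only illustrates the logic in Python, to stay in one language with the other examples. The class name, the 60-second default TTL and the "fresh" / "cache" / "stale" labels are all assumptions.

```python
import time


class PredictionCache:
    """Short-TTL cache that falls back to stale entries when a fetch fails."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return (value, source) where source says how the value was obtained."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1], "cache"       # still within the TTL
        try:
            value = fetch()                # e.g. a call to the ML service
        except Exception:
            if entry is not None:
                return entry[1], "stale"   # degrade gracefully
            raise                          # nothing cached: surface the error
        self._entries[key] = (now, value)
        return value, "fresh"
```

A natural cache key is the tuple (metric, model, horizon), so that changing any dashboard control triggers a new request while repeated refreshes of the same view are served from memory.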

React Dashboard

Located in dashboard/, the dashboard:

Calls the Node API (http://localhost:4000) via axios.
Renders a responsive line chart with Recharts:
- Historical series in green, with anomaly markers.
- Forecast series in blue dashed line with a shaded confidence band.
- Optional alternative forecast overlaid for model comparison.
Allows engineers to:
- Select metric (throughput, temperature, beer, shampoo).
- Select model (linear, ridge, arima) and optionally compare regression models side-by-side.
- Adjust forecast horizon with a slider.
- View latest MAE/RMSE for primary and alternative models.
- See a capacity saturation risk summary (fraction of forecast above a dynamic threshold and estimated time to breach).
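
One way the capacity-risk summary could be derived is sketched below. The README does not define the "dynamic threshold", so the rule used here (historical mean plus two standard deviations) is purely an assumption, as are the function names and the keys of the returned dictionary.

```python
import statistics


def dynamic_threshold(history, k=2.0):
    """Assumed rule: mean + k * std of the historical series."""
    return statistics.fmean(history) + k * statistics.pstdev(history)


def capacity_risk(forecast, threshold):
    """Fraction of forecast steps above the threshold and steps until the first breach."""
    above = [y > threshold for y in forecast]
    steps_to_breach = above.index(True) + 1 if any(above) else None
    return {
        "fraction_above": sum(above) / len(forecast),
        "steps_to_breach": steps_to_breach,  # None means no breach in the horizon
    }
```

Multiplying steps_to_breach by the sampling interval of the series turns it into an estimated time to breach.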

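You can override the Node API base URL used by the dashboard by setting REACT_APP_API_BASE before starting it. In Command Prompt (cmd.exe):

set REACT_APP_API_BASE=http://localhost:4000

In PowerShell, use $env:REACT_APP_API_BASE="http://localhost:4000" instead, as in the steps below.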

Running the System Locally (Windows / PowerShell)

From the project root (Analytics Dashboard Data Forcasting):

Python virtual environment & dependencies

py -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt

Start the ML service

.\venv\Scripts\uvicorn ml_service.main:app --host 127.0.0.1 --port 8001

Start the Node backend (in a second terminal; replace the path with the location of your own checkout)

cd "C:\Users\Dell\Desktop\Analytics Dashboard Data Forcasting"
npm install
npm start

Start the React dashboard (in a third terminal; again, adjust the path to your own checkout)

cd "C:\Users\Dell\Desktop\Analytics Dashboard Data Forcasting\dashboard"
npm install
$env:REACT_APP_API_BASE="http://localhost:4000"
npm start

The dashboard will be available at http://localhost:3000 and will talk to the Node API and, through it, to the Python ML service.

Retraining & Logging

Automatic: The Node API uses node-cron to retrain the throughput model every hour.
Manual: You can call the ML /train endpoint directly (e.g. with curl or Postman) to force retraining with a different metric/model.
Logs:
- Node: logs/server.log
- ML service: logs/ml_service.log
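
As an alternative to curl or Postman, a manual retrain can be triggered from Python with the standard library alone. The endpoint, default address and JSON body come from the API section above; the helper names are hypothetical, and the response is assumed to be JSON (FastAPI's default), with its exact fields left unspecified here.

```python
import json
import urllib.request

ML_BASE = "http://127.0.0.1:8001"  # default ML service address from the README


def build_train_request(metric, model, base_url=ML_BASE):
    """Build the POST /train request without sending it."""
    body = json.dumps({"metric": metric, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def force_retrain(metric, model, base_url=ML_BASE, timeout=30):
    """Send the request and return the parsed JSON response."""
    request = build_train_request(metric, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the ML service running, `force_retrain("temperature", "arima")` would retrain the ARIMA model for the temperature metric. Retraining can take a while for larger series, so the timeout may need raising.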

Design Rationale (the “why”)

Why regression first?
- Regression with explicit lag/rolling features gives a transparent baseline and is easy to explain to non-ML teammates.
- It’s fast to retrain and deploy, which matters for internal tools.
Why add ARIMA?
- Classical ARIMA/SARIMAX directly models autocorrelation and seasonality, complementing the feature-engineered regressors.
- It provides well-understood confidence intervals useful for capacity planning.
Why these horizons?
- The horizon slider (12–96 steps) lets engineers look at short-term “this shift / this day” and medium-term “this week” forecasts without over-promising far-future accuracy.
Why these metrics (MAE, RMSE)?
- MAE is easy to reason about in the native units of the metric.
- RMSE penalises large errors more strongly, which aligns with capacity risk (under-predicting spikes is worse than small noise).

At a high level the architecture is:

React dashboard → Node API (/metrics, /predict) → Python FastAPI ML service → time-series models over CSV-backed metrics. Node adds caching, logging and failure-handling; Python focuses on data prep, modelling and uncertainty; React focuses on clear, engineer-friendly visualisation.
