This project is a small internal analytics tool for engineers to inspect historical backend metrics and forecast future throughput trends using regression models.
It is composed of:
- Node.js API (server.js) – exposes /metrics and /predict for the dashboard and schedules retraining.
- Python ML service (ml_service/main.py) – handles data ingestion, cleaning, model training and forecasting via FastAPI.
- React dashboard (dashboard/) – visualizes historical vs. predicted values using line charts.
The data/ folder contains several example time-series datasets. For throughput forecasting the project uses:
- Electric_Production.csv as a proxy for backend throughput (a continuous production / volume signal).
Additional datasets are exposed as alternative metrics:
- daily-minimum-temperatures-in-me.csv → temperature
- monthly-beer-production-in-austr.csv → beer
- sales-of-shampoo-over-a-three-ye.csv → shampoo
All metrics go through the same pipeline:
- Missing values: forward/backward fill after casting to float.
- Outliers: clipped using an extended IQR rule ([Q1 − 3×IQR, Q3 + 3×IQR]).
- Timestamp normalization: parsed to a pandas DatetimeIndex, sorted, with inferred frequency where possible.
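The first two cleaning steps can be sketched without any dependencies. This is only an illustration of the logic described above, not the service's code (which presumably uses pandas); the function names `fill_missing` and `clip_outliers`, the quartile method, and the sample data are assumptions.

```python
import math
from statistics import quantiles


def fill_missing(values):
    """Cast to float, then forward-fill and backward-fill None/NaN entries."""
    out = []
    last = None
    for v in values:
        x = float("nan") if v is None else float(v)
        if math.isnan(x):
            out.append(last)  # stays None if no earlier value exists yet
        else:
            out.append(x)
            last = x
    # Backward fill covers leading gaps that had no earlier value.
    nxt = None
    for i in range(len(out) - 1, -1, -1):
        if out[i] is None:
            out[i] = nxt
        else:
            nxt = out[i]
    return out


def clip_outliers(values, k=3.0):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (k=3 is the extended rule)."""
    # Assumption: linear-interpolation quartiles; the service may differ.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


# Hypothetical raw readings with gaps and one extreme spike.
raw = [None, "10", 11, float("nan"), 12, 500, 13, 12, 11]
cleaned = clip_outliers(fill_missing(raw))
print(cleaned)
```

With these sample values the quartiles are 11 and 12, so the spike of 500 is clipped to the upper bound of 15 while the ordinary readings pass through unchanged.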
Models are trained per-metric and stored under models/ using joblib:
- Linear Regression (model=linear) – baseline regression with time-aware features.
- Ridge Regression + Polynomial features (model=ridge) – captures non-linear trends with regularisation.
- ARIMA / SARIMAX (model=arima) – classical time-series model that explicitly models autocorrelation and (simple) seasonality.
For the regression models we do time-aware feature engineering:
- raw time index t
- lag features: y(t-1), y(t-5), y(t-15)
- rolling stats: 5-step rolling mean and rolling std
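As a rough sketch of how such a feature matrix could be assembled, the snippet below builds one row per time step from a plain list. The helper name `build_features` is hypothetical, and computing the rolling statistics over the five values before t (so the target never leaks into its own features) is an assumption about the service, not something stated above.

```python
from statistics import mean, stdev

LAGS = (1, 5, 15)  # lag offsets listed above
WINDOW = 5         # rolling window length


def build_features(y):
    """Return (X, targets) for every t that has enough history.

    Each row is [t, y(t-1), y(t-5), y(t-15), rolling_mean, rolling_std].
    """
    X, targets = [], []
    start = max(max(LAGS), WINDOW)  # first t with all lags available
    for t in range(start, len(y)):
        window = y[t - WINDOW:t]    # the 5 values strictly before t
        row = [t] + [y[t - lag] for lag in LAGS] + [mean(window), stdev(window)]
        X.append(row)
        targets.append(y[t])
    return X, targets


series = [float(v) for v in range(20)]  # toy series 0.0 .. 19.0
X, targets = build_features(series)
print(len(X), X[0], targets[0])
```

The first 15 observations yield no rows because y(t-15) is undefined for them, which is worth keeping in mind for the shorter example datasets.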
During training:
- Data is split chronologically into train/test (80/20).
- Metrics computed and returned:
  - MAE (Mean Absolute Error)
  - RMSE (Root Mean Squared Error)
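A minimal sketch of the chronological split and the two metrics follows. The naive "repeat the last training value" predictor is a stand-in chosen for illustration only; it is not one of the project's models, and the sample series is made up.

```python
import math


def chronological_split(values, train_fraction=0.8):
    """Split without shuffling: the earliest 80% train, the latest 20% test."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0, 14.0, 20.0]
train, test = chronological_split(series)

# Stand-in predictor: repeat the last training value for every test step.
predicted = [train[-1]] * len(test)

print(mae(test, predicted), rmse(test, predicted))
```

Here the test errors are 1 and 5, giving an MAE of 3.0 and an RMSE of about 3.61; the single large miss pulls RMSE above MAE.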
For forecasting:
- The trained regression model predicts the next N time steps (horizon), propagating lag/rolling features from the latest history.
- The ARIMA model uses SARIMAX with basic seasonal structure to account for autocorrelation and trend/seasonality.
- A confidence band is generated:
  - regression models: prediction ± 1.96 × RMSE.
  - ARIMA: uses the model's analytical prediction intervals.
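The recursive forecasting loop and the regression band can be sketched as below. `predict_one` is a hypothetical callable standing in for the trained regressor plus its feature computation, and `mean_of_last_three` is a toy model used only to make the example runnable.

```python
def recursive_forecast(history, predict_one, horizon, rmse, z=1.96):
    """Forecast `horizon` steps, feeding each prediction back as history.

    `predict_one` (hypothetical) maps the values seen so far to the next
    value; in the real service this role is played by the trained model
    applied to freshly computed lag/rolling features.
    """
    values = list(history)
    forecast = []
    for _ in range(horizon):
        y_hat = predict_one(values)
        forecast.append({
            "prediction": y_hat,
            "lower": y_hat - z * rmse,  # prediction - 1.96 * RMSE
            "upper": y_hat + z * rmse,  # prediction + 1.96 * RMSE
        })
        values.append(y_hat)  # later steps see this prediction as history
    return forecast


def mean_of_last_three(values):
    """Toy stand-in model: average of the three most recent values."""
    return sum(values[-3:]) / 3


band = recursive_forecast([10.0, 11.0, 12.0], mean_of_last_three, horizon=2, rmse=1.0)
print(band)
```

Because the band is a fixed multiple of the test RMSE, its width is the same at every step and does not widen with the horizon, unlike the ARIMA intervals.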
The Python ML service (ml_service/main.py) exposes:
- GET /health – simple health check.
- GET /metrics?metric=throughput&limit=500
  - Returns cleaned historical series for the given metric.
- GET /predict?metric=throughput&model=ridge&horizon=24
  - Trains (if needed) and returns:
    - historical timestamps/values
    - anomaly flags on history (z-score based)
    - future timestamps, predictions and confidence intervals
    - evaluation metrics (MAE, RMSE)
    - simple capacity-risk summary (see below).
- POST /train
  - JSON body: { "metric": "throughput", "model": "ridge" }
  - Forces retraining and persists the model and metrics.
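The z-score based anomaly flags returned by /predict could be computed along these lines. The threshold of 3.0 and the use of a global (rather than rolling) mean and standard deviation are assumptions; the text above only says the flags are z-score based.

```python
from statistics import mean, pstdev


def zscore_flags(values, threshold=3.0):
    """Flag points whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [False] * len(values)  # constant series: nothing stands out
    return [abs(v - mu) / sigma > threshold for v in values]


# Hypothetical history: a flat series with one spike at the end.
history = [10.0] * 19 + [100.0]
flags = zscore_flags(history)
print(flags.count(True), flags[-1])
```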
The Node API (server.js) exposes:
- GET /metrics
  - Proxies to the ML service /metrics.
- GET /predict
  - Proxies to the ML service /predict.
  - Adds:
    - logging of latency and model version
    - in-memory caching of predictions (short TTL)
    - graceful degradation (serve cached prediction if ML service is temporarily unavailable).
- Scheduled retraining:
  - Using node-cron, an hourly job invokes POST /train on the ML service for metric=throughput, model=ridge.
  - All runs are logged under logs/server.log.
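The caching and graceful-degradation behaviour can be illustrated with a small sketch. It is written in Python for consistency with the other examples, whereas the actual logic lives in the Node API; the class name `PredictionCache`, the 30-second TTL, and the "fresh"/"cache"/"stale" labels are all assumptions.

```python
import time


class PredictionCache:
    """In-memory TTL cache that falls back to stale entries on failure."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (stored_at, value)

    def get_prediction(self, key, fetch):
        entry = self._store.get(key)
        now = self.clock()
        if entry and now - entry[0] < self.ttl:
            return entry[1], "cache"      # still within the TTL
        try:
            value = fetch()               # e.g. call the ML service
        except Exception:
            if entry:
                return entry[1], "stale"  # degrade to the last known result
            raise                         # nothing cached: surface the error
        self._store[key] = (now, value)
        return value, "fresh"


# Simulated clock and upstream so the example runs without a server.
now = [0.0]
cache = PredictionCache(ttl_seconds=30.0, clock=lambda: now[0])
key = ("throughput", "ridge", 24)


def upstream_ok():
    return {"predictions": [1.0, 2.0]}


def upstream_down():
    raise ConnectionError("ML service unavailable")


first = cache.get_prediction(key, upstream_ok)
now[0] = 10.0
second = cache.get_prediction(key, upstream_down)  # served from cache
now[0] = 60.0
third = cache.get_prediction(key, upstream_down)   # TTL expired, stale fallback
print(first[1], second[1], third[1])
```

Keying the cache on metric, model and horizon keeps forecasts for different slider positions from overwriting each other.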
Located in dashboard/, the dashboard:
- Calls the Node API (http://localhost:4000) via axios.
- Renders a responsive line chart with Recharts:
  - Historical series in green, with anomaly markers.
  - Forecast series in blue dashed line with a shaded confidence band.
  - Optional alternative forecast overlaid for model comparison.
- Allows engineers to:
  - Select metric (throughput, temperature, beer, shampoo).
  - Select model (linear, ridge, arima) and optionally compare regression models side-by-side.
  - Adjust forecast horizon with a slider.
  - View latest MAE/RMSE for primary and alternative models.
  - See a capacity saturation risk summary (fraction of forecast above a dynamic threshold and estimated time to breach).
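One way the capacity saturation summary might be derived is sketched below. How the dynamic threshold is actually defined is not stated here, so the choice of historical mean plus two standard deviations is purely an assumption, as are the function and field names.

```python
from statistics import mean, pstdev


def capacity_risk(history, forecast, k=2.0):
    """Summarise how much of the forecast exceeds a history-based threshold."""
    # Assumed threshold: mean + k * std of the cleaned history.
    threshold = mean(history) + k * pstdev(history)
    above = [v > threshold for v in forecast]
    # 1-based index of the first forecast step above the threshold, if any.
    first_breach = above.index(True) + 1 if any(above) else None
    return {
        "threshold": threshold,
        "fraction_above": sum(above) / len(forecast),
        "steps_to_breach": first_breach,
    }


history = [10.0, 12.0, 10.0, 12.0]
forecast = [12.0, 13.0, 14.0, 15.0]
risk = capacity_risk(history, forecast)
print(risk)
```

In this toy case the threshold is 13.0, half of the forecast lies above it, and the first breach occurs at step 3.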
You can override the Node API base URL used by the dashboard with an environment variable. In cmd.exe:
    set REACT_APP_API_BASE=http://localhost:4000
PowerShell uses the $env: syntax shown in the dashboard step below. Do not append a # comment to the set command: cmd.exe would treat it as part of the value.
From the project root (Analytics Dashboard Data Forcasting):
- Python virtual environment & dependencies
    py -m venv venv
    .\venv\Scripts\activate
    pip install -r requirements.txt
- Start the ML service
    .\venv\Scripts\uvicorn ml_service.main:app --host 127.0.0.1 --port 8001
- Start the Node backend (in a second terminal)
    cd "C:\Users\Dell\Desktop\Analytics Dashboard Data Forcasting"
    npm install
    npm start
- Start the React dashboard (in a third terminal)
    cd "C:\Users\Dell\Desktop\Analytics Dashboard Data Forcasting\dashboard"
    npm install
    $env:REACT_APP_API_BASE="http://localhost:4000"
    npm start
The dashboard will be available at http://localhost:3000 and will talk to the Node API and, through it, to the Python ML service.
- Automatic: The Node API uses node-cron to retrain the throughput model every hour.
- Manual: You can call the ML /train endpoint directly (e.g. with curl or Postman) to force retraining with a different metric/model.
- Logs:
  - Node: logs/server.log
  - ML service: logs/ml_service.log
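For manual retraining from a script rather than curl or Postman, a request could be built as sketched here. The base URL assumes the host and port from the uvicorn command in the setup steps; `build_train_request` is a hypothetical helper, and the shape of the response is not documented here, so the commented-out call simply prints whatever JSON comes back.

```python
import json
import urllib.request

ML_BASE = "http://127.0.0.1:8001"  # assumed from the uvicorn start command


def build_train_request(metric, model, base=ML_BASE):
    """Build a POST /train request with the documented JSON body."""
    body = json.dumps({"metric": metric, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_train_request("beer", "arima")
print(request.get_method(), request.full_url)

# With the ML service running, uncomment to trigger retraining:
# with urllib.request.urlopen(request, timeout=60) as response:
#     print(json.load(response))
```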
- Why regression first?
  - Regression with explicit lag/rolling features gives a transparent baseline and is easy to explain to non-ML teammates.
  - It's fast to retrain and deploy, which matters for internal tools.
- Why add ARIMA?
  - Classical ARIMA/SARIMAX directly models autocorrelation and seasonality, complementing the feature-engineered regressors.
  - It provides well-understood confidence intervals useful for capacity planning.
- Why these horizons?
  - The horizon slider (12–96 steps) lets engineers look at short-term "this shift / this day" and medium-term "this week" forecasts without over-promising far-future accuracy.
- Why these metrics (MAE, RMSE)?
  - MAE is easy to reason about in the native units of the metric.
  - RMSE penalises large errors more strongly, which aligns with capacity risk (under-predicting spikes is worse than small noise).
At a high level the architecture is:
- React dashboard → Node API (/metrics, /predict) → Python FastAPI ML service → time-series models over CSV-backed metrics.
- Node adds caching, logging and failure-handling; Python focuses on data prep, modelling and uncertainty; React focuses on clear, engineer-friendly visualisation.