ptdr1516/Analytics-Dashboard-Data-Forcasting

Predictive Analytics Dashboard – Backend Throughput Forecasting

This project is a small internal analytics tool for engineers to inspect historical backend metrics and forecast future throughput trends using regression models.

It is composed of:

  • Node.js API (server.js) – exposes /metrics and /predict for the dashboard and schedules retraining.
  • Python ML service (ml_service/main.py) – handles data ingestion, cleaning, model training and forecasting via FastAPI.
  • React dashboard (dashboard/) – visualizes historical vs predicted values using line charts.

Data & Assumptions

The data/ folder contains several example time-series datasets. For throughput forecasting the project uses:

  • Electric_Production.csv as a proxy for backend throughput (continuous production / volume signal).

Additional datasets are exposed as alternative metrics:

  • daily-minimum-temperatures-in-me.csv → temperature
  • monthly-beer-production-in-austr.csv → beer
  • sales-of-shampoo-over-a-three-ye.csv → shampoo

All metrics go through the same pipeline:

  • Missing values: forward/backward fill after casting to float.
  • Outliers: clipped using an extended IQR rule ([Q1 − 3×IQR, Q3 + 3×IQR]).
  • Timestamp normalization: parsed to a pandas DatetimeIndex, sorted, with inferred frequency where possible.
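
The cleaning steps above can be sketched roughly as follows. This is a standard-library illustration, not the code in ml_service/main.py: the function names (fill_missing, clip_outliers, clean_series) and the ISO-formatted timestamp input are assumptions, frequency inference is omitted, and the quartile interpolation of statistics.quantiles may differ from what the service computes.

```python
import statistics
from datetime import datetime


def fill_missing(values):
    """Forward-fill, then backward-fill, None entries in a list of floats."""
    filled = list(values)
    last = None
    for i, v in enumerate(filled):  # forward pass: carry the last seen value
        if v is None:
            filled[i] = last
        else:
            last = v
    nxt = None
    for i in range(len(filled) - 1, -1, -1):  # backward pass: fill leading gaps
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def clip_outliers(values, k=3.0):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] (k=3 is the extended rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lo), hi) for v in values]


def clean_series(rows):
    """rows: (iso_timestamp, value_or_None) pairs -> (sorted stamps, cleaned values)."""
    parsed = sorted(
        ((datetime.fromisoformat(ts), v) for ts, v in rows),
        key=lambda row: row[0],
    )
    stamps = [ts for ts, _ in parsed]
    values = [None if v is None else float(v) for _, v in parsed]
    return stamps, clip_outliers(fill_missing(values))
```

With k=3 the clip only touches extreme points, so ordinary peaks in the signal survive cleaning.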

ML Modelling

Models are trained per-metric and stored under models/ using joblib:

  • Linear Regression (model=linear) – baseline regression with time-aware features.
  • Ridge Regression + Polynomial features (model=ridge) – captures non-linear trends with regularisation.
  • ARIMA / SARIMAX (model=arima) – classical time-series model that explicitly models autocorrelation and (simple) seasonality.

For the regression models we do time-aware feature engineering:

  • raw time index t
  • lag features: y(t-1), y(t-5), y(t-15)
  • rolling stats: 5-step rolling mean and rolling std
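
A minimal sketch of how these features might be assembled, using only the standard library. The name build_features is hypothetical, and computing the rolling statistics over the five values before t (so the target never leaks into its own features) is an assumption; the source does not say how the window is aligned.

```python
import statistics

LAGS = (1, 5, 15)   # y(t-1), y(t-5), y(t-15)
WINDOW = 5          # rolling mean / std window


def build_features(y):
    """Return (rows, targets); each row is
    [t, y(t-1), y(t-5), y(t-15), rolling_mean, rolling_std]."""
    rows, targets = [], []
    start = max(max(LAGS), WINDOW)  # first t where every feature is defined
    for t in range(start, len(y)):
        window = y[t - WINDOW:t]    # past values only
        rows.append([
            float(t),
            *(y[t - lag] for lag in LAGS),
            statistics.fmean(window),
            statistics.stdev(window),
        ])
        targets.append(y[t])
    return rows, targets
```

The first 15 observations produce no training rows because the longest lag is not yet available for them.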

During training:

  • Data is split chronologically into train/test (80/20).
  • Metrics computed and returned:
    • MAE (Mean Absolute Error)
    • RMSE (Root Mean Squared Error)
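
For reference, the split and the two metrics can be written in a few lines. This is an illustrative sketch with hypothetical function names, not the service's own code.

```python
import math


def chronological_split(series, train_fraction=0.8):
    """Split without shuffling: the earliest 80% trains, the latest 20% tests."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mae(actual, predicted):
    """Mean absolute error, in the metric's native units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; large misses weigh more than small ones."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```

Keeping the split chronological matters for time series: a random split would let the model train on points that come after the ones it is tested on.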

For forecasting:

  • The trained regression model predicts the next N time steps (horizon), propagating lag/rolling features from the latest history.
  • The ARIMA model uses SARIMAX with basic seasonal structure to account for autocorrelation and trend/seasonality.
  • A confidence band is generated:
    • regression models: prediction ± 1.96 × RMSE (an approximate 95% band, assuming roughly normal residuals).
    • ARIMA: uses the model’s analytical prediction intervals.
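
The recursive step for the regression models can be sketched as below. The helper name recursive_forecast and the predict_one callable (which stands in for "rebuild lag/rolling features from the series so far, then call the trained model") are assumptions made for illustration.

```python
def recursive_forecast(history, predict_one, horizon, rmse, z=1.96):
    """Forecast `horizon` steps ahead, feeding each prediction back in.

    predict_one(values) -> float is a hypothetical callable that returns the
    next value given every value seen so far.
    """
    values = list(history)
    forecast = []
    for _ in range(horizon):
        y_hat = predict_one(values)
        forecast.append({
            "prediction": y_hat,
            "lower": y_hat - z * rmse,   # constant-width band from test RMSE
            "upper": y_hat + z * rmse,
        })
        values.append(y_hat)  # later lags and rolling stats see this prediction
    return forecast
```

Because predictions are fed back as inputs, errors can compound over long horizons, while a fixed ± 1.96 × RMSE band does not widen to reflect that.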

APIs

Python ML Service (FastAPI, default http://127.0.0.1:8001)

  • GET /health – simple health check.
  • GET /metrics?metric=throughput&limit=500
    • Returns cleaned historical series for the given metric.
  • GET /predict?metric=throughput&model=ridge&horizon=24
    • Trains (if needed) and returns:
      • historical timestamps/values
      • anomaly flags on history (z-score based)
      • future timestamps, predictions and confidence intervals
      • evaluation metrics (MAE, RMSE)
      • simple capacity-risk summary (see below).
  • POST /train
    • JSON body: { "metric": "throughput", "model": "ridge" }
    • Forces retraining and persists the model and metrics.
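
The z-score anomaly flags returned by /predict could be computed along these lines. The threshold of 3 standard deviations and the function name zscore_flags are assumptions; the source only states that the flags are z-score based.

```python
import statistics


def zscore_flags(values, threshold=3.0):
    """Flag points whose distance from the mean exceeds `threshold` std devs."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [False] * len(values)  # a flat series has no anomalies
    return [abs(v - mean) / std > threshold for v in values]
```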

Node.js API (Express, default http://localhost:4000)

  • GET /metrics
    • Proxies to the ML service /metrics.
  • GET /predict
    • Proxies to the ML service /predict.
    • Adds:
      • logging of latency and model version
      • in-memory caching of predictions (short TTL)
      • graceful degradation (serve cached prediction if ML service is temporarily unavailable).
  • Scheduled retraining:
    • Using node-cron, an hourly job invokes POST /train on the ML service for metric=throughput, model=ridge.
    • All runs are logged under logs/server.log.
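
The caching and graceful-degradation behaviour of /predict follows a common pattern, sketched here in Python for consistency with the other examples. The real logic lives in the Node server and may differ; the class name, the 60-second TTL and the injectable clock are all assumptions.

```python
import time


class PredictionCache:
    """Short-TTL cache that falls back to a stale entry if the upstream fails."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return (value, source) where source is 'cache', 'fresh' or 'stale'."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], "cache"
        try:
            value = fetch()  # e.g. a call to the ML service
        except Exception:
            if entry is not None:
                return entry[1], "stale"  # upstream down: serve last known value
            raise  # nothing cached, so the failure has to surface
        self._entries[key] = (now, value)
        return value, "fresh"
```

A key would typically combine metric, model and horizon so that different queries do not overwrite each other.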

React Dashboard

Located in dashboard/, the dashboard:

  • Calls the Node API (http://localhost:4000) via axios.
  • Renders a responsive line chart with Recharts:
    • Historical series in green, with anomaly markers.
    • Forecast series in blue dashed line with a shaded confidence band.
    • Optional alternative forecast overlaid for model comparison.
  • Allows engineers to:
    • Select metric (throughput, temperature, beer, shampoo).
    • Select model (linear, ridge, arima) and optionally compare regression models side-by-side.
    • Adjust forecast horizon with a slider.
    • View latest MAE/RMSE for primary and alternative models.
    • See a capacity saturation risk summary (fraction of forecast above a dynamic threshold and estimated time to breach).
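
One way to compute such a risk summary is shown below. The threshold definition (historical mean plus k standard deviations) is purely an assumption for illustration, as are the function and key names; the source does not specify how the dynamic threshold is derived.

```python
import statistics


def capacity_risk(history, forecast, k=2.0):
    """Summarise how much of the forecast sits above a history-derived threshold."""
    # Assumed threshold: mean + k * std of the historical series.
    threshold = statistics.fmean(history) + k * statistics.pstdev(history)
    above = [y > threshold for y in forecast]
    # 1-based index of the first forecast step over the threshold, if any.
    steps_to_breach = next((i + 1 for i, hit in enumerate(above) if hit), None)
    return {
        "threshold": threshold,
        "fraction_above": sum(above) / len(forecast),
        "steps_to_breach": steps_to_breach,
    }
```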

You can override the Node API base URL used by the dashboard by setting REACT_APP_API_BASE before running npm start. In cmd.exe:

set REACT_APP_API_BASE=http://localhost:4000

PowerShell uses $env: syntax instead; see the run steps below.

Running the System Locally (Windows / PowerShell)

From the project root (Analytics Dashboard Data Forcasting):

  1. Python virtual environment & dependencies
py -m venv venv
.\venv\Scripts\activate
pip install -r requirements.txt
  2. Start the ML service
.\venv\Scripts\uvicorn ml_service.main:app --host 127.0.0.1 --port 8001
  3. Start the Node backend (in a second terminal; replace the example path with your own project root)
cd "C:\Users\Dell\Desktop\Analytics Dashboard Data Forcasting"
npm install
npm start
  4. Start the React dashboard (in a third terminal)
cd "C:\Users\Dell\Desktop\Analytics Dashboard Data Forcasting\dashboard"
npm install
$env:REACT_APP_API_BASE="http://localhost:4000"
npm start

The dashboard will be available at http://localhost:3000 and will talk to the Node API and, through it, to the Python ML service.

Retraining & Logging

  • Automatic: The Node API uses node-cron to retrain the throughput model every hour.
  • Manual: You can call the ML /train endpoint directly (e.g. with curl or Postman) to force retraining with a different metric/model.
  • Logs:
    • Node: logs/server.log
    • ML service: logs/ml_service.log
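
As an alternative to curl or Postman, a manual retraining call can be made from Python's standard library. The helper names below are hypothetical; the URL, path and JSON body follow the /train description above, and the response is assumed to be JSON.

```python
import json
import urllib.request

ML_BASE_URL = "http://127.0.0.1:8001"  # default ML service address


def build_train_request(metric, model, base_url=ML_BASE_URL):
    """Build (but do not send) a POST /train request."""
    body = json.dumps({"metric": metric, "model": model}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/train",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_training(metric, model, base_url=ML_BASE_URL, timeout=60):
    """Send the request; requires the ML service to be running."""
    request = build_train_request(metric, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, trigger_training("temperature", "arima") would force a retrain of the ARIMA model on the temperature metric.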

Design Rationale (the “why”)

  • Why regression first?
    • Regression with explicit lag/rolling features gives a transparent baseline and is easy to explain to non-ML teammates.
    • It’s fast to retrain and deploy, which matters for internal tools.
  • Why add ARIMA?
    • Classical ARIMA/SARIMAX directly models autocorrelation and seasonality, complementing the feature-engineered regressors.
    • It provides well-understood confidence intervals useful for capacity planning.
  • Why these horizons?
    • The horizon slider (12–96 steps) lets engineers look at short-term “this shift / this day” and medium-term “this week” forecasts without over-promising far-future accuracy.
  • Why these metrics (MAE, RMSE)?
    • MAE is easy to reason about in the native units of the metric.
    • RMSE penalises large errors more strongly, which aligns with capacity risk (badly missing a spike is worse than small noise).

At a high level the architecture is:

  • React dashboard → Node API (/metrics, /predict) → Python FastAPI ML service → time-series models over CSV-backed metrics. Node adds caching, logging and failure-handling; Python focuses on data prep, modelling and uncertainty; React focuses on clear, engineer-friendly visualisation.

About

Predictive Analytics Dashboard for backend throughput forecasting using FastAPI (Python ML service) + Node.js/Express API + React dashboard, supporting Linear/Ridge regression + ARIMA/SARIMAX, confidence bands, anomaly flags, caching, logging, and scheduled retraining.
