ML-Based Load Prediction System - Setup Guide

Overview

This guide walks you through setting up and running the ML-based load prediction system for CloudFileSystem.

What You'll Get:

Proactive container scaling (predicts load 30 minutes ahead)
89% prediction accuracy (target RMSE < 100 req/hr)
Real-time prediction dashboard
Fallback to reactive scaling if ML service unavailable

Prerequisites

Python Environment

Python 3.9 or higher
pip (Python package manager)
Virtual environment (recommended)

Java Environment

Java 17 (already configured)
Maven (already configured)
Gson dependency (already added to pom.xml)

Step-by-Step Setup

Step 1: Set Up Python ML Service

1.1 Navigate to ML Service Directory

cd ml_predictor_service

1.2 Create Virtual Environment

Windows:

python -m venv venv

# CMD / PowerShell
venv\Scripts\activate

# Git Bash
source venv/Scripts/activate

Linux/Mac:

python3 -m venv venv
source venv/bin/activate

1.3 Install Dependencies

pip install -r requirements.txt

Expected output:

Successfully installed flask-3.0.0 scikit-learn-1.3.0 pandas-2.1.0 ...

Step 2: Generate Training Data

python generate_data.py

Expected output:

[Data Generator] Generating 6 months of synthetic data...
[Data Generator]  Generated 4320 data points
[Data Generator]  Saved to training_data.csv

=== Data Statistics ===
Date range: 2025-07-07 to 2026-01-07
Load range: 15 - 6543 req/hr
Average load: 487 req/hr

Business hours avg: 756 req/hr
Non-business hours avg: 218 req/hr
Peak ratio: 3.47x

Weekday avg: 623 req/hr
Weekend avg: 187 req/hr
Weekend reduction: 70.0%

What this does:

Generates 6 months of hourly traffic data (~4,320 data points)
Simulates realistic patterns: business hours peaks, lunch spikes, weekend lulls
Creates training_data.csv and data_visualization.png

Step 3: Train the ML Model

python train_model.py

Expected output:

[Model Training] Loading data from training_data.csv...
[Model Training]  Loaded 4320 data points
[Model Training] Training Random Forest...
[Model Training]  Model training complete

MODEL PERFORMANCE METRICS
RMSE: 45.23 req/hr
MAE: 38.15 req/hr
R² Score: 0.8912
 RMSE target met (< 100 req/hr)
 R² target met (> 0.85)

FEATURE IMPORTANCE
avg_load_last_7d         0.3245 
hour_of_day              0.2156 
load_1h_ago              0.1987 
is_business_hours        0.1234 
day_of_week              0.0987 
month                    0.0543 
is_weekend               0.0432 
load_24h_ago             0.0416

What this does:

Trains Random Forest model on generated data
Evaluates performance (RMSE, MAE, R²)
Saves model.pkl, scaler.pkl, model_metadata.json
Creates versioned backup model_YYYYMMDD_HHMMSS.pkl

Target Metrics:

RMSE < 100 req/hr
R² > 0.85

Step 4: Start the Flask API

python app.py

Expected output:

[ML API] Loading model artifacts...
[ML API]  Model loaded successfully
[ML API]  Scaler loaded successfully
[ML API]  Model metadata loaded (RMSE: 45.23)

[ML API] Starting Flask server on http://localhost:5000
[ML API] Endpoints:
  - POST /predict       : Get load prediction
  - GET  /health        : Check API health
  - GET  /metrics       : Get prediction metrics
  - POST /record_actual : Record actual load for tracking

 * Running on http://0.0.0.0:5000

Keep this terminal open! The Flask API must be running for ML predictions to work.

Step 5: Test the API (Optional but Recommended)

Open a new terminal and test the API:

# Test health check
curl http://localhost:5000/health

# Expected response:
# {"status":"healthy","model_loaded":true,"scaler_loaded":true}

# Test prediction
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d "{\"current_time\":\"2024-01-07T14:30:00\",\"current_load\":750,\"historical_loads\":[650,700,720,745]}"

# Expected response:
# {"predicted_load":820.45,"confidence_lower":780.23,"confidence_upper":860.67,...}

Step 6: Run the Java Application

Open a new terminal in the project root:

# Compile and run
mvn clean compile
mvn javafx:run

# OR use your IDE's run configuration

Expected console output:

[ScalingService]  ML prediction service connected
[ScalingService] Mode: PROACTIVE (ML-based)
[ScalingService] Initialized with 1 container(s)
[ScalingService] Auto-scaling started

[ML-Predictor] Current: 450 req/hr | Predicted (30min): 820 req/hr | Confidence: [780, 860]
[ScalingService] Load: 45.00% (current), Predicted: 820 req/hr, Containers: 1
[ML-Scaling]  PROACTIVE SCALE UP: Predicted load (820) > 75% capacity (270)
[ScalingService]  SCALING UP (ML-PROACTIVE): Adding container soft40051-file-server-2
[ScalingService]  Scale up completed (ML-PROACTIVE). Total containers: 2

If ML service is unavailable:

[ScalingService]  ML service unavailable - using REACTIVE mode
[ScalingService] REACTIVE Mode - Load: 45.00%, Containers: 1

Verification Checklist

Python ML Service

Virtual environment activated
Dependencies installed (pip list shows flask, scikit-learn, etc.)
Training data generated (training_data.csv exists)
Model trained (model.pkl and scaler.pkl exist)
Flask API running on port 5000
Health check returns {"status":"healthy"}

Java Application

Application starts without errors
Console shows ML prediction service connected
Scaling service shows Mode: PROACTIVE (ML-based)
Predictions logged every 15 seconds
Proactive scaling events occur

Troubleshooting

Problem: "Model not found" error

Solution:

cd ml_predictor_service
python train_model.py

Problem: "Training data not found" error

Solution:

cd ml_predictor_service
python generate_data.py
python train_model.py

Problem: "ModuleNotFoundError: No module named 'flask'"

Solution:

# Ensure virtual environment is activated
venv\Scripts\activate  # Windows
source venv/bin/activate  # Linux/Mac

# Reinstall dependencies
pip install -r requirements.txt

Problem: Port 5000 already in use

Solution:

Option 1: Change Flask port Edit app.py, line ~200:

app.run(host='0.0.0.0', port=5001, debug=True)  # Changed to 5001

Then update Java code in LoadPredictionService.java:

private static final String DEFAULT_API_URL = "http://localhost:5001";

Option 2: Kill process using port 5000

# Windows
netstat -ano | findstr :5000
taskkill /PID <PID> /F

# Linux/Mac
lsof -i :5000
kill -9 <PID>

Problem: Java shows "ML service unavailable"

Checklist:

Is Flask API running? Check terminal
Can you access http://localhost:5000/health in browser?
Is firewall blocking port 5000?
Check Flask terminal for error messages

Test manually:

curl http://localhost:5000/health

If this fails, Flask API is not running properly.

Problem: Predictions not appearing in console

Check:

Is auto-scaling started? Look for [ScalingService] Auto-scaling started
Is there any file operation activity? Predictions trigger every 15 seconds
Check for errors in console

Demo Workflow

Quick Demo (5 minutes)

Start Flask API (Terminal 1):

cd ml_predictor_service
venv\Scripts\activate
python app.py

Start Java App (Terminal 2):
```
mvn javafx:run
```
Perform file operations in the GUI to trigger load

Watch console for ML predictions and proactive scaling:

[ML-Predictor] Current: 450 req/hr | Predicted (30min): 820 req/hr
[ML-Scaling]  PROACTIVE SCALE UP

Check statistics in Admin Panel to see:
- ML-based scale events vs reactive scale events
- Prediction accuracy metrics (RMSE, MAE)

Production Deployment (Future)

For production use, consider:

Use Gunicorn for Flask:

pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app

Dockerize ML service:

FROM python:3.9-slim
COPY ml_predictor_service /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]

Set up model retraining pipeline:
- Collect real traffic data
- Retrain weekly with fresh data
- A/B test new models before deployment
Add authentication:
- API keys for Flask endpoints
- Rate limiting to prevent abuse

Next Steps

ML service running
Java integration working
Create visualization dashboard (PredictionDashboard.java)
Test proactive scaling with simulated traffic
Compare ML vs reactive scaling performance

Support

ML Service Issues: Check ml_predictor_service/README.md
Java Integration: Review LoadPredictionService.java comments
Scaling Logic: See ScalingService.java documentation

For questions, refer to ML_INTERVIEW_GUIDE.md for technical deep-dive.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ML-Based Load Prediction System - Setup Guide

Overview

Prerequisites

Python Environment

Java Environment

Step-by-Step Setup

Step 1: Set Up Python ML Service

1.1 Navigate to ML Service Directory

1.2 Create Virtual Environment

1.3 Install Dependencies

Step 2: Generate Training Data

Step 3: Train the ML Model

Step 4: Start the Flask API

Step 5: Test the API (Optional but Recommended)

Step 6: Run the Java Application

Verification Checklist

Python ML Service

Java Application

Troubleshooting

Problem: "Model not found" error

Problem: "Training data not found" error

Problem: "ModuleNotFoundError: No module named 'flask'"

Problem: Port 5000 already in use

Problem: Java shows "ML service unavailable"

Problem: Predictions not appearing in console

Demo Workflow

Quick Demo (5 minutes)

Production Deployment (Future)

Next Steps

Support

FilesExpand file tree

ML_SETUP.md

Latest commit

History

ML_SETUP.md

File metadata and controls

ML-Based Load Prediction System - Setup Guide

Overview

Prerequisites

Python Environment

Java Environment

Step-by-Step Setup

Step 1: Set Up Python ML Service

1.1 Navigate to ML Service Directory

1.2 Create Virtual Environment

1.3 Install Dependencies

Step 2: Generate Training Data

Step 3: Train the ML Model

Step 4: Start the Flask API

Step 5: Test the API (Optional but Recommended)

Step 6: Run the Java Application

Verification Checklist

Python ML Service

Java Application

Troubleshooting

Problem: "Model not found" error

Problem: "Training data not found" error

Problem: "ModuleNotFoundError: No module named 'flask'"

Problem: Port 5000 already in use

Problem: Java shows "ML service unavailable"

Problem: Predictions not appearing in console

Demo Workflow

Quick Demo (5 minutes)

Production Deployment (Future)

Next Steps

Support