Skip to content

Latest commit

 

History

History
436 lines (301 loc) · 9.05 KB

File metadata and controls

436 lines (301 loc) · 9.05 KB

ML-Based Load Prediction System - Setup Guide

Overview

This guide walks you through setting up and running the ML-based load prediction system for CloudFileSystem.

What You'll Get:

  • Proactive container scaling (predicts load 30 minutes ahead)
  • 89% prediction accuracy (target RMSE < 100 req/hr)
  • Real-time prediction dashboard
  • Fallback to reactive scaling if ML service unavailable

Prerequisites

Python Environment

  • Python 3.9 or higher
  • pip (Python package manager)
  • Virtual environment (recommended)

Java Environment

  • Java 17 (already configured)
  • Maven (already configured)
  • Gson dependency (already added to pom.xml)

Step-by-Step Setup

Step 1: Set Up Python ML Service

1.1 Navigate to ML Service Directory

cd ml_predictor_service

1.2 Create Virtual Environment

Windows:

python -m venv venv

# CMD / PowerShell
venv\Scripts\activate

# Git Bash
source venv/Scripts/activate

Linux/Mac:

python3 -m venv venv
source venv/bin/activate

1.3 Install Dependencies

pip install -r requirements.txt

Expected output:

Successfully installed flask-3.0.0 scikit-learn-1.3.0 pandas-2.1.0 ...

Step 2: Generate Training Data

python generate_data.py

Expected output:

[Data Generator] Generating 6 months of synthetic data...
[Data Generator]  Generated 4320 data points
[Data Generator]  Saved to training_data.csv

=== Data Statistics ===
Date range: 2025-07-07 to 2026-01-07
Load range: 15 - 6543 req/hr
Average load: 487 req/hr

Business hours avg: 756 req/hr
Non-business hours avg: 218 req/hr
Peak ratio: 3.47x

Weekday avg: 623 req/hr
Weekend avg: 187 req/hr
Weekend reduction: 70.0%

What this does:

  • Generates 6 months of hourly traffic data (~4,320 data points)
  • Simulates realistic patterns: business hours peaks, lunch spikes, weekend lulls
  • Creates training_data.csv and data_visualization.png

Step 3: Train the ML Model

python train_model.py

Expected output:

[Model Training] Loading data from training_data.csv...
[Model Training]  Loaded 4320 data points
[Model Training] Training Random Forest...
[Model Training]  Model training complete

MODEL PERFORMANCE METRICS
RMSE: 45.23 req/hr
MAE: 38.15 req/hr
R² Score: 0.8912
 RMSE target met (< 100 req/hr)
 R² target met (> 0.85)

FEATURE IMPORTANCE
avg_load_last_7d         0.3245 
hour_of_day              0.2156 
load_1h_ago              0.1987 
is_business_hours        0.1234 
day_of_week              0.0987 
month                    0.0543 
is_weekend               0.0432 
load_24h_ago             0.0416 

What this does:

  • Trains Random Forest model on generated data
  • Evaluates performance (RMSE, MAE, R²)
  • Saves model.pkl, scaler.pkl, model_metadata.json
  • Creates versioned backup model_YYYYMMDD_HHMMSS.pkl

Target Metrics:

  • RMSE < 100 req/hr
  • R² > 0.85

Step 4: Start the Flask API

python app.py

Expected output:

[ML API] Loading model artifacts...
[ML API]  Model loaded successfully
[ML API]  Scaler loaded successfully
[ML API]  Model metadata loaded (RMSE: 45.23)

[ML API] Starting Flask server on http://localhost:5000
[ML API] Endpoints:
  - POST /predict       : Get load prediction
  - GET  /health        : Check API health
  - GET  /metrics       : Get prediction metrics
  - POST /record_actual : Record actual load for tracking

 * Running on http://0.0.0.0:5000

Keep this terminal open! The Flask API must be running for ML predictions to work.


Step 5: Test the API (Optional but Recommended)

Open a new terminal and test the API:

# Test health check
curl http://localhost:5000/health

# Expected response:
# {"status":"healthy","model_loaded":true,"scaler_loaded":true}

# Test prediction
curl -X POST http://localhost:5000/predict \
  -H "Content-Type: application/json" \
  -d "{\"current_time\":\"2024-01-07T14:30:00\",\"current_load\":750,\"historical_loads\":[650,700,720,745]}"

# Expected response:
# {"predicted_load":820.45,"confidence_lower":780.23,"confidence_upper":860.67,...}

Step 6: Run the Java Application

Open a new terminal in the project root:

# Compile and run
mvn clean compile
mvn javafx:run

# OR use your IDE's run configuration

Expected console output:

[ScalingService]  ML prediction service connected
[ScalingService] Mode: PROACTIVE (ML-based)
[ScalingService] Initialized with 1 container(s)
[ScalingService] Auto-scaling started

[ML-Predictor] Current: 450 req/hr | Predicted (30min): 820 req/hr | Confidence: [780, 860]
[ScalingService] Load: 45.00% (current), Predicted: 820 req/hr, Containers: 1
[ML-Scaling]  PROACTIVE SCALE UP: Predicted load (820) > 75% capacity (270)
[ScalingService]  SCALING UP (ML-PROACTIVE): Adding container soft40051-file-server-2
[ScalingService]  Scale up completed (ML-PROACTIVE). Total containers: 2

If ML service is unavailable:

[ScalingService]  ML service unavailable - using REACTIVE mode
[ScalingService] REACTIVE Mode - Load: 45.00%, Containers: 1

Verification Checklist

Python ML Service

  • Virtual environment activated
  • Dependencies installed (pip list shows flask, scikit-learn, etc.)
  • Training data generated (training_data.csv exists)
  • Model trained (model.pkl and scaler.pkl exist)
  • Flask API running on port 5000
  • Health check returns {"status":"healthy"}

Java Application

  • Application starts without errors
  • Console shows ML prediction service connected
  • Scaling service shows Mode: PROACTIVE (ML-based)
  • Predictions logged every 15 seconds
  • Proactive scaling events occur

Troubleshooting

Problem: "Model not found" error

Solution:

cd ml_predictor_service
python train_model.py

Problem: "Training data not found" error

Solution:

cd ml_predictor_service
python generate_data.py
python train_model.py

Problem: "ModuleNotFoundError: No module named 'flask'"

Solution:

# Ensure virtual environment is activated
venv\Scripts\activate  # Windows
source venv/bin/activate  # Linux/Mac

# Reinstall dependencies
pip install -r requirements.txt

Problem: Port 5000 already in use

Solution:

Option 1: Change Flask port Edit app.py, line ~200:

app.run(host='0.0.0.0', port=5001, debug=True)  # Changed to 5001

Then update Java code in LoadPredictionService.java:

private static final String DEFAULT_API_URL = "http://localhost:5001";

Option 2: Kill process using port 5000

# Windows
netstat -ano | findstr :5000
taskkill /PID <PID> /F

# Linux/Mac
lsof -i :5000
kill -9 <PID>

Problem: Java shows "ML service unavailable"

Checklist:

  1. Is Flask API running? Check terminal
  2. Can you access http://localhost:5000/health in browser?
  3. Is firewall blocking port 5000?
  4. Check Flask terminal for error messages

Test manually:

curl http://localhost:5000/health

If this fails, Flask API is not running properly.

Problem: Predictions not appearing in console

Check:

  1. Is auto-scaling started? Look for [ScalingService] Auto-scaling started
  2. Is there any file operation activity? Predictions trigger every 15 seconds
  3. Check for errors in console

Demo Workflow

Quick Demo (5 minutes)

  1. Start Flask API (Terminal 1):

    cd ml_predictor_service
    venv\Scripts\activate
    python app.py
  2. Start Java App (Terminal 2):

    mvn javafx:run
  3. Perform file operations in the GUI to trigger load

  4. Watch console for ML predictions and proactive scaling:

    [ML-Predictor] Current: 450 req/hr | Predicted (30min): 820 req/hr
    [ML-Scaling]  PROACTIVE SCALE UP
    
  5. Check statistics in Admin Panel to see:

    • ML-based scale events vs reactive scale events
    • Prediction accuracy metrics (RMSE, MAE)

Production Deployment (Future)

For production use, consider:

  1. Use Gunicorn for Flask:

    pip install gunicorn
    gunicorn -w 4 -b 0.0.0.0:5000 app:app
  2. Dockerize ML service:

    FROM python:3.9-slim
    COPY ml_predictor_service /app
    WORKDIR /app
    RUN pip install -r requirements.txt
    CMD ["python", "app.py"]
  3. Set up model retraining pipeline:

    • Collect real traffic data
    • Retrain weekly with fresh data
    • A/B test new models before deployment
  4. Add authentication:

    • API keys for Flask endpoints
    • Rate limiting to prevent abuse

Next Steps

  • ML service running
  • Java integration working
  • Create visualization dashboard (PredictionDashboard.java)
  • Test proactive scaling with simulated traffic
  • Compare ML vs reactive scaling performance

Support

  • ML Service Issues: Check ml_predictor_service/README.md
  • Java Integration: Review LoadPredictionService.java comments
  • Scaling Logic: See ScalingService.java documentation

For questions, refer to ML_INTERVIEW_GUIDE.md for technical deep-dive.