Predictive Maintenance for Industrial Turbofan Engines

End-to-end machine learning system that classifies turbofan engine health into three states — Normal, Warning, and Critical — using NASA CMAPSS sensor data. Built with a full ML pipeline, REST API, and interactive React dashboard.

Problem Statement

Industrial machines degrade over time. Unplanned failures are costly. This system uses historical sensor readings from turbofan engines to predict whether an engine is operating normally, approaching failure (warning), or in a critical state — enabling proactive maintenance decisions.

Project Structure

predictive-maintenance/
├── data/
│   └── train_FD001.txt          # NASA CMAPSS dataset
├── notebooks/
│   └── predictive_maintenance.ipynb
├── outputs/
│   ├── f1_comparison.png
│   ├── all_metrics_comparison.png
│   ├── confusion_matrices.png
│   └── learning_curves.png
├── backend/
│   ├── app.py
│   ├── train_regressor.py
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── Dashboard.jsx
│       │   ├── Predict.jsx
│       │   └── Sidebar.jsx
│       ├── App.jsx
│       └── index.css
├── requirements.txt
└── README.md

Dataset

NASA CMAPSS Turbofan Engine Degradation Simulation — FD001

100 engines run to failure
20,631 total cycle readings
21 sensor channels + 3 operational settings
Download: https://www.kaggle.com/datasets/bishals098/nasa-turbofan-engine-degradation-simulation
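
The raw CMAPSS text files have no header row. As a minimal sketch of loading one, assuming the usual layout of unit id, cycle, 3 operational settings, and 21 sensors (26 whitespace-separated columns), the snippet below reads the data into pandas. The column names and the `load_cmapss` helper are illustrative choices, not names taken from this repository.

```python
import io
import pandas as pd

# Column names are illustrative; the raw file has no header row.
COLUMNS = (
    ["unit", "cycle"]
    + [f"setting_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)

def load_cmapss(source):
    """Read a whitespace-delimited CMAPSS file (path or file-like) into a DataFrame."""
    return pd.read_csv(source, sep=r"\s+", header=None, names=COLUMNS)

# Two made-up rows with the 26-column layout, standing in for data/train_FD001.txt.
sample = io.StringIO(
    "1 1 " + " ".join(["0.5"] * 24) + "\n"
    "1 2 " + " ".join(["0.6"] * 24) + "\n"
)
df = load_cmapss(sample)
print(df.shape)
```

With the real dataset in place, `load_cmapss("data/train_FD001.txt")` would be the equivalent call.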

ML Pipeline

Target Variable

Engine health state derived from Remaining Useful Life (RUL):

Class	RUL Threshold
Normal	RUL > 50 cycles
Warning	20 < RUL ≤ 50 cycles
Critical	RUL ≤ 20 cycles
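
The thresholds above can be sketched as a labeling step. This assumes RUL is computed the common way for run-to-failure data (each unit's last recorded cycle minus the current cycle) and that the frame has `unit` and `cycle` columns; both are assumptions, and `add_rul_and_state` is a hypothetical helper rather than code from the notebook.

```python
import pandas as pd

def add_rul_and_state(df):
    """Add per-row RUL and a three-class health label using the thresholds above."""
    out = df.copy()
    # In run-to-failure data, the last recorded cycle of each unit is treated as failure.
    max_cycle = out.groupby("unit")["cycle"].transform("max")
    out["rul"] = max_cycle - out["cycle"]
    out["state"] = "Normal"                           # RUL > 50
    out.loc[out["rul"] <= 50, "state"] = "Warning"    # 20 < RUL <= 50
    out.loc[out["rul"] <= 20, "state"] = "Critical"   # RUL <= 20
    return out

# Toy engine with 60 cycles (made-up data, column names are assumptions).
toy = pd.DataFrame({"unit": [1] * 60, "cycle": range(1, 61)})
labeled = add_rul_and_state(toy)
counts = labeled["state"].value_counts().to_dict()
print(counts)
```

Because every engine spends only its last 21 cycles in the Critical band, the classes come out imbalanced, which is why the balancing step matters.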

Feature Engineering

Dropped low-variance sensors (std < 0.1)
Rolling mean and rolling std (window = 5 cycles) per engine unit
Final feature matrix: 36 features
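
One way these steps might look in pandas is shown below. It is a sketch under assumptions: the column names, the `engineer_features` helper, the use of `min_periods=1`, and filling the first-cycle rolling std with 0 are all illustrative choices, and the toy data does not reproduce the 36-feature matrix.

```python
import numpy as np
import pandas as pd

def engineer_features(df, sensor_cols, std_threshold=0.1, window=5):
    """Drop near-constant sensors, then add per-unit rolling mean and std."""
    keep = [c for c in sensor_cols if df[c].std() >= std_threshold]
    out = df[["unit", "cycle"] + keep].copy()
    grouped = out.groupby("unit")[keep]
    # Rolling is done within each unit, so windows never span two engines.
    # min_periods=1 keeps the first cycles of each unit instead of yielding NaN.
    roll_mean = grouped.rolling(window, min_periods=1).mean().reset_index(level=0, drop=True)
    roll_std = grouped.rolling(window, min_periods=1).std().reset_index(level=0, drop=True)
    # A single observation has no sample std; treat it as 0 here (an assumption).
    out = out.join(roll_mean.add_suffix("_mean")).join(roll_std.add_suffix("_std").fillna(0.0))
    return out

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "unit": [1] * 6 + [2] * 6,
    "cycle": list(range(1, 7)) * 2,
    "sensor_a": rng.normal(500.0, 5.0, 12),
    "sensor_b": [14.62] * 12,   # constant, so it is dropped
})
features = engineer_features(toy, ["sensor_a", "sensor_b"])
print(list(features.columns))
```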

Class Balancing

SMOTE applied only on training folds inside cross-validation to prevent data leakage.

Models Compared

Model	Accuracy	Precision	Recall	F1 Score
Random Forest	91.67%	85.66%	87.06%	86.30%
SVM	89.68%	82.56%	83.87%	83.14%
Gradient Boosting	91.35%	85.26%	85.55%	85.40%
XGBoost	91.17%	84.71%	85.99%	85.31%
KNN	81.90%	75.95%	84.99%	77.99%
Decision Tree	86.28%	78.21%	82.62%	79.72%

Evaluation: 5-fold stratified cross-validation. The F1 Score column is the macro-averaged F1 across the three classes.

Best Model: Random Forest — 86.30% Macro F1

Backend

Flask REST API serving the trained Random Forest model.

Live API: https://predictive-maintenance-api-tqzv.onrender.com

Method	Endpoint	Description
GET	/health	API status check
POST	/predict	Single row prediction
POST	/predict_batch	Batch prediction for full engine
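
The request schema is not documented here, so the following is only a sketch of what a client call to `/predict` might look like. The `features` key, the sensor names, the example values, and the `http://localhost:5000` base URL (Flask's default port) are all assumptions; check `backend/app.py` for the real payload shape.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: Flask's default port

def build_predict_request(features, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    # The "features" key is hypothetical; the real API may expect another shape.
    body = json.dumps({"features": features}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_predict_request({"sensor_2": 642.1, "sensor_3": 1589.7})
print(req.get_method(), req.full_url)
# To send it against a running backend: urllib.request.urlopen(req).read()
```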

Run locally:

cd backend
pip install -r requirements.txt
python train_regressor.py
python app.py

Frontend

React dashboard built with Recharts and Vite.

Live Dashboard: https://predictive-maintenance-dusky-iota.vercel.app

Pages:

Dashboard — model performance comparison, F1 chart, radar metrics, class distribution
Predict — upload train_FD001.txt or enter sensor values manually, get real-time health prediction with RUL estimate and probability breakdown

Run locally:

cd frontend
npm install
npm run dev

Setup — Full Local Run

# 1. Clone the repo
git clone https://github.com/subhrajit08/predictive-maintenance.git
cd predictive-maintenance

# 2. Download dataset
# Place train_FD001.txt in data/

# 3. Install Python dependencies
pip install -r requirements.txt

# 4. Train RUL model
cd backend
python train_regressor.py

# 5. Start backend
python app.py

# 6. Start frontend (new terminal, from the repo root)
cd frontend
npm install
npm run dev

Open http://localhost:3000

Deployment

Service	Platform	URL
Frontend	Vercel	https://predictive-maintenance-dusky-iota.vercel.app
Backend	Render	https://predictive-maintenance-api-tqzv.onrender.com

Notebook

notebooks/predictive_maintenance.ipynb covers:

Data loading and inspection
Exploratory data analysis — RUL distribution, sensor degradation plots
Feature engineering — rolling statistics, low variance sensor removal
Preprocessing — stratified split, SMOTE
Model training — 5-fold cross validation across 6 models
Performance comparison — F1 bar chart, all metrics comparison
Confusion matrices — per model on test set
Learning curves — top 3 models
Final summary and model selection

Result

Tech Stack

Layer	Technology
ML	scikit-learn, XGBoost, imbalanced-learn
Backend	Flask, Flask-CORS, joblib
Frontend	React, Vite, Recharts, Axios
Data	pandas, numpy, matplotlib, seaborn
Dataset	NASA CMAPSS FD001

Key Design Decisions

SMOTE applied inside CV folds only — prevents synthetic samples from leaking into validation
Rolling features computed per engine unit — captures local degradation trends
Recall prioritized for Critical class — missing a real failure is more costly than a false alarm
RUL regression model runs alongside classifier — gives estimated cycles remaining
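
To check the Critical-class recall mentioned above, per-class recall can be read off with scikit-learn. The labels below are made up for illustration, and `per_class_recall` is a hypothetical helper, not part of this repository.

```python
from sklearn.metrics import recall_score

LABELS = ["Normal", "Warning", "Critical"]

def per_class_recall(y_true, y_pred):
    """Return recall for each health state as a dict of plain floats."""
    values = recall_score(y_true, y_pred, labels=LABELS, average=None, zero_division=0)
    return {label: float(v) for label, v in zip(LABELS, values)}

# Made-up labels for illustration only.
y_true = ["Normal", "Normal", "Warning", "Warning",
          "Critical", "Critical", "Critical", "Critical"]
y_pred = ["Normal", "Warning", "Warning", "Warning",
          "Critical", "Critical", "Critical", "Warning"]
recalls = per_class_recall(y_true, y_pred)
print(recalls["Critical"])  # 0.75: three of four true Critical rows were caught
```

Macro F1 weights all three classes equally, so reporting Critical recall separately shows whether a model with a good overall score still misses real failures.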

License

This project is licensed under the MIT License.
