End-to-end machine learning system that classifies turbofan engine health into three states — Normal, Warning, and Critical — using NASA CMAPSS sensor data. Built with a full ML pipeline, REST API, and interactive React dashboard.
Industrial machines degrade over time. Unplanned failures are costly. This system uses historical sensor readings from turbofan engines to predict whether an engine is operating normally, approaching failure (warning), or in a critical state — enabling proactive maintenance decisions.
predictive-maintenance/
├── data/
│   └── train_FD001.txt    # NASA CMAPSS dataset
├── notebooks/
│   └── predictive_maintenance.ipynb
├── outputs/
│   ├── f1_comparison.png
│   ├── all_metrics_comparison.png
│   ├── confusion_matrices.png
│   └── learning_curves.png
├── backend/
│   ├── app.py
│   ├── train_regressor.py
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── Dashboard.jsx
│       │   ├── Predict.jsx
│       │   └── Sidebar.jsx
│       ├── App.jsx
│       └── index.css
├── requirements.txt
└── README.md
NASA CMAPSS Turbofan Engine Degradation Simulation — FD001
- 100 engines run to failure
- 20,631 total cycle readings
- 21 sensor channels + 3 operational settings
- Download: https://www.kaggle.com/datasets/bishals098/nasa-turbofan-engine-degradation-simulation
Engine health state derived from Remaining Useful Life (RUL):
| Class | RUL Threshold |
|---|---|
| Normal | RUL > 50 cycles |
| Warning | 20 < RUL ≤ 50 cycles |
| Critical | RUL ≤ 20 cycles |
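
The thresholds in the table can be sketched as a small labeling function. This is a minimal illustration rather than the notebook's actual code: the function names are hypothetical, and it assumes RUL is computed as the engine's last recorded cycle minus the current cycle, which is the usual convention for run-to-failure training data.

```python
def remaining_useful_life(cycles):
    """RUL for one run-to-failure engine: cycles left until its last recorded cycle.

    Assumption: the last cycle in the training file is the failure point.
    """
    last_cycle = max(cycles)
    return [last_cycle - c for c in cycles]


def health_class(rul):
    """Map an RUL value (in cycles) to the three classes from the table."""
    if rul > 50:
        return "Normal"
    if rul > 20:
        return "Warning"
    return "Critical"


# Toy engine that fails at cycle 60 (illustrative, not taken from FD001).
cycles = list(range(1, 61))
labels = [health_class(r) for r in remaining_useful_life(cycles)]
```

Note that the boundaries are inclusive on the lower class: an RUL of exactly 50 is Warning and an RUL of exactly 20 is Critical.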
- Dropped low-variance sensors (std < 0.1)
- Rolling mean and rolling std (window = 5 cycles) per engine unit
- Final feature matrix: 36 features
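
The feature steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the standard deviation is the sample standard deviation, and the first cycles of each engine use a shorter (partial) window instead of being dropped. The notebook may handle these details differently.

```python
from statistics import mean, stdev


def drop_low_variance(columns, threshold=0.1):
    """Keep only sensor columns whose sample std is at least `threshold`."""
    return {name: vals for name, vals in columns.items() if stdev(vals) >= threshold}


def rolling_stats(values, window=5):
    """Trailing rolling mean and std over `window` cycles.

    Assumption: early cycles use whatever history exists (partial windows),
    and the std of a single reading is reported as 0.0.
    """
    means, stds = [], []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(mean(chunk))
        stds.append(stdev(chunk) if len(chunk) > 1 else 0.0)
    return means, stds


def rolling_features_per_unit(readings, window=5):
    """Compute rolling stats separately for each engine.

    `readings` maps a unit id to that engine's sensor values ordered by cycle,
    so a window never mixes readings from two different engines.
    """
    return {unit: rolling_stats(vals, window) for unit, vals in readings.items()}
```

Grouping by engine before rolling matters because the file stacks all engines one after another; a window that ran across the boundary would blend the end of one engine's life with the start of the next.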
SMOTE applied only on training folds inside cross-validation to prevent data leakage.
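
To make the leakage point concrete, here is a dependency-free sketch of oversampling inside each fold. It is a stand-in, not the project's code: `smote_like` is a simplified hand-rolled version of the SMOTE interpolation idea, the fold split is interleaved rather than stratified, and all names are hypothetical. With the libraries listed in this README, the same effect is typically obtained by placing the sampler in an imbalanced-learn pipeline, which applies it only when fitting.

```python
import random


def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest same-class neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


def balanced_training_folds(X, y, n_splits=5):
    """Yield (X_train, y_train, X_val, y_val); only the training part is oversampled."""
    indices = list(range(len(X)))
    for fold in range(n_splits):
        # Simplification: interleaved split, not the stratified split used in the project.
        val_idx = set(indices[fold::n_splits])
        X_train = [X[i] for i in indices if i not in val_idx]
        y_train = [y[i] for i in indices if i not in val_idx]
        X_val = [X[i] for i in sorted(val_idx)]
        y_val = [y[i] for i in sorted(val_idx)]

        counts = {label: y_train.count(label) for label in set(y_train)}
        target = max(counts.values())
        for label, count in counts.items():
            minority = [x for x, lab in zip(X_train, y_train) if lab == label]
            if count < target and len(minority) >= 2:
                new_points = smote_like(minority, target - count, seed=fold)
                X_train = X_train + new_points
                y_train = y_train + [label] * len(new_points)
        # Validation rows are returned untouched: no synthetic samples reach them.
        yield X_train, y_train, X_val, y_val
```

The key property is that the validation split is carved out before any synthetic points are generated, so every validation row is a real reading and no synthetic point is derived from it.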
| Model | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Random Forest | 91.67% | 85.66% | 87.06% | 86.30% |
| SVM | 89.68% | 82.56% | 83.87% | 83.14% |
| Gradient Boosting | 91.35% | 85.26% | 85.55% | 85.40% |
| XGBoost | 91.17% | 84.71% | 85.99% | 85.31% |
| KNN | 81.90% | 75.95% | 84.99% | 77.99% |
| Decision Tree | 86.28% | 78.21% | 82.62% | 79.72% |
Evaluation: 5-Fold Stratified Cross Validation
Best Model: Random Forest — 86.30% Macro F1
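
For readers unfamiliar with macro averaging, the sketch below shows how macro F1 and per-class recall follow from a confusion matrix. The matrix is a made-up illustration, not a result from this project, and the function names are hypothetical. Macro F1 is the unweighted mean of the per-class F1 scores, so it does not have to equal the harmonic mean of averaged precision and averaged recall.

```python
def per_class_scores(confusion, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    scores = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1, so each class counts equally."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Illustrative counts only (rows: true class, columns: predicted class).
labels = ["Normal", "Warning", "Critical"]
confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
scores = per_class_scores(confusion, labels)
critical_recall = scores["Critical"][1]
overall = macro_f1(scores)
```

Because every class carries equal weight, a model cannot reach a high macro F1 by doing well on the majority Normal class alone, which suits a setting where the Critical class is the one that matters most.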
Flask REST API serving the trained Random Forest model.
Live API: https://predictive-maintenance-api-tqzv.onrender.com
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | API status check |
| POST | /predict | Single row prediction |
| POST | /predict_batch | Batch prediction for full engine |
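
A client for these endpoints might look like the sketch below, using only the standard library. The request body shape is an assumption: this README does not document the JSON schema, so the `features` key is hypothetical and should be checked against `backend/app.py`. The base URL assumes Flask's default development port.

```python
import json
from urllib import request

# Assumption: Flask's default dev port; swap in the deployed URL if preferred.
BASE_URL = "http://localhost:5000"


def build_health_request(base_url=BASE_URL):
    """GET /health, used to check that the API is up."""
    return request.Request(f"{base_url}/health", method="GET")


def build_predict_request(features, base_url=BASE_URL):
    """POST /predict with one row of sensor features.

    The {"features": [...]} payload is a hypothetical schema for illustration.
    """
    body = json.dumps({"features": features}).encode("utf-8")
    return request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=10):
    """Send a prepared request and decode the JSON response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the backend running, `send(build_health_request())` would be a quick way to confirm the service responds before posting predictions.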
Run locally:
cd backend
pip install -r requirements.txt
python train_regressor.py
python app.py
React dashboard built with Recharts and Vite.
Live Dashboard: https://predictive-maintenance-dusky-iota.vercel.app
Pages:
- Dashboard: model performance comparison, F1 chart, radar metrics, class distribution
- Predict: upload train_FD001.txt or enter sensor values manually, then get a real-time health prediction with an RUL estimate and probability breakdown
Run locally:
cd frontend
npm install
npm run dev
# 1. Clone the repo
git clone https://github.com/subhrajit08/predictive-maintenance.git
cd predictive-maintenance
# 2. Download dataset
# Place train_FD001.txt in data/
# 3. Install Python dependencies
pip install -r requirements.txt
# 4. Train RUL model
cd backend
python train_regressor.py
# 5. Start backend
python app.py
# 6. Start frontend (new terminal, from the repo root)
cd frontend
npm install
npm run dev
Open http://localhost:3000
| Service | Platform | URL |
|---|---|---|
| Frontend | Vercel | https://predictive-maintenance-dusky-iota.vercel.app |
| Backend | Render | https://predictive-maintenance-api-tqzv.onrender.com |
notebooks/predictive_maintenance.ipynb covers:
- Data loading and inspection
- Exploratory data analysis — RUL distribution, sensor degradation plots
- Feature engineering — rolling statistics, low variance sensor removal
- Preprocessing — stratified split, SMOTE
- Model training — 5-fold cross validation across 6 models
- Performance comparison — F1 bar chart, all metrics comparison
- Confusion matrices — per model on test set
- Learning curves — top 3 models
- Final summary and model selection
| Layer | Technology |
|---|---|
| ML | scikit-learn, XGBoost, imbalanced-learn |
| Backend | Flask, Flask-CORS, joblib |
| Frontend | React, Vite, Recharts, Axios |
| Data | pandas, numpy, matplotlib, seaborn |
| Dataset | NASA CMAPSS FD001 |
- SMOTE applied inside CV folds only — prevents synthetic samples from leaking into validation
- Rolling features computed per engine unit — captures local degradation trends
- Recall prioritized for Critical class — missing a real failure is more costly than a false alarm
- RUL regression model runs alongside classifier — gives estimated cycles remaining
This project is licensed under the MIT License.
