👉 Project Repository: ML-Backend
This project is a machine learning-driven healthcare analytics system for cardiovascular disease (CVD) risk assessment. It uses structured clinical data, lifestyle factors, and physiological measurements to support early detection of cardiovascular disease.
Developed as part of a research initiative at Fr. C. Rodrigues Institute of Technology, Vashi, this system aims to support healthcare professionals in making informed diagnostic decisions and improving patient outcomes.
Cardiovascular diseases (CVDs) remain a leading cause of death worldwide, which makes early detection and risk stratification a priority. This project presents a hybrid machine learning framework that combines several supervised learning algorithms: Logistic Regression, Support Vector Machines, Random Forest, and XGBoost.
The system focuses on:
- Optimizing model performance through advanced feature engineering
- Handling class imbalance using SMOTE techniques
- Ensuring clinical interpretability through explainable AI methods
- Providing actionable insights for healthcare professionals
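
To make the class-imbalance point above concrete, here is a minimal sketch of the idea behind SMOTE: new minority-class samples are created by interpolating between an existing minority sample and one of its nearest minority neighbours. This is a simplified standard-library illustration, not the project's code and not the imbalanced-learn implementation. The function name `smote_like_oversample`, the parameter `k`, and the two-column rows (age, systolic blood pressure) are all assumptions made for the example.

```python
import math
import random


def smote_like_oversample(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    Simplified illustration of the SMOTE idea (hypothetical helper).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical minority-class rows: (age, systolic blood pressure).
minority_rows = [(61.0, 150.0), (58.0, 162.0), (67.0, 171.0), (72.0, 158.0)]
new_rows = smote_like_oversample(minority_rows, n_synthetic=6)
print(len(new_rows))
```

Oversampling of this kind is normally applied to the training split only, so that the evaluation data contains no synthetic rows.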

The project objectives are to:
- Develop robust ML models for early CVD detection
- Implement feature engineering techniques for improved accuracy
- Address data imbalance challenges in medical datasets
- Create interpretable models suitable for clinical deployment
- Establish a framework for integration with existing healthcare systems
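
As an illustration of what feature engineering can mean for clinical data, the sketch below derives two commonly used quantities from raw measurements: body mass index and pulse pressure. The field names (`height_cm`, `weight_kg`, `systolic_bp`, `diastolic_bp`) and the helper `add_derived_features` are assumptions for this example; the features actually built in `feature_engineering.py` are not described here and may differ.

```python
def add_derived_features(record):
    """Return a copy of a patient record with two derived features added.

    Hypothetical helper; the field names are assumed for illustration.
    """
    enriched = dict(record)
    height_m = record["height_cm"] / 100.0
    # Body mass index: weight in kilograms divided by height in metres squared.
    enriched["bmi"] = record["weight_kg"] / (height_m ** 2)
    # Pulse pressure: systolic minus diastolic blood pressure, in mmHg.
    enriched["pulse_pressure"] = record["systolic_bp"] - record["diastolic_bp"]
    return enriched


patient = {
    "height_cm": 170.0,
    "weight_kg": 72.25,
    "systolic_bp": 140,
    "diastolic_bp": 90,
}
features = add_derived_features(patient)
print(round(features["bmi"], 1), features["pulse_pressure"])
```

Returning a copy rather than modifying the input keeps the raw record available for auditing, which tends to matter in clinical pipelines.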

Key features:
- 🔍 CVD Risk Prediction using state-of-the-art ML models
- ⚙️ Advanced Feature Engineering for optimal data representation
- ⚖️ Class Imbalance Handling with SMOTE implementation
- 📊 Comprehensive Performance Evaluation (accuracy, precision, recall, F1-score)
- 🔎 Explainable AI Integration with SHAP and LIME
- 🩺 EHR and Telemedicine Ready for seamless healthcare integration
- 📈 Real-time Risk Assessment capabilities
- 🛡️ Data Privacy Compliance with healthcare standards
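
The four evaluation metrics named in the list above can all be computed from the confusion-matrix counts. The sketch below shows the arithmetic in plain Python for a binary task where `1` marks a CVD-positive case. The function name `binary_classification_metrics` and the sample labels are assumptions for illustration; the project itself may well rely on Scikit-learn's metric functions instead.

```python
def binary_classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: three positive patients, five negative.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = binary_classification_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced medical data, accuracy alone can look good while positive cases are missed, so recall and F1 on the positive class are usually worth reporting alongside it.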
| Category | Technologies |
|---|---|
| Core Language | Python 3.8+ |
| Data Processing | Pandas, NumPy |
| Machine Learning | Scikit-learn, XGBoost |
| Visualization | Matplotlib, Seaborn, Plotly |
| Explainability | SHAP, LIME |
| Web Framework | Next.js (Frontend) |
| Development | Jupyter Notebooks |
| Model | Purpose | Key Advantages |
|---|---|---|
| Random Forest | Ensemble prediction | Robust to noise; averaging many trees reduces overfitting |
| XGBoost | Gradient boosting ensemble | Strong performance on tabular data; built-in feature importance |
    cvd-risk-predictor/
    ├── 📁 data/                        # Dataset and processed files
    │   ├── raw/                        # Original datasets
    │   ├── processed/                  # Cleaned and engineered data
    │   └── external/                   # External data sources
    ├── 📁 models/                      # Trained models and artifacts
    │   ├── saved_models/               # Serialized model files
    │   └── model_configs/              # Configuration files
    ├── 📁 notebooks/                   # Jupyter Notebooks
    │   ├── EDA.ipynb                   # Exploratory Data Analysis
    │   ├── Model_Training.ipynb        # Model development
    │   └── Evaluation.ipynb            # Performance analysis
    ├── 📁 src/                         # Source code
    │   ├── data_preprocessing.py       # Data cleaning and preparation
    │   ├── feature_engineering.py      # Feature creation and selection
    │   ├── model_training.py           # Training pipeline
    │   ├── evaluation.py               # Model evaluation metrics
    │   └── prediction.py               # Inference engine
    ├── 📁 outputs/                     # Results and visualizations
    │   ├── plots/                      # Generated charts and graphs
    │   └── reports/                    # Analysis reports
    ├── 📁 config/                      # Configuration files
    ├── 📄 requirements.txt             # Python dependencies
    └── 📄 README.md                    # Project documentation
| Name | Role | Contact |
|---|---|---|
| Aryan Nair | Lead Developer & Data Scientist | nairaryan135@gmail.com |
| Dhyan Patel | ML Engineer & Backend Developer | dhyanbpatel2005@gmail.com |
| Steffi Varghese | Data Analyst & Frontend Developer | steffiv875@gmail.com |
| Revant Shinde | System Architect & DevOps | revantshinde@gmail.com |
- Dr. Smita Dange - Principal Investigator
- Dr. Shashikant Dugad - Technical Advisor
Fr. C. Rodrigues Institute of Technology, Vashi
Department of Computer Engineering
