This project predicts patient risk for three diseases using machine learning:
- Diabetes
- Heart disease
- Kidney disease
It includes dataset download, model training, saved model files, and final performance metrics.
Objective: build a screening-oriented ML system that flags whether a patient is at risk for major diseases based on clinical attributes.
This is a decision-support prototype and not a medical diagnosis system.
Main files:
- diabetes_risk_project.ipynb: detailed diabetes workflow (EDA, preprocessing, training, evaluation, explainability)
- multi_disease_risk_project.ipynb: multi-disease analysis and comparison
- download_train_export.py: downloads datasets, trains models, saves artifacts, and exports the performance index
- diabetes_risk_report.md: in-depth technical report
- project_objective_and_work_done.pdf: concise project summary (PDF)
Directories:
- datasets/ -> downloaded CSV files
- models/ -> trained .joblib models
- performance/ -> exported performance metrics and model index
- results_package/ -> reusable package for reading and summarizing outcomes
Datasets:
- Diabetes: datasets/diabetes.csv
- Heart: datasets/heart.csv
- Kidney: datasets/kidney.csv
Source URLs are defined in download_train_export.py.
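The download step only needs to fetch each CSV once. Below is a minimal sketch of a cache-aware download helper, assuming a plain HTTP(S) URL per dataset. The name ensure_dataset is hypothetical, and the actual URLs and logic in download_train_export.py are not reproduced here and may differ.

```python
import urllib.request
from pathlib import Path


def ensure_dataset(url: str, dest: Path) -> Path:
    """Fetch `url` into `dest` once; later calls reuse the cached file.

    Hypothetical helper: the real URLs and download logic live in
    download_train_export.py and may differ from this sketch.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # already downloaded; skip the network round-trip
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated CSV that looks complete.
    tmp = dest.with_suffix(dest.suffix + ".part")
    urllib.request.urlretrieve(url, tmp)
    tmp.replace(dest)
    return dest
```

Typical use would be ensure_dataset(url, Path("datasets/diabetes.csv")), with url taken from the script's own definitions.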
Training pipeline for each disease:
- Load the disease dataset
- Clean and preprocess features (imputation + scaling)
- Train candidate models (Logistic Regression, Random Forest)
- Select the best model by cross-validated ROC-AUC
- Calibrate probabilities using CalibratedClassifierCV
- Evaluate on the holdout test split
- Save model and export metrics
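The pipeline steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the code in download_train_export.py: median imputation, the 80/20 stratified split, 5-fold CV, sigmoid calibration, the hyperparameters, and the function names build_candidates and train_and_select are all hypothetical choices.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_candidates(seed=42):
    """Fresh, unfitted candidate pipelines (imputation + scaling + model).

    Median imputation and these hyperparameters are assumptions.
    """
    return {
        "Logistic Regression": make_pipeline(
            SimpleImputer(strategy="median"),
            StandardScaler(),
            LogisticRegression(max_iter=1000),
        ),
        "Random Forest": make_pipeline(
            SimpleImputer(strategy="median"),
            StandardScaler(),
            RandomForestClassifier(n_estimators=100, random_state=seed),
        ),
    }


def train_and_select(X, y, seed=42):
    """Pick the best candidate by CV ROC-AUC, calibrate it, score on holdout."""
    # Assumed 80/20 stratified holdout split
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Mean 5-fold cross-validated ROC-AUC, computed on the training split only
    cv_scores = {
        name: cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
        for name, model in build_candidates(seed).items()
    }
    best_name = max(cv_scores, key=cv_scores.get)
    # Wrap a fresh copy of the winner; sigmoid (Platt) calibration with 5-fold CV
    calibrated = CalibratedClassifierCV(
        build_candidates(seed)[best_name], method="sigmoid", cv=5
    )
    calibrated.fit(X_train, y_train)
    test_auc = roc_auc_score(y_test, calibrated.predict_proba(X_test)[:, 1])
    return best_name, calibrated, cv_scores, test_auc


if __name__ == "__main__":
    # Synthetic stand-in for a disease CSV, with some missing values injected
    X, y = make_classification(n_samples=400, n_features=8, random_state=0)
    X[::15, 0] = np.nan
    name, model, scores, auc = train_and_select(X, y)
    print(name, {k: round(float(v), 3) for k, v in scores.items()}, round(float(auc), 3))
```

Keeping the test split out of both model selection and calibration means the holdout score is not used to tune anything. The fitted calibrated pipeline could then be persisted with joblib.dump(model, "models/diabetes_model.joblib").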
The following metrics are generated in performance/performance_index.csv:
- ROC-AUC
- PR-AUC
- F1-score
- Precision
- Recall
- Confusion-matrix counts (tp, tn, fp, fn)
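To make each exported metric concrete, here is a dependency-free sketch of how they are defined. scikit-learn provides equivalents (roc_auc_score, average_precision_score, f1_score, confusion_matrix); which calls the project actually uses is not stated here. PR-AUC is computed as average precision, one common step-wise estimate, and the 0.5 decision threshold is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Counts of true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == 1 and p == 1),
        "tn": sum(1 for t, p in pairs if t == 0 and p == 0),
        "fp": sum(1 for t, p in pairs if t == 0 and p == 1),
        "fn": sum(1 for t, p in pairs if t == 1 and p == 0),
    }


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """PR-AUC estimate: mean precision at the rank of each positive (no tie handling)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, t) in enumerate(ranked, start=1):
        if t == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def screening_metrics(y_true, scores, threshold=0.5):
    """All exported metrics for one model; `threshold` is an assumed cut-off."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    c = confusion_counts(y_true, y_pred)
    precision = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
    recall = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "roc_auc": roc_auc(y_true, scores),
        "pr_auc": average_precision(y_true, scores),
        "f1": f1,
        "precision": precision,
        "recall": recall,
        **c,
    }
```

On labels [0, 0, 1, 1] with scores [0.1, 0.4, 0.35, 0.8], this gives ROC-AUC 0.75 and PR-AUC of about 0.833, the same values scikit-learn's roc_auc_score and average_precision_score return for that input. ROC-AUC and PR-AUC are threshold-free, while F1, precision, recall and the confusion counts all depend on the chosen cut-off.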
Final holdout test results for the selected model per disease:

| Disease | Model | ROC-AUC | PR-AUC | F1 | Precision | Recall |
|---|---|---|---|---|---|---|
| Diabetes | Logistic Regression | 0.8117 | 0.6716 | 0.5361 | 0.6047 | 0.4815 |
| Heart | Random Forest | 0.9026 | 0.9202 | 0.8649 | 0.7805 | 0.9697 |
| Kidney | Logistic Regression | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
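F1 is the harmonic mean of precision and recall, so the F1 column can be cross-checked from the other two. The short sketch below recomputes it from the values in the results table (the dictionary name REPORTED is just for illustration); the recomputed values agree with the reported F1 to four decimals.

```python
# (precision, recall, reported F1) copied from the results table
REPORTED = {
    "Diabetes": (0.6047, 0.4815, 0.5361),
    "Heart": (0.7805, 0.9697, 0.8649),
    "Kidney": (1.0, 1.0, 1.0),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for disease, (p, r, f1) in REPORTED.items():
    print(f"{disease}: recomputed F1 = {f1_from(p, r):.4f}, reported = {f1:.4f}")
```

The check also shows why Heart's F1 (0.8649) sits between its recall (0.9697) and precision (0.7805) but closer to the lower value, which is how a harmonic mean behaves.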
Install dependencies:
pip install -r requirements.txt

Train models and generate outputs:
python download_train_export.py

Generate a compact summary report from the metrics package:
python -m results_package.metrics_report

Generated outputs:
- models/diabetes_model.joblib
- models/heart_model.joblib
- models/kidney_model.joblib
- performance/performance_index.csv
- performance/performance_index.json
- performance/model_index.json
- performance/results_summary.md (produced by results_package)
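As an illustration of what "reading and summarizing outcomes" can look like, here is a minimal sketch that turns performance_index.csv into a markdown table ranked by ROC-AUC. The function name summarize and the lower-case column names (disease, model, roc_auc, pr_auc, f1) are hypothetical; the real CSV headers and the logic in results_package may differ.

```python
import csv


def summarize(csv_path, sort_key="roc_auc"):
    """Render a metrics CSV as a markdown table, best model first.

    Column names are assumptions about the exported CSV layout.
    """
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    rows.sort(key=lambda row: float(row[sort_key]), reverse=True)
    lines = [
        "| Disease | Model | ROC-AUC | PR-AUC | F1 |",
        "|---|---|---|---|---|",
    ]
    for row in rows:
        lines.append(
            f"| {row['disease']} | {row['model']} | {float(row['roc_auc']):.4f}"
            f" | {float(row['pr_auc']):.4f} | {float(row['f1']):.4f} |"
        )
    return "\n".join(lines)
```

For example, print(summarize("performance/performance_index.csv")) would print the ranked table once the training script has run, assuming those column names.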
This project is for risk screening and educational/research purposes.
Any clinical decision must be reviewed by qualified medical professionals.