This project implements a machine learning classification system to predict the likelihood of diabetes using medical diagnostic data.
It demonstrates a complete end-to-end ML workflow, including:
- Data preprocessing
- Model training
- Evaluation
- Model persistence
The goal is to classify whether a patient is diabetic based on key medical features, enabling early detection and analysis.
The dataset contains eight medical predictor variables and one target variable (Outcome):
- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome (target)
  - 1 → Diabetic
  - 0 → Non-diabetic
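As a rough illustration of the column layout above, the sketch below separates the eight predictors from `Outcome` with pandas. The helper name `split_features_target` and the two sample rows are invented for this example; in practice the frame would come from `pd.read_csv("diabetes.csv")`.

```python
import pandas as pd

FEATURE_COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET_COLUMN = "Outcome"


def split_features_target(df: pd.DataFrame):
    """Return (X, y): the eight predictor columns and the Outcome column."""
    expected = FEATURE_COLUMNS + [TARGET_COLUMN]
    missing = [col for col in expected if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[FEATURE_COLUMNS], df[TARGET_COLUMN]


# Two invented rows, only to show the expected shape of the data.
sample = pd.DataFrame(
    [[2, 120, 70, 25, 80, 28.5, 0.35, 33, 0],
     [5, 165, 82, 30, 150, 34.1, 0.62, 51, 1]],
    columns=FEATURE_COLUMNS + [TARGET_COLUMN],
)
X, y = split_features_target(sample)
```

Checking the column names up front gives a clear error if the CSV layout differs from the list above.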
Tools and libraries used:
- Python 🐍
- NumPy – Numerical operations
- Pandas – Data manipulation
- Scikit-learn – Machine learning
- Pickle – Model serialization
- Jupyter Notebook
Repository contents:
- 📊 diabetes.csv – Dataset used for training and testing
- 📓 DiabetesPrediction_ML_Model.ipynb – Notebook with EDA, training, and evaluation
- 💾 classification_model.pkl – Saved trained model
- 📄 README.md – Project documentation
The workflow covers these steps:
- Data loading and exploration
- Handling missing and invalid values
- Data splitting (train/test)
- Model training (classification)
- Model evaluation using metrics
- Saving the trained model for reuse
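The steps above could be sketched roughly as follows. This is not the notebook's code: it assumes that a zero in `Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, or `BMI` stands for a missing measurement, fills those gaps with the training-set median, and uses `LogisticRegression` as a stand-in classifier. The function names `mark_missing` and `train_and_evaluate` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a zero in these columns means "not measured", not a real reading.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def mark_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which suspicious zeros are replaced by NaN."""
    out = df.copy()
    out[ZERO_AS_MISSING] = out[ZERO_AS_MISSING].replace(0, np.nan)
    return out


def train_and_evaluate(df: pd.DataFrame, test_size=0.2, seed=42):
    """Split the data, fit a pipeline, and return (model, test accuracy)."""
    df = mark_missing(df)
    X = df.drop(columns="Outcome")
    y = df["Outcome"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    # Imputation and scaling sit inside the pipeline, so their statistics
    # are learned from the training split only.
    model = make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),  # placeholder classifier
    )
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy
```

Because the preprocessing is part of the fitted pipeline, pickling `model` would save the imputer and scaler along with the classifier.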
Results:
- Achieves reliable predictive accuracy on the held-out test data (see the evaluation in the notebook)
- Demonstrates effective use of classification algorithms
- Suitable for learning and academic purposes
The trained model is saved as: classification_model.pkl
It can be loaded and used for predictions in Python using pickle.
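A minimal sketch of loading the file and making one prediction might look like this. The helper names `load_model` and `predict_outcome` are hypothetical, and the sketch assumes the saved object exposes a scikit-learn style `predict` method that accepts the eight raw features in dataset column order.

```python
import pickle
from pathlib import Path

MODEL_PATH = Path("classification_model.pkl")


def load_model(path=MODEL_PATH):
    """Unpickle and return the saved model object."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def predict_outcome(model, features):
    """Predict 1 (diabetic) or 0 (non-diabetic) for one patient.

    `features` holds the eight predictor values in dataset column order.
    """
    return int(model.predict([features])[0])
```

Typical use would be `model = load_model()` followed by `predict_outcome(model, values)`. Only unpickle files from a trusted source, since `pickle` can execute arbitrary code while loading. If the notebook transformed the features before training, the same transformation has to be applied before calling `predict`.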
This project is built to:
- Understand machine learning classification
- Work with real-world medical datasets
- Build an end-to-end ML pipeline
- Practice model deployment concepts
Possible future improvements:
- Compare multiple ML models
- Apply hyperparameter tuning
- Improve feature engineering
- Deploy as a web app (Flask / Streamlit)
- Add dashboards and visualizations
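For the model comparison and hyperparameter tuning items, one possible starting point is sketched below with scikit-learn. The two candidate models, the `C` grid, and the synthetic data from `make_classification` are illustrative choices only; the real `X` and `y` from diabetes.csv would replace the synthetic arrays.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in with 8 features; swap in the real X and y.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Small grid search over the regularisation strength of one candidate.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
best_c = search.best_params_["C"]
```

Cross-validation gives a steadier comparison than a single train/test split, which matters for a dataset with relatively few rows.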
This project is for educational purposes only and should not be used for real medical diagnosis.
Anupam Singh (Kirisaki)
Machine Learning Student