This project focuses on predicting diabetes in patients using a Decision Tree Classifier. The model is trained on medical diagnostic data to classify whether a patient is diabetic or non-diabetic. The project demonstrates a complete machine learning pipeline from data loading to model evaluation and visualization.
The dataset consists of medical attributes commonly used for diabetes diagnosis.
- Pregnancies
- Glucose
- BloodPressure
- SkinThickness
- Insulin
- BMI
- DiabetesPedigreeFunction
- Age
- Outcome
  - 0 → Non-Diabetic
  - 1 → Diabetic
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Graphviz
- Google Colab
- Load dataset using Pandas
- Perform basic data inspection
- Split dataset into training and testing sets (80:20)
- Train Decision Tree Classifier
- Evaluate model accuracy
- Visualize decision tree
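
The workflow steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's exact code: the helper name `train_and_evaluate`, the `seed` value, and the synthetic stand-in data are assumptions added so the sketch runs without the CSV file.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_and_evaluate(df, target="Outcome", seed=42):
    """Split 80:20, fit a decision tree, and return (model, test accuracy).

    The function name and the seed are illustrative assumptions.
    """
    x = df.drop(columns=[target])
    y = df[target]
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    return model, accuracy_score(y_test, y_pred)


# Synthetic stand-in data (an assumption) so the sketch is self-contained.
# With the real dataset, use: df = pd.read_csv("diabetes.csv")
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Glucose": rng.integers(60, 200, size=200),
    "BMI": rng.uniform(18, 45, size=200).round(1),
    "Age": rng.integers(21, 80, size=200),
})
demo["Outcome"] = (demo["Glucose"] > 130).astype(int)

model, acc = train_and_evaluate(demo)
print(f"Test accuracy: {acc:.2%}")
```

Fixing `random_state` makes the split and the tree reproducible between runs. The accuracy printed here describes the synthetic data only and says nothing about the real dataset.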
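from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Predict on the held-out test set and score the predictions
y_pred = model.predict(x_test)
accuracy_score(y_test, y_pred)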
Accuracy on the test set: 74.67%
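import graphviz
from sklearn.tree import export_graphviz
# Export the fitted tree as DOT source and render it
graphviz.Source(export_graphviz(
    model,
    out_file=None,
    feature_names=x.columns,
    filled=True
))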
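
Where Graphviz is not installed, the fitted tree can also be inspected as plain text. The sketch below uses scikit-learn's `export_text` as an alternative to the Graphviz rendering; it is not part of the original notebook, and the small synthetic data set and `max_depth=2` are assumptions chosen to keep the output short.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in features and labels (assumption, not the real dataset).
rng = np.random.default_rng(0)
x = pd.DataFrame({
    "Glucose": rng.integers(60, 200, size=100),
    "Age": rng.integers(21, 80, size=100),
})
y = (x["Glucose"] > 130).astype(int)

# A shallow tree keeps the printed rules readable.
model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(x, y)

# Render the split rules as an indented text outline.
rules = export_text(model, feature_names=list(x.columns))
print(rules)
```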
├── diabetes.csv
├── diabetes_decision_tree.ipynb
└── README.md
- Handle zero values using imputation
- Hyperparameter tuning
- Feature selection
- Compare with other models (SVM, Random Forest)
- Model deployment using Flask or FastAPI
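
As one possible starting point for the zero-value handling listed above, the sketch below treats zeros in selected columns as missing and fills them with the column median. The helper name `impute_zeros`, the choice of columns, and the use of the median are all assumptions for illustration; the project does not currently implement this step.

```python
import numpy as np
import pandas as pd

# Columns where a value of 0 is assumed to mean "not recorded" (assumption).
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def impute_zeros(df, columns=ZERO_AS_MISSING):
    """Return a copy of df with zeros in the given columns replaced by the median."""
    out = df.copy()
    cols = [c for c in columns if c in out.columns]
    # Mark zeros as missing, then fill with each column's median of the rest.
    out[cols] = out[cols].replace(0, np.nan)
    out[cols] = out[cols].fillna(out[cols].median())
    return out


# Tiny made-up sample to show the effect.
sample = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Pregnancies": [6, 1, 0, 1],
})
cleaned = impute_zeros(sample)
print(cleaned)
```

Columns such as `Pregnancies`, where zero is a valid value, are left untouched. In a real pipeline the medians would normally be computed on the training split only and then applied to the test split, so that no test information leaks into training.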
Saravanavel E
AI & Data Science Student
GitHub: https://github.com/SaravanavelE
This project is intended for educational and academic use.
model = DecisionTreeClassifier()
model.fit(x_train, y_train)