This project builds six supervised classification models, each trained on a different dataset. It demonstrates the complete supervised learning workflow: data preprocessing, model training, evaluation, and visualization.
- Apply various machine learning algorithms
- Train and test models on real-world datasets
- Evaluate performance using metrics and visualizations
- Gain hands-on experience in supervised learning
| Algorithm | Dataset | Task |
|---|---|---|
| Logistic Regression | Titanic | Survival Prediction |
| Decision Tree | Heart Disease | Disease Detection |
| Random Forest | Loan Dataset | Loan Approval |
| SVM | Diabetes | Diabetes Prediction |
| KNN | Iris | Flower Classification |
| Naive Bayes | Spam | Spam Detection |
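As a rough sketch, the algorithms in the table could map onto scikit-learn estimators as shown below. The `MODELS` name, the hyperparameters, and the choice of `MultinomialNB` as the Naive Bayes variant are assumptions for illustration; the notebooks may use different settings.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: one estimator per row of the table.
# Hyperparameters are illustrative, not the values used in the notebooks.
MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": SVC(probability=True, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    # MultinomialNB is a common choice for word-count features (assumption).
    "Naive Bayes": MultinomialNB(),
}

for name, model in MODELS.items():
    print(f"{name}: {type(model).__name__}")
```

All six estimators share the same `fit` / `predict` interface, which is why the same workflow can be reused across notebooks.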
ML-Predictive-Modeling/
│
├── notebooks/
│   ├── 01_logistic_regression_titanic.ipynb
│   ├── 02_decision_tree_heart.ipynb
│   ├── 03_random_forest_loan.ipynb
│   ├── 04_svm_diabetes.ipynb
│   ├── 05_knn_iris.ipynb
│   └── 06_naive_bayes_spam.ipynb
│
├── datasets/
│   ├── Titanic-Dataset.csv
│   ├── heart.csv
│   ├── diabetes.csv
│   ├── spam.csv
│   └── train_u6lujuX_CVtuZ9i.csv
│
├── README.md
└── requirements.txt
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Data Loading
- Data Cleaning & Preprocessing
- Feature Selection & Encoding
- Train-Test Split
- Model Training
- Prediction
- Evaluation
- Visualization
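The steps above can be sketched end to end as follows. This is a minimal example, assuming the Iris data bundled with scikit-learn and a KNN classifier with illustrative settings; the split ratio, scaler, and `n_neighbors` value are assumptions, and the visualization step is left out for brevity.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Data loading: features as a DataFrame, labels as a Series.
X, y = load_iris(return_X_y=True, as_frame=True)

# Cleaning: drop rows with missing values (Iris has none, shown for completeness).
X = X.dropna()
y = y.loc[X.index]

# Train-test split, stratified so class proportions match in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Preprocessing and model training in one pipeline, so the scaler
# is fitted on the training data only.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)

# Prediction and evaluation on the held-out test set.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Wrapping the scaler and the classifier in a pipeline keeps test data from leaking into preprocessing statistics.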
Each model is evaluated using:
- Accuracy Score
- Confusion Matrix
- Classification Report
- ROC Curve (true positive rate vs. false positive rate across classification thresholds)
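The metrics listed above could be computed along these lines. This sketch uses a synthetic binary dataset from `make_classification` and a logistic regression model purely as stand-ins; neither is taken from the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (assumption, for illustration only).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)                 # hard class labels
y_score = clf.predict_proba(X_test)[:, 1]    # probability of the positive class

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)        # rows: actual, columns: predicted
report = classification_report(y_test, y_pred)

# The ROC curve needs scores, not hard labels.
fpr, tpr, thresholds = roc_curve(y_test, y_score)
auc = roc_auc_score(y_test, y_score)

print(f"Accuracy: {acc:.3f}")
print("Confusion matrix:")
print(cm)
print(report)
print(f"ROC AUC: {auc:.3f}")
```

Accuracy equals the sum of the confusion matrix diagonal divided by the number of test samples, so the two metrics can be cross-checked against each other.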
- Correlation plot: shows how features relate to one another.
- Confusion matrix: compares predicted and actual class labels.
- ROC curve: shows model performance across classification thresholds.
- Feature importance plot: ranks the features that most affect predictions.
Each notebook includes a live prediction on sample input, showing:
- The input data
- The model output
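A prediction on a single input could look like the sketch below. It assumes the Iris data bundled with scikit-learn and a KNN model with illustrative settings; the sample values are the measurements of the first flower in that dataset.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris(as_frame=True)
model = KNeighborsClassifier(n_neighbors=5).fit(iris.data, iris.target)

# Input data: one flower, using the same column names as the training data.
sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=iris.data.columns)

# Model output: map the predicted class index back to its species name.
predicted = iris.target_names[model.predict(sample)[0]]

print(sample.to_string(index=False))
print("Predicted species:", predicted)
```

Passing the sample as a DataFrame with matching column names avoids scikit-learn's warning about missing feature names.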
- Handling missing data and preprocessing techniques
- Encoding categorical variables
- Understanding different ML algorithms
- Model evaluation and performance comparison
- Working with real-world datasets
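Missing-value handling and categorical encoding, two of the points above, can be combined in one preprocessing step. The sketch below uses a tiny made-up table with Titanic-style column names; the values, column names, and imputation strategies are assumptions for illustration, not the notebooks' actual preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with gaps in both numeric and categorical columns (made up).
df = pd.DataFrame(
    {
        "age": [22.0, np.nan, 35.0, 58.0],
        "fare": [7.25, 71.28, np.nan, 26.55],
        "embarked": ["S", "C", np.nan, "S"],
    }
)
numeric = ["age", "fare"]
categorical = ["embarked"]

preprocess = ColumnTransformer(
    [
        # Numeric columns: fill gaps with the median, then standardize.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        # Categorical columns: fill gaps with the most frequent value,
        # then one-hot encode; unseen categories become all zeros.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("encode", OneHotEncoder(handle_unknown="ignore")),
        ]), categorical),
    ],
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(df)
print(features.shape)
```

The output has four columns: two scaled numeric features plus one indicator column for each of the two port categories.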
- Clone the repository
- Install dependencies: `pip install -r requirements.txt`
- Open the notebooks: `jupyter notebook`
This project walks through the supervised machine learning workflow end to end and applies six algorithms to different datasets.
This project is licensed under the Apache License 2.0. It was developed as part of a learning and internship experience.



