Machine Learning Classifier - Iris Dataset

A comprehensive MATLAB implementation comparing multiple machine learning classification algorithms on the famous Iris flower dataset.

🎯 Project Overview

This project implements and compares five different machine learning classifiers to predict iris flower species based on petal and sepal measurements. The project achieves 100% accuracy using Support Vector Machine with Gaussian kernel.

🏆 Results Summary

| Algorithm | Accuracy | Training Time | Notes |
| --- | --- | --- | --- |
| SVM (Gaussian) | 100.00% | 92.2 ms | 🥇 Best overall |
| KNN (k=1) | 97.78% | 56.4 ms | Very fast, highly accurate |
| SVM (Linear) | 97.78% | 132.7 ms | Excellent generalization |
| KNN (k=5) | 95.56% | 22.6 ms | Good balance |
| Decision Tree | 95.56% | 21.0 ms | ⚡ Fastest, most interpretable |

📊 Key Features

Algorithms Implemented

  • K-Nearest Neighbors (KNN) - Multiple k values tested
  • Support Vector Machine (SVM) - Three kernel types (Linear, Gaussian, Polynomial)
  • Decision Tree - Multiple depth configurations tested

Evaluation Metrics

  • Accuracy
  • Precision, Recall, F1-Score (per class)
  • Confusion matrices
  • Training time comparison
  • Feature importance analysis

Visualizations

  • Data distribution scatter plots
  • Confusion matrices for all models
  • Performance comparison charts
  • Feature importance plots
  • Hyperparameter tuning results

🚀 Quick Start

Prerequisites

  • MATLAB R2020a or later
  • Statistics and Machine Learning Toolbox

Running the Project

% Navigate to project directory
cd MLClassifier/examples

% Run individual classifiers
first_classifier      % K-Nearest Neighbors
svm_classifier        % Support Vector Machine
tree_classifier       % Decision Tree

% Compare all algorithms
compare_classifiers   % Complete comparison

📁 Project Structure

MLClassifier/
├── src/
│   ├── algorithms/           # ML algorithm implementations
│   ├── preprocessing/        # Data preprocessing functions
│   ├── evaluation/          # Evaluation metrics
│   └── visualization/       # Plotting functions
├── data/
│   └── sample_datasets/     # Iris and other datasets
├── examples/
│   ├── first_classifier.m   # KNN implementation
│   ├── svm_classifier.m     # SVM implementation
│   ├── tree_classifier.m    # Decision Tree implementation
│   └── compare_classifiers.m # Complete comparison
├── results/
│   └── plots/               # Generated visualizations
├── docs/                    # Documentation
└── README.md

🔬 Detailed Results

Dataset Information

  • Samples: 150 iris flowers
  • Features: 4 (Sepal Length, Sepal Width, Petal Length, Petal Width)
  • Classes: 3 (Setosa, Versicolor, Virginica)
  • Split: 70% training (105 samples), 30% testing (45 samples)
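A 70/30 split like the one above can be reproduced with `cvpartition`, which stratifies by class when given the label vector. This is a sketch, not necessarily the exact split used in the results (the random seed here is an assumption):

```matlab
% Stratified 70/30 hold-out split of the built-in Iris data.
load fisheriris                      % meas (150x4), species (150x1 cell)
rng(1);                              % fixed seed for reproducibility (assumed)
cv = cvpartition(species, 'HoldOut', 0.3);
XTrain = meas(training(cv), :);  yTrain = species(training(cv));
XTest  = meas(test(cv), :);      yTest  = species(test(cv));
```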

Best Model Performance (SVM Gaussian)

Per-Class Metrics:

| Class | Precision | Recall | F1-Score |
| --- | --- | --- | --- |
| Setosa | 1.0000 | 1.0000 | 1.0000 |
| Versicolor | 1.0000 | 1.0000 | 1.0000 |
| Virginica | 1.0000 | 1.0000 | 1.0000 |
| Average | 1.0000 | 1.0000 | 1.0000 |

Perfect classification on all 45 test samples!

Feature Importance (Decision Tree Analysis)

| Feature | Importance Score |
| --- | --- |
| Petal Length | 0.1189 |
| Petal Width | 0.0991 |
| Sepal Length | 0.0000 |
| Sepal Width | 0.0000 |

Key Insight: Petal measurements are sufficient for classification; sepal measurements don't contribute to the decision tree model.
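Importance scores like these can be read off a fitted tree with `predictorImportance`. A minimal sketch (predictor names are spelled here as an assumption):

```matlab
% Feature importance from a fitted decision tree.
load fisheriris
tree = fitctree(meas, species, 'PredictorNames', ...
    {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'});
imp = predictorImportance(tree);     % one score per predictor
bar(imp);
xticklabels(tree.PredictorNames);
```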

🎓 What I Learned

Technical Skills

  • Implementation of multiple ML algorithms from scratch
  • Hyperparameter tuning and optimization
  • Model evaluation and comparison methodologies
  • Data visualization best practices
  • Professional code documentation

Key Insights

  • SVM with Gaussian kernel achieves perfect separation for this dataset
  • Decision trees are fastest but may slightly sacrifice accuracy
  • Feature importance analysis reveals that not all features contribute equally
  • Different algorithms have different strengths (speed vs. accuracy vs. interpretability)

🛠️ Technical Details

K-Nearest Neighbors

  • Tested k values: 1, 3, 5, 7, 9, 11, 15, 20
  • Best k: 1 (97.78% accuracy)
  • Trade-off: Lower k = higher variance, higher k = higher bias
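The k sweep can be sketched as a loop over `fitcknn` models, assuming `XTrain`/`yTrain`/`XTest`/`yTest` come from the 70/30 split described above:

```matlab
% Sweep candidate k values and record test accuracy for each.
ks  = [1 3 5 7 9 11 15 20];
acc = zeros(size(ks));
for i = 1:numel(ks)
    mdl    = fitcknn(XTrain, yTrain, 'NumNeighbors', ks(i));
    acc(i) = mean(strcmp(predict(mdl, XTest), yTest));
end
[bestAcc, idx] = max(acc);
bestK = ks(idx);
```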

Support Vector Machine

  • Kernels tested: Linear, Gaussian (RBF), Polynomial (degree 3)
  • Best kernel: Gaussian (100% accuracy)
  • Linear kernel also performed excellently (97.78%)
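Because `fitcsvm` is binary-only, a three-class SVM in MATLAB is typically built through `fitcecoc` with an SVM learner template. A hedged sketch of the Gaussian-kernel variant, again assuming the split variables from above:

```matlab
% Multiclass SVM via error-correcting output codes (ECOC).
t   = templateSVM('KernelFunction', 'gaussian', 'Standardize', true);
mdl = fitcecoc(XTrain, yTrain, 'Learners', t);
acc = mean(strcmp(predict(mdl, XTest), yTest));
```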

Decision Tree

  • Max depths tested: 2, 3, 5, 10, 20
  • Best depth: 3 (95.56% accuracy)
  • Deeper trees didn't improve test accuracy; the extra depth only risks overfitting
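`fitctree` has no direct max-depth option; depth is bounded indirectly through `'MaxNumSplits'` (a binary tree of depth d has at most 2^d − 1 splits). A sketch, assuming the split variables from above:

```matlab
% Limit tree depth indirectly via the maximum number of splits.
depth = 3;
mdl   = fitctree(XTrain, yTrain, 'MaxNumSplits', 2^depth - 1);
acc   = mean(strcmp(predict(mdl, XTest), yTest));
```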

💡 Future Enhancements

  • Add more algorithms (Random Forest, Naive Bayes, Neural Networks)
  • Implement cross-validation for more robust evaluation
  • Create GUI application for interactive model selection
  • Add support for custom dataset upload
  • Implement ensemble methods
  • Add hyperparameter grid search automation
  • Export trained models for deployment

📚 References

  • Fisher, R. A. (1936). "The use of multiple measurements in taxonomic problems"
  • MATLAB Documentation: Statistics and Machine Learning Toolbox
  • UCI Machine Learning Repository - Iris Dataset

👨‍💻 Author

Vignesh Pai B

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with MATLAB R2025b
  • Dataset from UCI Machine Learning Repository
  • Inspired by the need to compare classical ML algorithms
  • Created as part of learning journey in machine learning

⭐ If you found this project helpful, please consider giving it a star!

Last Updated: January 2026
