A comprehensive MATLAB implementation comparing multiple machine learning classification algorithms on the famous Iris flower dataset.
This project implements and compares five classifier configurations from three algorithm families (K-Nearest Neighbors, Support Vector Machine, and Decision Tree) to predict iris species from sepal and petal measurements. The best configuration, a Support Vector Machine with a Gaussian kernel, reaches 100% accuracy on the 45-sample test split.
| Algorithm | Accuracy | Training Time | Notes |
|---|---|---|---|
| SVM (Gaussian) | 100.00% | 92.2 ms | 🥇 Best Overall |
| KNN (k=1) | 97.78% | 56.4 ms | Very fast, highly accurate |
| SVM (Linear) | 97.78% | 132.7 ms | Excellent generalization |
| KNN (k=5) | 95.56% | 22.6 ms | Good balance |
| Decision Tree | 95.56% | 21.0 ms | ⚡ Fastest, most interpretable |
- K-Nearest Neighbors (KNN) - Multiple k values tested
- Support Vector Machine (SVM) - Three kernel types (Linear, Gaussian, Polynomial)
- Decision Tree - Multiple depth configurations tested
- Accuracy
- Precision, Recall, F1-Score (per class)
- Confusion matrices
- Training time comparison
- Feature importance analysis
- Data distribution scatter plots
- Confusion matrices for all models
- Performance comparison charts
- Feature importance plots
- Hyperparameter tuning results
- MATLAB R2020a or later
- Statistics and Machine Learning Toolbox
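The two requirements above can be checked before running anything. This is a minimal sketch and not part of the project's scripts; it assumes the standard license feature name `Statistics_Toolbox` and the toolbox folder name `stats`.

```matlab
% Hypothetical pre-flight check; not one of the project's scripts.
% R2020a corresponds to MATLAB version 9.8.
if verLessThan('matlab', '9.8')
    error('MATLAB R2020a or later is required.');
end

% 'Statistics_Toolbox' is the license feature name for the
% Statistics and Machine Learning Toolbox.
hasLicense = logical(license('test', 'Statistics_Toolbox'));

% ver('stats') returns an empty struct array when the toolbox is not installed.
isInstalled = ~isempty(ver('stats'));

if ~(hasLicense && isInstalled)
    error('Statistics and Machine Learning Toolbox is required.');
end
disp('Environment looks OK.');
```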
% Navigate to project directory
cd MLClassifier/examples
% Run individual classifiers
first_classifier % K-Nearest Neighbors
svm_classifier % Support Vector Machine
tree_classifier % Decision Tree
% Compare all algorithms
compare_classifiers % Complete comparison
MLClassifier/
├── src/
│ ├── algorithms/ # ML algorithm implementations
│ ├── preprocessing/ # Data preprocessing functions
│ ├── evaluation/ # Evaluation metrics
│ └── visualization/ # Plotting functions
├── data/
│ └── sample_datasets/ # Iris and other datasets
├── examples/
│ ├── first_classifier.m # KNN implementation
│ ├── svm_classifier.m # SVM implementation
│ ├── tree_classifier.m # Decision Tree implementation
│ └── compare_classifiers.m # Complete comparison
├── results/
│ └── plots/ # Generated visualizations
├── docs/ # Documentation
└── README.md
- Samples: 150 iris flowers
- Features: 4 (Sepal Length, Sepal Width, Petal Length, Petal Width)
- Classes: 3 (Setosa, Versicolor, Virginica)
- Split: 70% training (105 samples), 30% testing (45 samples)
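The 70/30 split described above could be produced as in the sketch below. It assumes the `fisheriris` sample data that ships with the Statistics and Machine Learning Toolbox and a stratified hold-out partition. The seed and the variable names are illustrative, and the project's own scripts may split the data differently.

```matlab
% Load Fisher's iris data from the Statistics and Machine Learning Toolbox.
% meas: 150x4 numeric matrix, species: 150x1 cell array of class labels.
load fisheriris

rng(42);  % assumed seed, for reproducibility only; not taken from the project

% Stratified hold-out partition: 30% of the samples reserved for testing.
cv = cvpartition(species, 'HoldOut', 0.3);

XTrain = meas(training(cv), :);
yTrain = species(training(cv));
XTest  = meas(test(cv), :);
yTest  = species(test(cv));

fprintf('Training samples: %d, test samples: %d\n', numel(yTrain), numel(yTest));
```

Passing the label vector to `cvpartition` keeps the class proportions roughly equal in both subsets, which matters with only 50 samples per class.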
Per-Class Metrics (SVM with Gaussian kernel, the only model at 100% accuracy):
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Setosa | 1.0000 | 1.0000 | 1.0000 |
| Versicolor | 1.0000 | 1.0000 | 1.0000 |
| Virginica | 1.0000 | 1.0000 | 1.0000 |
| Average | 1.0000 | 1.0000 | 1.0000 |
Perfect classification on all 45 test samples!
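Per-class precision, recall, and F1-score can be derived from a confusion matrix. The sketch below uses a small set of hypothetical labels (not the project's predictions) so that the arithmetic is easy to follow.

```matlab
% Hypothetical labels for illustration only; these are not the project's results.
yTrue = categorical({'setosa'; 'setosa'; 'versicolor'; 'versicolor'; 'virginica'; 'virginica'});
yPred = categorical({'setosa'; 'setosa'; 'versicolor'; 'virginica';  'virginica'; 'virginica'});

% Rows of C are true classes, columns are predicted classes.
[C, classOrder] = confusionmat(yTrue, yPred);

tp = diag(C);                    % correctly classified samples per class
precision = tp ./ sum(C, 1)';    % column sums = number of predictions per class
recall    = tp ./ sum(C, 2);     % row sums = number of true samples per class
f1        = 2 * (precision .* recall) ./ (precision + recall);

disp(table(classOrder, precision, recall, f1));
```

A class that is never predicted has a zero column sum, so its precision comes out as `NaN`; real evaluation code may want to handle that case explicitly.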
| Feature | Importance Score |
|---|---|
| Petal Length | 0.1189 |
| Petal Width | 0.0991 |
| Sepal Length | 0.0000 |
| Sepal Width | 0.0000 |
Key Insight: The decision tree splits only on petal measurements. Both sepal features have an importance score of zero, so they do not contribute to this model.
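Importance scores like those in the table can be obtained from a fitted classification tree with `predictorImportance`. The sketch below fits on all 150 samples for brevity, so its numbers will not necessarily match the table; the feature names are illustrative labels.

```matlab
load fisheriris
featureNames = {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'};

% Fit a classification tree on all samples
% (the project may fit on the training split only).
tree = fitctree(meas, species, 'PredictorNames', featureNames);

% predictorImportance returns a 1-by-4 row vector, one estimate per predictor.
imp = predictorImportance(tree);

for i = 1:numel(featureNames)
    fprintf('%-12s %.4f\n', featureNames{i}, imp(i));
end
```

A score of zero means the tree never split on that predictor.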
- Implementation of multiple ML algorithms from scratch
- Hyperparameter tuning and optimization
- Model evaluation and comparison methodologies
- Data visualization best practices
- Professional code documentation
- SVM with Gaussian kernel classifies all 45 test samples correctly on this split
- The decision tree trains fastest (21.0 ms) but is less accurate (95.56%) than the best SVM and KNN configurations
- Feature importance analysis reveals that not all features contribute equally
- Different algorithms have different strengths (speed vs. accuracy vs. interpretability)
- Tested k values: 1, 3, 5, 7, 9, 11, 15, 20
- Best k: 1 (97.78% accuracy)
- Trade-off: Lower k = higher variance, higher k = higher bias
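The sweep over k described above might look like the following sketch. It assumes a stratified 70/30 hold-out split with an arbitrary seed and `fitcknn` with its default Euclidean distance; the project's actual settings are not stated in this README, so the accuracies may differ.

```matlab
load fisheriris
rng(42);  % assumed seed, not taken from the project

cv = cvpartition(species, 'HoldOut', 0.3);
XTrain = meas(training(cv), :);  yTrain = species(training(cv));
XTest  = meas(test(cv), :);      yTest  = species(test(cv));

kValues = [1 3 5 7 9 11 15 20];
acc = zeros(size(kValues));

for i = 1:numel(kValues)
    % Train a KNN model with the current number of neighbors.
    mdl = fitcknn(XTrain, yTrain, 'NumNeighbors', kValues(i));
    yPred = predict(mdl, XTest);
    % Fraction of test labels predicted correctly.
    acc(i) = mean(strcmp(yPred, yTest));
end

[bestAcc, idx] = max(acc);
fprintf('Best k = %d (%.2f%% accuracy)\n', kValues(idx), 100 * bestAcc);
```

With only 45 test samples, one misclassification changes accuracy by about 2.2 percentage points, so small differences between k values should be read with caution.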
- Kernels tested: Linear, Gaussian (RBF), Polynomial (degree 3)
- Best kernel: Gaussian (100% accuracy)
- Linear kernel also performed excellently (97.78%)
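A kernel comparison along these lines can be sketched as follows. Because `fitcsvm` handles only one-class and two-class problems, this sketch wraps SVM learners in `fitcecoc` for the three-class task. That wrapper, the standardization flag, and the seed are assumptions, since the README does not say how the project handles the multiclass case.

```matlab
load fisheriris
rng(42);  % assumed seed, not taken from the project

cv = cvpartition(species, 'HoldOut', 0.3);
XTrain = meas(training(cv), :);  yTrain = species(training(cv));
XTest  = meas(test(cv), :);      yTest  = species(test(cv));

kernels = {'linear', 'gaussian', 'polynomial'};
acc = zeros(1, numel(kernels));

for i = 1:numel(kernels)
    if strcmp(kernels{i}, 'polynomial')
        % Degree-3 polynomial kernel, as listed above.
        t = templateSVM('KernelFunction', 'polynomial', ...
                        'PolynomialOrder', 3, 'Standardize', true);
    else
        t = templateSVM('KernelFunction', kernels{i}, 'Standardize', true);
    end
    % Multiclass model built from binary SVM learners (assumed approach).
    mdl = fitcecoc(XTrain, yTrain, 'Learners', t);
    yPred = predict(mdl, XTest);
    acc(i) = mean(strcmp(yPred, yTest));
end

for i = 1:numel(kernels)
    fprintf('%-10s %.2f%%\n', kernels{i}, 100 * acc(i));
end
```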
- Max depths tested: 2, 3, 5, 10, 20
- Best depth: 3 (95.56% accuracy)
- Deeper trees did not improve test accuracy beyond depth 3
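The README does not say how tree depth is limited in the project. The sketch below uses `MaxNumSplits` as a proxy: a binary tree of depth d has at most 2^d - 1 branch nodes, so that value is used as the cap. This bounds tree size rather than depth itself, so treat it as an approximation of the experiment, with an assumed split and seed.

```matlab
load fisheriris
rng(42);  % assumed seed, not taken from the project

cv = cvpartition(species, 'HoldOut', 0.3);
XTrain = meas(training(cv), :);  yTrain = species(training(cv));
XTest  = meas(test(cv), :);      yTest  = species(test(cv));

depths = [2 3 5 10 20];
acc = zeros(size(depths));

for i = 1:numel(depths)
    % Proxy for a depth limit: cap the number of branch nodes at 2^d - 1.
    maxSplits = 2^depths(i) - 1;
    mdl = fitctree(XTrain, yTrain, 'MaxNumSplits', maxSplits);
    yPred = predict(mdl, XTest);
    acc(i) = mean(strcmp(yPred, yTest));
end

for i = 1:numel(depths)
    fprintf('depth <= %2d (max %7d splits): %.2f%%\n', ...
            depths(i), 2^depths(i) - 1, 100 * acc(i));
end
```

With 105 training samples the tree stops growing long before the larger caps are reached, which is consistent with deeper settings giving no further gain.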
- Add more algorithms (Random Forest, Naive Bayes, Neural Networks)
- Implement cross-validation for more robust evaluation
- Create GUI application for interactive model selection
- Add support for custom dataset upload
- Implement ensemble methods
- Add hyperparameter grid search automation
- Export trained models for deployment
- Fisher, R. A. (1936). "The use of multiple measurements in taxonomic problems." Annals of Eugenics, 7(2), 179-188.
- MATLAB Documentation: Statistics and Machine Learning Toolbox
- UCI Machine Learning Repository - Iris Dataset
Vignesh Pai B
- Email: vigneshpaib@gmail.com
- LinkedIn: https://www.linkedin.com/in/vigneshpaib/
- GitHub: https://github.com/vigp17
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with MATLAB R2025b
- Dataset from UCI Machine Learning Repository
- Inspired by the need to compare classical ML algorithms
- Created as part of a learning journey in machine learning
⭐ If you found this project helpful, please consider giving it a star!
Last Updated: January 2026