Benchmarking Classical Machine Learning Models on Tabular Binary Classification
This repository contains the reproducible code and results for a benchmarking study evaluating the performance of 15 classical machine learning models across 159 tabular binary classification datasets. The project emphasizes methodological transparency, reproducibility, and clear documentation.
It also includes the final report and the supporting Python notebooks used to analyze and compare the 15 models. The project addresses four core questions:
- Which models perform best overall?
- What makes datasets difficult to classify?
- Which models handle specific complexity types most effectively?
- How do accuracy and speed trade off across models?
The final report documents model performance and throughput, dataset characteristics, and tradeoffs between performance and efficiency. Using the Lorena et al. framework, the report characterizes each of the 159 datasets across 22 complexity dimensions spanning feature discriminability, class separability, geometric structure, and neighborhood cohesion. By correlating these complexity measures with the performance of 15 diverse models, the analysis identifies not only which models perform best overall, but also which algorithms are best suited to handle specific types of dataset difficulty.
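One common way to make the accuracy-versus-speed question concrete is to find the Pareto-optimal models: those for which no other model is both more accurate and faster. The sketch below assumes each model has already been reduced to a single mean accuracy and a single mean throughput across datasets. The `ModelSummary` type, the `pareto_front` helper, and the example numbers are hypothetical illustrations, not results or code from this study.

```python
from typing import NamedTuple


class ModelSummary(NamedTuple):
    """Per-model aggregate; higher is better on both axes."""
    name: str
    accuracy: float
    throughput: float  # assumed unit, e.g. predictions per second


def pareto_front(models: list[ModelSummary]) -> list[ModelSummary]:
    """Return the models that no other model dominates.

    Model a dominates model b if a is at least as good on both axes
    and strictly better on at least one of them.
    """
    def dominates(a: ModelSummary, b: ModelSummary) -> bool:
        return (
            a.accuracy >= b.accuracy
            and a.throughput >= b.throughput
            and (a.accuracy > b.accuracy or a.throughput > b.throughput)
        )

    front = [m for m in models if not any(dominates(o, m) for o in models)]
    # Fastest first, so the front reads as a speed-to-accuracy curve.
    return sorted(front, key=lambda m: m.throughput, reverse=True)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    summaries = [
        ModelSummary("model_a", 0.86, 50_000.0),
        ModelSummary("model_b", 0.89, 2_000.0),
        ModelSummary("model_c", 0.84, 10_000.0),  # dominated by model_a
        ModelSummary("model_d", 0.91, 300.0),
    ]
    for m in pareto_front(summaries):
        print(m.name, m.accuracy, m.throughput)
```

Every model left on the front is a defensible choice for some speed budget; a model off the front is beaten on both axes by at least one alternative.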
Repository Structure
• notebooks
Cleaned Jupyter notebooks used for data preparation, feature processing, model training, and model evaluation. Each notebook is numbered to reflect the recommended execution order.
• Binary Classification Final Report
Complete research paper presenting the benchmarking study of 15 ML models on 159 tabular binary classification datasets. Includes methodology, statistical analyses, performance rankings, dataset complexity analysis, and key findings on what makes classification problems difficult.
• LICENSE
MIT license information governing use, distribution, and modification of this repository.
• Consolidated results Excel file of model performance and dataset complexity
Master data file containing performance results for all 2,384 model-dataset combinations. Includes accuracy/F1/AUC metrics, throughput, dataset complexity scores, and computational statistics.
• README
This document.
• requirements.txt
Python package dependencies required to reproduce the environment used for all experiments.
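Rankings across many datasets are often summarized by average rank: rank the models within each dataset (1 = best), then average each model's ranks, so that a few unusually easy or hard datasets do not dominate the comparison. A minimal sketch, assuming the consolidated results have been loaded as (dataset, model, score) records. The record layout, the `average_ranks` helper, and the choice of score column are assumptions, not the actual structure of the Excel file.

```python
from collections import defaultdict


def average_ranks(records: list[tuple[str, str, float]]) -> dict[str, float]:
    """Average per-dataset rank of each model (rank 1 = highest score).

    Tied scores within a dataset share the mean of the ranks they span,
    the usual convention for rank-based model comparisons.
    """
    by_dataset: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for dataset, model, score in records:
        by_dataset[dataset].append((model, score))

    rank_sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for rows in by_dataset.values():
        rows.sort(key=lambda r: r[1], reverse=True)
        i = 0
        while i < len(rows):
            # Find the run of tied scores that starts at position i.
            j = i
            while j + 1 < len(rows) and rows[j + 1][1] == rows[i][1]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
            for model, _ in rows[i : j + 1]:
                rank_sums[model] += shared_rank
                counts[model] += 1
            i = j + 1

    return {m: rank_sums[m] / counts[m] for m in rank_sums}
```

A model that has no result on some dataset is averaged only over the datasets where it does appear, so incomplete coverage should be checked before comparing averages.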
Project Overview
The goal of this project is to systematically evaluate how well different classical machine learning model families perform on heterogeneous tabular datasets. The study includes:
• 22 dataset complexity measures
• 8 model families
• 2,384 model–dataset evaluations
• Accuracy and throughput metrics
• Correlation analyses between complexity and performance
The analysis is designed to support reproducible benchmarking and to provide insight into how dataset characteristics relate to model performance.
Complexity Analysis
Dataset complexity was measured using the problexity library's implementation of the Lorena et al. (2019) measures. See notebooks/08_calculate_dataset_complexity.ipynb for the complete calculation workflow.
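To make the idea of a complexity measure concrete, the sketch below computes F1 (maximum Fisher's discriminant ratio), one of the feature-based measures described by Lorena et al. (2019). For each feature it takes the ratio of between-class to within-class scatter, and then F1 = 1 / (1 + largest ratio), so values near 0 indicate at least one highly discriminative feature and values near 1 indicate none. This is an independent illustration written from that formula, not the problexity code, whose implementation may differ in details; the `fisher_f1` name is hypothetical.

```python
import math


def fisher_f1(X: list[list[float]], y: list[int]) -> float:
    """F1 complexity measure: 1 / (1 + max_i r_i), where r_i is the Fisher
    discriminant ratio of feature i (between-class over within-class scatter).
    """
    if not X or len(X) != len(y):
        raise ValueError("X and y must be non-empty and the same length")
    classes = sorted(set(y))
    best_ratio = 0.0
    for i in range(len(X[0])):
        column = [row[i] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # Zero within-class scatter with distinct class means means the
        # feature separates the classes perfectly: treat the ratio as infinite.
        if within == 0.0:
            ratio = math.inf if between > 0.0 else 0.0
        else:
            ratio = between / within
        best_ratio = max(best_ratio, ratio)
    return 1.0 / (1.0 + best_ratio)
```

For example, a single feature with values [0, 1, 10, 11] and labels [0, 0, 1, 1] has a ratio of 100, giving F1 = 1/101, while a feature whose class means coincide contributes a ratio of 0.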
The problexity library requires its inputs in a specific format; refer to the notebook for a working example.
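As a hedged illustration of that preparation step, the helper below converts a feature table and a two-valued label column into a float NumPy feature matrix and 0/1 integer labels, the array form used in problexity's own examples (`px.ComplexityCalculator()` followed by `.fit(X, y)`). The helper name `to_complexity_inputs` and the rules it enforces (numeric features only, no missing values, exactly two classes) are assumptions for illustration; the notebook remains the reference for the formatting actually used.

```python
import numpy as np


def to_complexity_inputs(features, labels, positive_label=None):
    """Hypothetical helper: return (X, y) as a float64 matrix and 0/1 int labels.

    features: 2-D array-like of numeric values (categorical columns are
              assumed to be encoded beforehand; that step is not shown).
    labels: 1-D array-like with exactly two distinct values.
    positive_label: value mapped to 1; defaults to the larger of the two
                    sorted unique labels.
    """
    X = np.asarray(features, dtype=np.float64)
    if X.ndim != 2:
        raise ValueError("features must be 2-D")
    if np.isnan(X).any():
        raise ValueError("features contain missing values; impute or drop first")

    raw = np.asarray(labels)
    if raw.shape[0] != X.shape[0]:
        raise ValueError("features and labels have different lengths")
    classes = np.unique(raw)  # sorted unique values
    if classes.size != 2:
        raise ValueError(f"expected 2 classes, found {classes.size}")
    if positive_label is None:
        positive_label = classes[1]
    if positive_label not in classes:
        raise ValueError("positive_label is not one of the two classes")
    y = (raw == positive_label).astype(np.int64)
    return X, y


X, y = to_complexity_inputs([[1, 2], [3, 4], [5, 6]], ["no", "yes", "no"])
print(X.dtype, X.shape, y.tolist())
```

Failing loudly on missing values or a third label keeps a malformed dataset from silently producing misleading complexity scores.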
How to Use This Repository
- Clone the repository: git clone https://github.com//.git
- Install the dependencies: pip install -r requirements.txt
- Open the notebooks: launch Jupyter or VS Code and run the notebooks in the order listed in the notebooks/ directory.
- Explore the results: the Excel file in results/ contains all model–dataset outcomes used in the analysis.
Reproducibility Notes
• All experiments were run using a fixed Python environment defined in requirements.txt.
• Random seeds were set where applicable to support reproducibility.
• The 159 raw datasets used for this project are listed in Appendix D of the Final Report, including a link to each dataset. The datasets are not included in this repository.
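The exact seeding code lives in the notebooks. As a generic sketch of what "set where applicable" typically covers in a Python workflow, the snippet below seeds Python's `random` module and NumPy's global generator and returns the seed so it can be passed as `random_state` to scikit-learn estimators and splitters (for example, `train_test_split(..., random_state=seed)`). The seed value 42 and the `set_global_seeds` helper are illustrative assumptions, not the values or code used in this study.

```python
import random

import numpy as np

SEED = 42  # illustrative value, not necessarily the one used in the notebooks


def set_global_seeds(seed: int = SEED) -> int:
    """Seed Python's and NumPy's global random generators; return the seed."""
    random.seed(seed)
    np.random.seed(seed)
    return seed


seed = set_global_seeds()
```

Seeding controls random draws, not timing, so throughput measurements can still vary between runs and machines.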
Datasets
This benchmark uses 159 tabular binary classification datasets from public repositories (UCI, OpenML, Kaggle).
A complete catalog of all 159 datasets, including source links, is provided in Appendix B of the accompanying Final Report.
Note: Datasets are not included in this repository. Users must download datasets individually from their respective sources.
Citation
If you use this repository or its results in academic work, please cite it as follows: Ed Kaempf, Benchmarking 15 Machine Learning Models for Binary Classification: Accuracy, Complexity, and Speed, December 2025.
License
This project is licensed under the MIT License. Refer to the “LICENSE” file for details.
Contact Information
For questions or suggestions, feel free to contact:
• Name: Ed Kaempf
• Email: edkaempf@gmail.com
• GitHub: github.com/EofK
• LinkedIn: https://www.linkedin.com/in/ed-kaempf-4887839b/