paulpel/imbalanced-classification-benchmark

Imbalanced Classification Benchmark

Benchmarking five class-imbalance strategies — random under-sampling, random over-sampling, SMOTE, ADASYN, and a custom UMCE ensemble — across three classifiers and twelve imbalanced KEEL datasets, with statistical significance testing (two-way ANOVA + Tukey HSD).

Headline result: the custom UMCE method achieves the highest mean balanced accuracy of all strategies tested, both overall and for each of the three classifiers.

Research question

On strongly imbalanced binary datasets, which resampling strategy best helps a standard classifier recover minority-class performance? And does a custom under-sampling ensemble (UMCE) beat the well-known baselines?

Strategies compared

| Strategy | What it does |
| --- | --- |
| Random under-sampling | Subsample the majority class down to the minority size. |
| Random over-sampling | Bootstrap the minority class up to the majority size. |
| SMOTE | Synthesise new minority samples by interpolating neighbours. |
| ADASYN | Like SMOTE, but generates more samples in hard-to-learn regions. |
| UMCE (custom) | Under-sampling with a Multiple-Classifier Ensemble: split the majority class into k ≈ (imbalance ratio) balanced subsets, train one classifier per subset together with all minority samples, and combine predictions by majority vote. |
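The UMCE description above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in umce.py: the names `umce_fit`, `umce_predict`, and the tiny `NearestCentroid` stand-in are hypothetical, the majority class is assumed to be shuffled and cut into disjoint slices, and ties in the vote are broken by whichever label was seen first.

```python
import random
from collections import Counter


class NearestCentroid:
    """Tiny stand-in classifier (hypothetical); any object with fit/predict works."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]


def umce_fit(X, y, make_classifier, minority_label=1, seed=0):
    """Train one classifier per majority slice, each together with all minority samples."""
    minority = [(x, t) for x, t in zip(X, y) if t == minority_label]
    majority = [(x, t) for x, t in zip(X, y) if t != minority_label]
    # k is roughly the imbalance ratio, so each slice is about minority-sized.
    k = max(1, round(len(majority) / len(minority)))
    random.Random(seed).shuffle(majority)
    members = []
    for i in range(k):
        subset = majority[i::k] + minority  # disjoint majority slice + every minority sample
        clf = make_classifier()
        clf.fit([x for x, _ in subset], [t for _, t in subset])
        members.append(clf)
    return members


def umce_predict(members, X):
    """Combine the members' predictions by majority vote (first-seen label wins ties)."""
    votes = [clf.predict(X) for clf in members]
    return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


# Toy data (made up): 40 majority points near (0, 0), 5 minority points near (5, 5).
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(5, 1), rng.gauss(5, 1)] for _ in range(5)]
y = [0] * 40 + [1] * 5

members = umce_fit(X, y, NearestCentroid)
print(len(members), umce_predict(members, [[0.0, 0.0], [5.0, 5.0]]))
```

With an imbalance ratio of 40:5 the sketch trains eight members, each on a balanced 5 + 5 subset. Because `umce_fit` only needs a factory returning an object with `fit` and `predict`, a library classifier could be passed in the same way.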

Resampling is applied to the training fold only, never to the test fold.
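The train-only rule can be illustrated with random under-sampling. The sketch below assumes each pre-split fold arrives as a `((X_train, y_train), (X_test, y_test))` pair; that layout and the names `random_undersample` and `prepare_fold` are assumptions made for the example, not the interface of sampling.py.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Subsample every class down to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, t in zip(X, y):
        by_class.setdefault(t, []).append(x)
    n_min = min(len(rows) for rows in by_class.values())
    X_res, y_res = [], []
    for label, rows in by_class.items():
        for x in rng.sample(rows, n_min):
            X_res.append(x)
            y_res.append(label)
    return X_res, y_res


def prepare_fold(fold, resample):
    """Resample the training part only; the test part is passed through untouched."""
    (X_train, y_train), test_part = fold
    return resample(X_train, y_train), test_part


# Toy fold (made up): 8 majority + 2 minority for training, 4 + 1 for testing.
fold = (
    ([[float(i)] for i in range(10)], [0] * 8 + [1] * 2),
    ([[10.0], [11.0], [12.0], [13.0], [14.0]], [0, 0, 0, 0, 1]),
)
(X_tr, y_tr), (X_te, y_te) = prepare_fold(fold, random_undersample)
print(Counter(y_tr), Counter(y_te))
```

Keeping the test fold at its original class skew means the reported metrics reflect the distribution a deployed classifier would actually face.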

Classifiers

Random Forest, Decision Tree, and Gaussian Naive Bayes, each fitted on features standardised with StandardScaler.

Data

Twelve binary imbalanced datasets from the KEEL repository, each pre-split into 5 stratified cross-validation folds: 5 Ecoli variants, Glass2, and 6 Yeast variants (protein/cell-localisation and glass-type tasks with strong class skew).

Methodology

  • 5-fold cross-validation; seven metrics recorded per fold: accuracy, balanced accuracy, precision, recall, F1, classification error, AUC-ROC.
  • Two-way ANOVA on balanced accuracy with an interaction term: value ~ C(model) + C(method) + C(model):C(method).
  • Tukey HSD post-hoc test over the model × method groups (19 of the pairwise differences are significant at α = 0.05).
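Balanced accuracy, the metric the ANOVA is run on, is the mean of the per-class recalls. The sketch below shows why it is preferred over plain accuracy on skewed data; it is an illustration in plain Python (the function name is hypothetical), not the metric code used in models.py.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# A classifier that always predicts the majority class on a 95:5 split.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The always-majority classifier looks strong on accuracy but scores no better than chance on balanced accuracy, because its minority-class recall is zero.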

Results

Mean balanced accuracy across the 12 datasets (5-fold CV):

| Strategy | Overall | Random Forest | Decision Tree | Gaussian NB |
| --- | --- | --- | --- | --- |
| Random under-sampling | 0.782 | 0.857 | 0.809 | 0.680 |
| Random over-sampling | 0.723 | 0.769 | 0.766 | 0.633 |
| SMOTE | 0.768 | 0.829 | 0.802 | 0.672 |
| ADASYN | 0.715 | 0.766 | 0.742 | 0.636 |
| UMCE (ours) | 0.819 | 0.865 | 0.848 | 0.742 |

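UMCE is the strongest strategy overall (0.819, against 0.782 for the runner-up, random under-sampling) and for every classifier, although its margin with Random Forest is narrow (0.865 vs 0.857). Random Forest is the strongest base learner under every strategy. Full per-dataset numbers are in ranking_results.xlsx; the ANOVA and Tukey output is in anova_results.xlsx and tukey_results.csv.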
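The ranking can be checked directly from the per-classifier columns of the table. The sketch below copies those rounded values and recomputes a mean over the three classifiers; because it starts from rounded numbers, the recomputed mean can differ from the Overall column in the last digit (for UMCE it comes out at about 0.818 rather than 0.819). The variable names are illustrative only.

```python
# Per-classifier mean balanced accuracy, copied from the results table.
results = {
    "Random under-sampling": {"Random Forest": 0.857, "Decision Tree": 0.809, "Gaussian NB": 0.680},
    "Random over-sampling": {"Random Forest": 0.769, "Decision Tree": 0.766, "Gaussian NB": 0.633},
    "SMOTE": {"Random Forest": 0.829, "Decision Tree": 0.802, "Gaussian NB": 0.672},
    "ADASYN": {"Random Forest": 0.766, "Decision Tree": 0.742, "Gaussian NB": 0.636},
    "UMCE": {"Random Forest": 0.865, "Decision Tree": 0.848, "Gaussian NB": 0.742},
}
classifiers = ["Random Forest", "Decision Tree", "Gaussian NB"]

# Best strategy for each classifier.
best = {clf: max(results, key=lambda s: results[s][clf]) for clf in classifiers}

# Mean over the three classifiers, then strategies ranked from best to worst.
mean_over_clf = {s: sum(v.values()) / len(v) for s, v in results.items()}
ranking = sorted(mean_over_clf, key=mean_over_clf.get, reverse=True)

print(best)
print(ranking)
```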

[Figure: Tukey HSD post-hoc comparison of model × method groups]

Reproduce

uv venv && source .venv/bin/activate
uv pip install -r requirements.txt

python main.py          # all classifiers × strategies over the 12 datasets -> results/*.json
python calc_average.py  # average each metric across the 5 folds        -> results/average_*.json
python results.py       # two-way ANOVA + Tukey HSD                      -> anova_results.xlsx, tukey_results.csv
python ranking.py       # per-dataset, per-method classifier ranking     -> ranking_results.xlsx
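The fold-averaging step can be sketched as follows. The layout of results/*.json is not documented here, so the example assumes a hypothetical structure of one dictionary of metrics per fold; the function name `average_folds` and all numbers are made up for illustration.

```python
from statistics import mean


def average_folds(fold_metrics):
    """Average each metric over the folds (one dict of metrics per fold)."""
    metric_names = fold_metrics[0].keys()
    return {name: mean(fold[name] for fold in fold_metrics) for name in metric_names}


# Hypothetical per-fold metrics for one classifier / strategy / dataset.
per_fold = [
    {"balanced_accuracy": 0.80, "f1": 0.50},
    {"balanced_accuracy": 0.90, "f1": 0.60},
    {"balanced_accuracy": 0.70, "f1": 0.40},
    {"balanced_accuracy": 0.85, "f1": 0.55},
    {"balanced_accuracy": 0.75, "f1": 0.45},
]
averaged = average_folds(per_fold)
print(averaged)
```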

Repository layout

.
├── load_data.py      # parse the KEEL ARFF folds from data_raw/
├── sampling.py       # random under/over-sampling, SMOTE, ADASYN wrappers
├── umce.py           # the custom Under-sampling Multiple-Classifier Ensemble
├── models.py         # train + evaluate RF / DT / GaussianNB, 7 metrics each
├── main.py           # experiment runner -> results/*.json
├── calc_average.py   # average folds    -> results/average_*.json
├── results.py        # two-way ANOVA + Tukey HSD
├── ranking.py        # per-dataset method ranking
├── statistic.py      # normality / ANOVA helpers
├── handle_pickle.py  # small (de)serialisation helper
├── data_raw/         # 12 KEEL datasets, 5 folds each (ARFF)
└── results/          # computed metrics (JSON)
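For orientation, reading one fold file might look roughly like the sketch below. It assumes an ARFF-style file whose rows follow an `@data` line, with numeric features and the class label in the last column; quoted values, missing values, and attribute typing are ignored. The function name `parse_keel_fold` and the sample rows are hypothetical and not taken from load_data.py or data_raw/.

```python
import io


def parse_keel_fold(handle):
    """Return (X, y) from the rows that follow the @data line."""
    X, y = [], []
    in_data = False
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if line.lower().startswith("@data"):
            in_data = True
            continue
        if not in_data:  # header lines such as @relation / @attribute
            continue
        *features, label = [cell.strip() for cell in line.split(",")]
        X.append([float(value) for value in features])
        y.append(label)
    return X, y


# Made-up file content for illustration.
sample = io.StringIO(
    "@relation toy\n"
    "@attribute a real\n"
    "@attribute b real\n"
    "@attribute class {positive, negative}\n"
    "@data\n"
    "0.1, 0.2, negative\n"
    "0.3, 0.4, positive\n"
)
X, y = parse_keel_fold(sample)
print(X, y)
```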

Data attribution

Datasets are from the KEEL imbalanced-classification dataset repository (Alcalá-Fdez et al., KEEL Data-Mining Software Tool). See https://sci2s.ugr.es/keel/imbalanced.php.
