We introduce Robust-Minisets, a collection of robust benchmark classification datasets in the low-resolution realm based on well-established image classification benchmarks, such as CIFAR, Tiny ImageNet, EuroSAT and the MedMNIST collection. We port existing robustness and generalization benchmarks (ImageNet-C, -R, -A and v2) to the small dataset domain, introducing novel benchmarks to comprehensively evaluate the robustness and generalization capabilities of image classification models on low-resolution datasets. This results in an extensive collection consisting of already existing test sets (e.g. CIFAR-10.1 and Tiny ImageNet-C) as well as the novel benchmarks EuroSAT-C, MedMNIST-C, and Tiny ImageNet-A, -R and -v2 introduced in our ICPR 2024 paper "GenFormer - Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets".
Sven Oehri, Nikolas Ebert, Ahmed Abdullah, Didier Stricker & Oliver Wasenmüller
CeMOS - Research and Transfer Center, University of Applied Sciences Mannheim
- robust_minisets/:
  - dataset.py: PyTorch datasets and dataloaders of Robust-Minisets.
  - info.py: Dataset information dict for each subset of Robust-Minisets.
- examples/:
  - getting_started.ipynb: To explore the Robust-Minisets dataset collection with jupyter notebook. It is ONLY intended for a quick exploration, i.e., it does not provide full training and evaluation functionalities.
  - getting_started_without_PyTorch.ipynb: This notebook provides snippets about how to use Robust-Minisets data (the .npz files) without PyTorch.
- setup.py: To install robust_minisets as a module.
Set up the required environment and install robust-minisets as a standard Python package from PyPI:
pip install robust-minisets
Or install from source:
pip install --upgrade git+https://github.com/CeMOS-IS/Robust-Minisets.git
Check whether you have installed the latest code version:
>>> import robust_minisets
>>> print(robust_minisets.__version__)
The code requires only a common Python environment for machine learning. It was tested with:
- Python 3 (>=3.8)
- torch, torchvision, numpy, Pillow, scikit-learn, scikit-image, tqdm, fire
Higher (or lower) versions should also work (perhaps with minor modifications).
To use a standard test set utilizing the downloaded files:
>>> from robust_minisets import TinyImageNetR
>>> test_dataset = TinyImageNetR(split="test")
To enable automatic downloading, set download=True:
>>> from robust_minisets import BreastMNISTC
>>> val_dataset = BreastMNISTC(split="val", download=True)
Certain datasets (Tiny ImageNet, EuroSAT) are implemented as training datasets as well:
>>> from robust_minisets import EuroSAT
>>> train_dataset = EuroSAT(split="train", download=True)
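Since the classes are PyTorch datasets, they can presumably be wrapped in a standard torch.utils.data.DataLoader for evaluation. The following is a minimal sketch, not the package's own evaluation code: a random TensorDataset stands in for a Robust-Minisets test set so that the snippet runs without any download, and the image size, class count and placeholder model are assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Stand-in for a test set such as TinyImageNetR(split="test"):
# 32 random RGB images of size 64x64 with labels from 10 hypothetical classes.
images = torch.rand(32, 3, 64, 64)
labels = torch.randint(0, 10, (32,))
test_dataset = TensorDataset(images, labels)

# No shuffling, so the sample order of the test set is preserved.
test_loader = DataLoader(test_dataset, batch_size=8, shuffle=False)

# Placeholder classifier (untrained); replace with your own trained model.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 64 * 64, 10),
)
model.eval()

correct, total = 0, 0
with torch.no_grad():
    for batch_images, batch_labels in test_loader:
        predictions = model(batch_images).argmax(dim=1)
        correct += (predictions == batch_labels).sum().item()
        total += batch_labels.numel()

accuracy = correct / total
print(f"accuracy: {accuracy:.3f}")
```

With a real Robust-Minisets dataset, a transform that converts the images to tensors is likely needed, and labels stored with shape N×1 may have to be flattened before comparing them with the predictions.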
- Great! Our code is designed to work with PyTorch.
- Explore the Robust-Minisets dataset with jupyter notebook (getting_started.ipynb), and train basic neural networks in PyTorch.
- Although our code is tested with PyTorch, you are free to parse the data with your own code (without PyTorch or even without Python!), as the files are only standard NumPy serialization files. It is simple to create a dataset without PyTorch.
  - Go to getting_started_without_PyTorch.ipynb, which provides snippets about how to use Robust-Minisets data (the .npz files) without PyTorch.
  - Simply change the super class of Robust-Minisets from torch.utils.data.Dataset to collections.abc.Sequence (the plain collections.Sequence alias was removed in Python 3.10), and you will get a standard dataset without PyTorch. Check dataset_without_pytorch.py for more details.
  - You still have most functionality of our Robust-Minisets code ;)
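The idea of a PyTorch-free dataset can be sketched as follows. This is not the content of dataset_without_pytorch.py: the class name NpzSequenceDataset is hypothetical, the base class is collections.abc.Sequence (where Sequence lives in current Python versions), and a small synthetic .npz file stands in for a downloaded one.

```python
import os
import tempfile
from collections.abc import Sequence

import numpy as np


class NpzSequenceDataset(Sequence):
    """Hypothetical PyTorch-free dataset over one split of an .npz file."""

    def __init__(self, npz_path, split="test"):
        # Read both arrays into memory, then close the file again.
        with np.load(npz_path) as data:
            self.images = data[f"{split}_images"]
            self.labels = data[f"{split}_labels"]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        return self.images[index], self.labels[index]


# Synthetic stand-in for a downloaded file: 4 blank 64x64 RGB images.
npz_path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez(
    npz_path,
    test_images=np.zeros((4, 64, 64, 3), dtype=np.uint8),
    test_labels=np.arange(4).reshape(4, 1),
)

dataset = NpzSequenceDataset(npz_path, split="test")
image, label = dataset[2]
print(len(dataset), image.shape, label)
```

Because Sequence only requires __len__ and __getitem__, iteration and membership tests come for free from the abstract base class.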
Please download the dataset(s) via Zenodo. You can also let our code download them automatically by setting download=True in dataset.py.
The Robust-Minisets collection contains several (mostly) test datasets. Each dataset (e.g., tiny-imagenet-r.npz) is comprised of up to 6 keys: train_images, train_labels, val_images, val_labels, test_images and test_labels.
- train_images / val_images / test_images: N × W × H × 3. N denotes the number of samples, W and H denote the width and height.
- train_labels / val_labels / test_labels: N × 1. N denotes the number of samples.
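As a minimal sketch of how such a file could be inspected with NumPy alone, the snippet below checks which splits are present and that images and labels have matching sample counts. The in-memory file is a synthetic stand-in (only a test split, 5 blank images); with a real download you would pass the path of the .npz file to np.load instead.

```python
import io

import numpy as np

# Synthetic stand-in for a downloaded file such as tiny-imagenet-r.npz.
buffer = io.BytesIO()
np.savez(
    buffer,
    test_images=np.zeros((5, 64, 64, 3), dtype=np.uint8),
    test_labels=np.zeros((5, 1), dtype=np.int64),
)
buffer.seek(0)

available = {}
with np.load(buffer) as data:
    for split in ("train", "val", "test"):
        images_key, labels_key = f"{split}_images", f"{split}_labels"
        if images_key not in data.files:
            continue  # many subsets ship a test split only
        images, labels = data[images_key], data[labels_key]
        # images: N x W x H x 3, labels: N x 1 -> sample counts must agree
        assert images.shape[0] == labels.shape[0]
        available[split] = (images.shape, labels.shape)

print(available)
```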
The following list gives a brief overview of the datasets in Robust-Minisets:
- CIFAR-10.1
- CIFAR-10-C
- CIFAR-100-C
- EuroSAT
- EuroSAT-C
- MedMNIST-C
  - BreastMNIST-C
  - BloodMNIST-C
  - DermaMNIST-C
  - OCTMNIST-C
  - OrganAMNIST-C
  - OrganCMNIST-C
  - OrganSMNIST-C
  - PathMNIST-C
  - PneumoniaMNIST-C
  - TissueMNIST-C
- Tiny ImageNet
- Tiny ImageNet-A
- Tiny ImageNet-C
- Tiny ImageNet-R
- Tiny ImageNetv2
Here we provide a detailed summary of all datasets of the Robust-Minisets collection.
In this section we provide details about the structure of the corrupted (-C) datasets in the Robust-Minisets collection. In case you are interested in a detailed evaluation per corruption and/or severity level, the images in all of these datasets follow the same structure:
- Each dataset is of shape (N · C · S) × W × H × 3, where N denotes the number of test samples, C denotes the number of corruptions, and S denotes the number of severity levels (S = 5).
- The images are ordered corruption by corruption and, for each corruption, from severity level 1 to 5.
- The order of corruptions for each dataset and split can be found here or via the info attribute of each dataset (e.g. TinyImageNetR.info["corruption_dict"]).
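The ordering described above lends itself to a simple reshape when computing accuracy per corruption and severity. The sketch below assumes that each (corruption, severity) block holds all N test samples contiguously. The corruption names are illustrative only (the real order comes from the info dictionary), and the per-image correctness flags are synthetic.

```python
import numpy as np

num_samples = 4       # N, number of clean test samples (toy value)
num_severities = 5    # S = 5, as stated above
corruptions = ["gaussian_noise", "fog", "contrast"]  # illustrative names only
num_corruptions = len(corruptions)                   # C

# Synthetic per-image correctness flags (1 = correct prediction), in the
# same order as the images: corruption by corruption, severity 1 to 5.
rng = np.random.default_rng(0)
correct = rng.integers(0, 2, size=num_corruptions * num_severities * num_samples)

# Reshape the flat axis of length N*C*S into (C, S, N) and average over N.
accuracy = correct.reshape(num_corruptions, num_severities, num_samples).mean(axis=2)


def block_slice(corruption_index, severity):
    """Index range of one (corruption, severity) block; severity is 1-based."""
    start = (corruption_index * num_severities + (severity - 1)) * num_samples
    return slice(start, start + num_samples)


for name, row in zip(corruptions, accuracy):
    print(name, np.round(row, 2))
```

The same block_slice helper can be applied to the image and label arrays to extract the samples of a single corruption and severity level.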
- List all available datasets:
  python -m robust_minisets available
- Download all available datasets:
  python -m robust_minisets download
- Delete all downloaded npz from root:
  python -m robust_minisets clean
- Print the dataset details given a dataset flag:
  python -m robust_minisets info --flag=<dataset_flag>
- Save the dataset as standard figure and csv files, which could be used for AutoML tools, e.g., Google AutoML Vision:
  python -m robust_minisets save --flag=<dataset_flag> --folder=tmp/ --postfix=png --download=True
  By default, download=False.
The code is under Apache-2.0 License.
The publication licenses of the datasets can be found within the info dictionary via robust_minisets.INFO[<dataset_flag>].
This research was partly funded by the Albert and Anneliese Konanz Foundation, the German Research Foundation under grant INST874/9-1 and the Federal Ministry of Education and Research Germany in the project M2Aind-DeepLearning (13FH8I08IA).
If you find this work useful, please consider citing us:
@inproceedings{oehri2024genformer,
title = {GenFormer – Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets},
author = {Oehri, Sven and Ebert, Nikolas and Abdullah, Ahmed and Stricker, Didier and Wasenm{\"u}ller, Oliver},
booktitle = {International Conference on Pattern Recognition (ICPR)},
year = {2024},
}
DISCLAIMER: Robust-Minisets is based on a wide range of existing datasets and benchmarks. Thus, please also cite the source data paper(s) of the Robust-Minisets subset(s):
- CIFAR-10.1
- EuroSAT
- ImageNet-A
- ImageNet-C
- ImageNet-R
- ImageNetv2
- MedMNIST, the respective source datasets (described here)
v1.0.0: Robust-Minisets v1 release
