Robust-Minisets

We introduce Robust-Minisets, a collection of robustness benchmarks for image classification in the low-resolution domain, based on well-established image classification benchmarks such as CIFAR, Tiny ImageNet, EuroSAT and the MedMNIST collection. We port existing robustness and generalization benchmarks (ImageNet-C, -R, -A and -v2) to the small-dataset domain, introducing novel benchmarks to comprehensively evaluate the robustness and generalization capabilities of image classification models on low-resolution datasets. The result is an extensive collection consisting of existing test sets (e.g. CIFAR-10.1 and Tiny ImageNet-C) as well as the novel benchmarks EuroSAT-C, MedMNIST-C, and Tiny ImageNet-A, -R and -v2 introduced in our ICPR 2024 paper "GenFormer - Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets".

Sven Oehri, Nikolas Ebert, Ahmed Abdullah, Didier Stricker & Oliver Wasenmüller
CeMOS - Research and Transfer Center, University of Applied Sciences Mannheim

Code Structure

robust_minisets/:
- dataset.py: PyTorch datasets and dataloaders of Robust-Minisets.
- info.py: Dataset information dict for each subset of Robust-Minisets.
examples/:
- getting_started.ipynb: A Jupyter notebook for exploring the Robust-Minisets dataset collection. It is ONLY intended for quick exploration, i.e., it does not provide full training and evaluation functionality.
- getting_started_without_PyTorch.ipynb: This notebook provides snippets showing how to use Robust-Minisets data (the .npz files) without PyTorch.
setup.py: Installs robust_minisets as a Python module.

Installation and Requirements

Set up the required environment and install robust-minisets as a standard Python package from PyPI:

```
  pip install robust-minisets
```

Or install from source:

```
  pip install --upgrade git+https://github.com/CeMOS-IS/Robust-Minisets.git
```

Check whether you have installed the latest code version:

```
>>> import robust_minisets
>>> print(robust_minisets.__version__)
```

The code requires only a common Python environment for machine learning. It was tested with:

Python 3 (>=3.8)
torch, torchvision, numpy, Pillow, scikit-learn, scikit-image, tqdm, fire

Higher (or lower) versions should also work (perhaps with minor modifications).

Quick Start

To load a test set from previously downloaded files:

```
>>> from robust_minisets import TinyImageNetR
>>> test_dataset = TinyImageNetR(split="test")
```

To download the files automatically, set download=True:

```
>>> from robust_minisets import BreastMNISTC
>>> val_dataset = BreastMNISTC(split="val", download=True)
```

Some datasets (Tiny ImageNet, EuroSAT) also provide a training split:

```
>>> from robust_minisets import EuroSAT
>>> train_dataset = EuroSAT(split="train", download=True)
```

If you use PyTorch...

Great! Our code is designed to work with PyTorch.
Explore the Robust-Minisets datasets with the Jupyter notebook (getting_started.ipynb), and train basic neural networks in PyTorch.

If you do not use PyTorch...

Although our code is tested with PyTorch, you are free to parse the data files with your own code (without PyTorch or even without Python), as they are standard NumPy serialization files (.npz). Creating a dataset without PyTorch is simple.
Go to getting_started_without_PyTorch.ipynb, which provides snippets showing how to use Robust-Minisets data (the .npz files) without PyTorch.
Simply change the superclass of the Robust-Minisets datasets from torch.utils.data.Dataset to collections.abc.Sequence to get a standard dataset without PyTorch (the old collections.Sequence alias was removed in Python 3.10). Check dataset_without_pytorch.py for more details.
You still retain most of the functionality of our Robust-Minisets code ;)
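As a rough illustration of the Sequence-based approach, the sketch below wraps one split of an .npz file in a class derived from collections.abc.Sequence, using only NumPy and the standard library. The class name NpzSplit and the synthetic file are hypothetical and not part of the robust_minisets API or of dataset_without_pytorch.py; the key names follow the <split>_images / <split>_labels convention described in the Dataset section.

```python
import os
import tempfile
from collections.abc import Sequence

import numpy as np


class NpzSplit(Sequence):
    """Hypothetical minimal dataset over one split of a Robust-Minisets-style .npz file."""

    def __init__(self, path, split="test"):
        # Load both arrays eagerly; the NpzFile handle is closed afterwards.
        with np.load(path) as data:
            self.images = data[f"{split}_images"]
            self.labels = data[f"{split}_labels"]
        if len(self.images) != len(self.labels):
            raise ValueError("images and labels have different lengths")

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # Sequence derives __iter__, __contains__, index() and count() from
        # __len__ and __getitem__; iteration stops on the IndexError NumPy raises.
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(len(self)))]
        # Labels are stored as N x 1, so unwrap the single value per row.
        return self.images[index], int(self.labels[index][0])


# Usage on a tiny synthetic file (stands in for a downloaded .npz).
path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez(
    path,
    test_images=np.zeros((3, 28, 28, 3), dtype=np.uint8),
    test_labels=np.array([[4], [0], [7]]),
)
ds = NpzSplit(path, split="test")
print(len(ds))  # 3
print([label for _, label in ds])  # [4, 0, 7]
```

Because the class only relies on NumPy indexing, it can be dropped into any training loop that accepts a Python sequence of (image, label) pairs.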

Dataset

Please download the dataset(s) via Zenodo. Alternatively, our code can download them automatically when download=True is set (see dataset.py).

The Robust-Minisets collection consists mostly of test datasets. Each dataset file (e.g., tiny-imagenet-r.npz) contains up to six keys: train_images, train_labels, val_images, val_labels, test_images and test_labels.

train_images / val_images / test_images: N × W × H × 3. N denotes the number of samples, W and H denote the width and height.
train_labels / val_labels / test_labels: N × 1. N denotes the number of samples.
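To check which of these keys a given file actually contains, and whether the arrays match the documented shapes, the file can be opened directly with NumPy. The helper describe_npz below is a hypothetical sketch, not part of the package, and the synthetic test-only file stands in for a real download such as tiny-imagenet-r.npz.

```python
import os
import tempfile

import numpy as np

SPLITS = ("train", "val", "test")


def describe_npz(path):
    """Return {split: (num_samples, (W, H))} for every split present in the file."""
    summary = {}
    with np.load(path) as data:
        keys = set(data.files)
        for split in SPLITS:
            img_key, lbl_key = f"{split}_images", f"{split}_labels"
            if img_key not in keys:
                continue  # skip absent splits; most files contain only evaluation splits
            images, labels = data[img_key], data[lbl_key]
            # Documented layout: images N x W x H x 3, labels N x 1.
            if images.ndim != 4 or images.shape[-1] != 3:
                raise ValueError(f"unexpected image shape {images.shape}")
            if labels.shape != (len(images), 1):
                raise ValueError(f"unexpected label shape {labels.shape}")
            summary[split] = (len(images), images.shape[1:3])
    return summary


# Synthetic stand-in for a downloaded test-only file.
path = os.path.join(tempfile.mkdtemp(), "toy.npz")
np.savez(
    path,
    test_images=np.zeros((10, 64, 64, 3), dtype=np.uint8),
    test_labels=np.zeros((10, 1), dtype=np.int64),
)
print(describe_npz(path))  # {'test': (10, (64, 64))}
```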

The following datasets are included in Robust-Minisets:

CIFAR-10.1
CIFAR-10-C
CIFAR-100-C
EuroSAT
EuroSAT-C
MedMNIST-C
- BreastMNIST-C
- BloodMNIST-C
- DermaMNIST-C
- OCTMNIST-C
- OrganAMNIST-C
- OrganCMNIST-C
- OrganSMNIST-C
- PathMNIST-C
- PneumoniaMNIST-C
- TissueMNIST-C
Tiny ImageNet
Tiny ImageNet-A
Tiny ImageNet-C
Tiny ImageNet-R
Tiny ImageNetv2

A detailed summary of all datasets in the Robust-Minisets collection is provided here.

Corruption Details

This section describes the structure of the corrupted (-C) datasets in the Robust-Minisets collection. If you are interested in a detailed evaluation per corruption and/or severity level, note that the images in these datasets follow the same structure:

Each dataset is of shape (N · C · S) × W × H × 3, where N denotes the number of test samples, C the number of corruptions, and S the number of severity levels (S = 5).
The images are ordered corruption by corruption and, within each corruption, from severity level 1 to 5.
The order of corruptions for each dataset and split can be found here or via the info attribute of each dataset (e.g. BreastMNISTC.info["corruption_dict"]).
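Given this ordering, per-corruption and per-severity results can be recovered from a flat array of predictions by pure index arithmetic. The sketch below assumes that every (corruption, severity) block contains the same N test samples stored contiguously, so block (c, s) starts at index (c · S + (s − 1)) · N; the helper names block_slice and accuracy_table are hypothetical, and mapping a corruption index to its name via the info attribute is left out because the structure of corruption_dict is not assumed here.

```python
import numpy as np

NUM_SEVERITIES = 5  # S = 5 for the -C datasets


def block_slice(corruption_idx, severity, num_samples, num_severities=NUM_SEVERITIES):
    """Slice selecting the N images of one corruption at one severity level.

    corruption_idx is 0-based (position in the corruption order);
    severity is 1-based, as in the dataset description.
    """
    if not 1 <= severity <= num_severities:
        raise ValueError(f"severity must be in 1..{num_severities}")
    start = (corruption_idx * num_severities + (severity - 1)) * num_samples
    return slice(start, start + num_samples)


def accuracy_table(predictions, labels, num_corruptions, num_severities=NUM_SEVERITIES):
    """Return a C x S array of accuracies from flat arrays of length N * C * S."""
    predictions = np.asarray(predictions).reshape(-1)
    labels = np.asarray(labels).reshape(-1)  # labels are stored as (N*C*S) x 1
    if predictions.shape != labels.shape:
        raise ValueError("predictions and labels differ in length")
    blocks = num_corruptions * num_severities
    if predictions.shape[0] % blocks != 0:
        raise ValueError("length is not a multiple of C * S")
    n = predictions.shape[0] // blocks
    # C-order reshape maps flat index (c*S + s)*N + i to (c, s, i).
    correct = (predictions == labels).reshape(num_corruptions, num_severities, n)
    return correct.mean(axis=2)


# Toy example: N=2 samples, C=3 corruptions, S=5 severities -> 30 entries.
labels = np.zeros((30, 1), dtype=int)
preds = labels.reshape(-1).copy()
preds[block_slice(corruption_idx=1, severity=3, num_samples=2)] = 1  # one block all wrong
table = accuracy_table(preds, labels, num_corruptions=3)
print(table[1, 2])  # 0.0
```

Averaging table over its second axis gives one accuracy per corruption, and over its first axis one accuracy per severity level.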

Command Line Tools

List all available datasets:
```
  python -m robust_minisets available
```
Download all available datasets:
```
  python -m robust_minisets download
```
Delete all downloaded .npz files from the root directory:
```
  python -m robust_minisets clean
```

Print the dataset details for a given dataset flag:
```
  python -m robust_minisets info --flag=<dataset_flag>
```

Save a dataset as standard image and CSV files, which can be used with AutoML tools such as Google AutoML Vision:
```
  python -m robust_minisets save --flag=<dataset_flag> --folder=tmp/ --postfix=png --download=True
```
By default, download=False.

License

The code is released under the Apache-2.0 License.

The publication licenses of the datasets can be found within the info dictionary via robust_minisets.INFO[<dataset_flag>].

Acknowledgements

This research was partly funded by the Albert and Anneliese Konanz Foundation, the German Research Foundation under grant INST874/9-1 and the Federal Ministry of Education and Research Germany in the project M2Aind-DeepLearning (13FH8I08IA).

Citing

If you find this work useful, please consider citing us:

```
@inproceedings{oehri2024genformer,
    title = {GenFormer – Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets},
    author = {Oehri, Sven and Ebert, Nikolas and Abdullah, Ahmed and Stricker, Didier and Wasenm{\"u}ller, Oliver},
    booktitle = {International Conference on Pattern Recognition (ICPR)},
    year = {2024},
}
```

DISCLAIMER: Robust-Minisets is based on a wide range of existing datasets and benchmarks. Please therefore also cite the source data paper(s) of the Robust-Minisets subset(s) you use:

CIFAR-10.1
EuroSAT
ImageNet-A
ImageNet-C
ImageNet-R
ImageNetv2
MedMNIST, the respective source datasets (described here)

Release versions

v1.0.0: Robust-Minisets v1 release
