- Download and extract the COCO 2014 Train images and 2014 Val images from here.
- Download the Karpathy split for COCO from here.
- Run `notebooks/preprocess_mscoco.ipynb`, updating paths at the top of the notebook.
- Update the `PATHS` variable at the top of `libs/datasets/utils.py`.
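
Before running the preprocessing notebook, it can help to confirm that the split file loaded as expected. The sketch below counts images per split, assuming the commonly distributed Karpathy JSON layout (a top-level `images` list whose entries carry a `split` field). The function name `count_splits` and the path in the usage comment are illustrative and not part of this repository.

```python
import json
from collections import Counter
from pathlib import Path


def count_splits(split_file):
    """Count images per split in a Karpathy-style JSON file.

    Assumes a top-level "images" list whose entries have a "split" key.
    """
    with Path(split_file).open(encoding="utf-8") as f:
        data = json.load(f)
    return Counter(image["split"] for image in data["images"])


# Hypothetical usage (the path is a placeholder):
# print(count_splits("/PATH/dataset_coco.json"))
```

If the file uses a different layout, the `KeyError` raised here is a quick signal that the wrong file was downloaded.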
- Download the Flickr30k images from here.
- Download the Karpathy split for Flickr30k from here.
- Run `notebooks/preprocess_flickr30k.ipynb`, updating paths at the top of the notebook.
- Update the `PATHS` variable at the top of `libs/datasets/utils.py`.
- Download the MMIMDB dataset from here.
- Run `notebooks/preprocess_mmimdb.ipynb`, updating paths at the top of the notebook.
- Update the `PATHS` variable at the top of `libs/datasets/utils.py`.
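
After editing `PATHS`, a quick check that every configured location exists can catch typos early. The sketch below assumes `PATHS` is a flat dict mapping names to filesystem paths, which may not match the actual structure in `libs/datasets/utils.py`. The helper `find_missing_paths` and the example key are hypothetical.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the names in `paths` whose target does not exist on disk."""
    return sorted(name for name, p in paths.items() if not Path(p).exists())


# Hypothetical stand-in for the real PATHS variable; adjust to its actual shape.
example_paths = {
    "mmimdb": "/PATH/mmimdb",
}
print(find_missing_paths(example_paths))
```

An empty list means every entry points at an existing file or directory.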
- Obtain access to the MIMIC-CXR-JPG Database on PhysioNet and download the dataset.
- Download and unzip the `mimic-cxr-reports.zip` file from this repository.
- Run `notebooks/preprocess_mimiccxr.ipynb`, updating paths at the top of the notebook.
- Update the `PATHS` variable at the top of `libs/datasets/utils.py`.
- Download the Image URLs with annotations here.
- Download the images, e.g. with the img2dataset library.
  - Write a temporary CSV file that the library can load:

    ```python
    import pandas as pd
    from pathlib import Path

    # Load the annotation file and flatten the first entry of each record into columns.
    df = pd.read_json('/PATH/ImageNetRed/dataset_no_images/mini-imagenet-annotations.json')
    df = pd.DataFrame(df['data'].apply(lambda x: x[0]).tolist())

    # Create the folder that img2dataset will write the images to.
    out_dir = Path('/PATH/ImageNetRed/dataset_no_images/mini-imagenet/images')
    out_dir.mkdir(parents=True, exist_ok=True)

    # img2dataset reads URLs from a column named `url` by default.
    df = df.rename(columns={'image/uri': 'url'})
    df.to_csv('./temp_mini_imagenet.csv')
    ```

  - Run the following:

    ```bash
    img2dataset --input_format csv \
        --url_list=/PATH/temp_mini_imagenet.csv \
        --output_folder=/PATH/ImageNetRed/dataset_no_images/mini-imagenet/images \
        --thread_count=64 --image_size=256 --resize_mode keep_ratio
    ```

- Run `notebooks/preprocess_imagenet_red.ipynb`, updating paths at the top of the notebook.
- Update the `PATHS` variable at the top of `libs/datasets/utils.py`.
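
If img2dataset downloads fewer images than expected, one thing worth checking is the temporary CSV itself. The sketch below counts the non-empty entries in the `url` column, which is the column name img2dataset reads by default. The helper name `count_urls` is illustrative, and the code uses only the standard library.

```python
import csv


def count_urls(csv_path, url_col="url"):
    """Count rows with a non-empty URL; raise if the column is missing."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or url_col not in reader.fieldnames:
            raise ValueError(f"column {url_col!r} not found in {csv_path}")
        return sum(1 for row in reader if (row.get(url_col) or "").strip())


# Hypothetical usage (the path is a placeholder):
# print(count_urls("/PATH/temp_mini_imagenet.csv"))
```

A `ValueError` here usually means the `image/uri` column was not renamed before the CSV was written.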