This repository contains an automated image preprocessing pipeline designed for the SpiceSpectrum dataset, a collection of 11,000 images across 11 spice categories. The pipeline prepares raw image data for tasks such as image classification or spice variety classification. Dataset and image metadata are available at: https://data.mendeley.com/datasets/5v7w2hx8n5/2
Paper available at: https://www.sciencedirect.com/science/article/pii/S2352340925008194
- Total Images: 11,000
- Spice Categories: 11 types (e.g., Turmeric, Coriander, Cumin, etc.)
- Structure: Images are organized into folders by spice type.
Square Cropping
Automatically crops each image to the largest centered square.
Image Resizing
Resizes images to a uniform size (either 256×256 or 512×512 pixels).
Format Support
Supports .jpg, .jpeg, .png, .bmp, .gif, and .tiff.
Folder Structure Preservation
Maintains the original subfolder hierarchy when saving processed images.
Error Handling
Skips unreadable or corrupted image files and logs them.
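The cropping, resizing, folder-preservation, and error-handling steps described above can be sketched roughly as follows. This is a minimal illustration using Pillow, not the repository's actual script: the function names `center_square_box` and `preprocess_tree`, the LANCZOS resampling filter, the conversion to RGB, and the choice to catch only `OSError` are all assumptions made for the example.

```python
from pathlib import Path

from PIL import Image

# Extensions listed under "Format Support"; matched case-insensitively (assumption).
SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}


def center_square_box(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    upper = (height - side) // 2
    return left, upper, left + side, upper + side


def preprocess_tree(input_dir, output_dir, size=256):
    """Crop and resize every supported image, mirroring the subfolder layout.

    Returns (processed_count, skipped_paths).
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    processed, skipped = 0, []
    # sorted() materializes the file list before any output is written.
    for src in sorted(input_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in SUPPORTED:
            continue
        # Same relative path under the output root keeps the class folders intact.
        dst = output_dir / src.relative_to(input_dir)
        try:
            with Image.open(src) as im:
                out = im.convert("RGB")
                out = out.crop(center_square_box(*out.size))
                out = out.resize((size, size), Image.LANCZOS)
                dst.parent.mkdir(parents=True, exist_ok=True)
                out.save(dst)
        except OSError as exc:
            # Pillow raises OSError (or a subclass) for unreadable or truncated files.
            print(f"Skipping {src}: {exc}")
            skipped.append(src)
            continue
        processed += 1
    return processed, skipped
```

Calling `preprocess_tree("SpiceSpectrum", "SpiceSpectrum_256", size=256)` would write a 256×256 copy of each readable image under the same spice subfolder name; the output directory name here is only a placeholder.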
Visualization
Displays side-by-side visual comparison of original vs. processed images using matplotlib.
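A side-by-side comparison of this kind could look roughly like the sketch below. The helper name `compare_images` and the panel titles are illustrative assumptions, not taken from the repository's script.

```python
import matplotlib.pyplot as plt


def compare_images(original, processed):
    """Return a figure showing two PIL images side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(8, 4))
    panels = zip(axes, (original, processed), ("Original", "Processed"))
    for ax, img, title in panels:
        ax.imshow(img)  # imshow accepts PIL images directly
        ax.set_title(f"{title} ({img.width}x{img.height})")
        ax.axis("off")
    fig.tight_layout()
    return fig
```

To use it, open an original and its processed counterpart with `PIL.Image.open`, pass both to `compare_images`, and call `plt.show()` to display the result.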
git clone https://github.com/noushad999/spicespectrum.git
cd spicespectrum
pip install pillow matplotlib
python foldername.py
- Output Directory: Where processed images should be saved.
- Target Size: Enter 256 or 512 for resizing.
SpiceSpectrum/
├── Turmeric/
│ ├── Turmeric1.jpg
│ └── ...
├── Cumin/
│ └── ...
└── ...
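Given the layout above, a quick sanity check before preprocessing is to count the images in each spice folder. The sketch below uses only the standard library; the function name `count_images_per_class` is hypothetical, and the extension set simply mirrors the supported formats listed earlier.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff"}


def count_images_per_class(root):
    """Map each top-level subfolder name to the number of image files inside it."""
    root = Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Running `count_images_per_class("SpiceSpectrum")` should report one entry per spice category; since the dataset is described as class-balanced, noticeably uneven counts would suggest an incomplete download.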
This preprocessing pipeline is designed for:
- Ensuring input uniformity for SPICE metric evaluation
- Cleaning and standardizing datasets for spice classification or recognition systems
This project is licensed under the MIT License.
Contributions are welcome! Feel free to open issues or submit pull requests to improve functionality, add features, or fix bugs.
For questions or collaborations, please contact Md Noushad Jahan Ramim, Noor Mairukh Khan Arnob, or Md Mubtasim Fuad. 📧 contactwithnoushad@gmail.com
If you find our work useful, please cite our paper:
@article{ramim2025spicespectrum,
title={SpiceSpectrum: Class-balanced Dataset of Commercially Valuable Spice Cultivars},
author={Ramim, Md Noushad Jahan and Islam, Samira and Towkir, Muhtasin and Fuad, Md Mubtasim and Arnob, Noor Mairukh Khan},
journal={Data in Brief},
pages={112097},
year={2025},
publisher={Elsevier}
}