This repository provides a unified benchmarking framework for evaluating multiple deep learning–based Raman spectra classifiers on three open-source Raman spectroscopy datasets:
- MLROD [1]
- Bacteria-ID [2]
- API (Active Pharmaceutical Ingredients) [3]
The following models are benchmarked under consistent training and evaluation protocols:
- Deep CNN [4] (referred to as `mlrod` in the codebase)
- SANet [5]
- RamanNet [6]
- Transformer [7]
- RamanFormer [8]
All models are implemented in PyTorch, and the pipeline supports dataset preprocessing, training, hyperparameter tuning, and evaluation using test accuracy and macro-averaged F1 score.
Follow the steps below to reproduce the benchmarking experiments.

    git clone https://github.com/asineesh/Benchmark_Raman_DeepLearning/
    cd Benchmark_Raman_DeepLearning

Create the following empty directories:

    mkdir results/Bacteria_ID/models
    mkdir datasets/Bacteria_ID

Download the MLROD dataset from https://odr.io/MLROD#/search/display/1348/eyJkdF9pZCI6IjYwMCJ9 and place it in the directory datasets/MLROD/.
Run the processing.ipynb and test_processing.ipynb notebooks to generate .pkl files containing all the spectra interpolated onto a common spectral domain.
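The idea behind interpolating spectra onto a common spectral domain can be sketched as follows. This is an illustration only, not the code in the notebooks: the helper name `resample_spectrum`, the grid limits, and the number of grid points are hypothetical, and linear interpolation is assumed.

```python
import numpy as np

def resample_spectrum(wavenumbers, intensities, common_axis):
    """Linearly interpolate one spectrum onto a shared wavenumber axis."""
    wavenumbers = np.asarray(wavenumbers, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # np.interp expects increasing x values, so sort by wavenumber first.
    order = np.argsort(wavenumbers)
    return np.interp(common_axis, wavenumbers[order], intensities[order])

# Hypothetical common axis: 5 points from 100 to 500 cm^-1.
common_axis = np.linspace(100.0, 500.0, 5)

# Two toy spectra recorded on different axes (the second is stored in reverse order).
spec_a = resample_spectrum([100, 300, 500], [0.0, 2.0, 4.0], common_axis)
spec_b = resample_spectrum([500, 100], [10.0, 2.0], common_axis)

# After resampling, every spectrum has the same length and can be stacked.
stacked = np.vstack([spec_a, spec_b])
print(stacked.shape)
```

Note that `np.interp` holds the end values constant outside the measured range, so a real pipeline would normally choose a common axis that lies inside the range covered by every spectrum.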
Download the Bacteria-ID dataset from https://github.com/csho33/bacteria-ID/blob/master/README.md and place it in the directory datasets/Bacteria_ID.
Download the API dataset from https://springernature.figshare.com/articles/dataset/Open-source_Raman_spectra_of_chemical_compounds_for_active_pharmaceutical_ingredient_development/27931131 and place it in the directory datasets/Pharma/.
Run the explore.ipynb notebook to generate the .pkl files for the train, validation and test splits of the dataset.
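For orientation, writing train, validation and test splits to .pkl files might look like the sketch below. The notebook's actual split strategy is not described here, so the stratified sampling, the 60/20/20 fractions, the `stratified_split` helper, the dictionary layout, and the file names are all assumptions made for illustration.

```python
import pickle
import random
import tempfile
from collections import defaultdict
from pathlib import Path

def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return train/val/test index lists that preserve per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # whatever remains goes to test
    return train, val, test

# Toy data: 20 spectra of 4 points each, two classes with 10 spectra each.
spectra = [[float(i)] * 4 for i in range(20)]
labels = [i % 2 for i in range(20)]

train_idx, val_idx, test_idx = stratified_split(labels)

out_dir = Path(tempfile.mkdtemp())  # stand-in for the dataset directory
for name, idx in [("train", train_idx), ("val", val_idx), ("test", test_idx)]:
    split = {"X": [spectra[i] for i in idx], "y": [labels[i] for i in idx]}
    with open(out_dir / f"{name}.pkl", "wb") as handle:
        pickle.dump(split, handle)
```

Fixing the random seed keeps the splits reproducible across runs, which matters when several models are compared on the same data.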
To train a model on a given dataset, run the corresponding training module located in the results/ directory with `python -m`. For example, to train the RamanNet model on the Bacteria-ID dataset for the 30-class isolate classification problem, run the following from the repository root:

    python -m results.Bacteria_ID.thirty.train_RamanNet

During training:
- Hyperparameter tuning is performed using the validation set.
- The model achieving the best validation accuracy is saved to results/trained_models/.
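Keeping the model with the best validation accuracy can be pictured with the framework-agnostic sketch below. It is not taken from the training modules: `train_with_best_checkpoint`, `fake_train_one_epoch`, `fake_evaluate`, and the toy validation curve are hypothetical stand-ins, and the "model" is a plain dictionary.

```python
import copy

def train_with_best_checkpoint(model_state, train_one_epoch, evaluate, num_epochs):
    """Train for num_epochs and keep a copy of the state with the best validation accuracy."""
    best_accuracy = float("-inf")
    best_state = None
    for epoch in range(num_epochs):
        train_one_epoch(model_state, epoch)
        accuracy = evaluate(model_state)
        # Strict '>' keeps the earliest epoch when two epochs tie.
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_state = copy.deepcopy(model_state)
    return best_state, best_accuracy

# Toy stand-ins: validation accuracy follows a fixed curve indexed by epoch.
validation_curve = [0.61, 0.74, 0.82, 0.79, 0.82]

def fake_train_one_epoch(state, epoch):
    state["epoch"] = epoch

def fake_evaluate(state):
    return validation_curve[state["epoch"]]

best_state, best_accuracy = train_with_best_checkpoint(
    {"epoch": -1}, fake_train_one_epoch, fake_evaluate, num_epochs=len(validation_curve)
)
print(best_state, best_accuracy)
```

In a PyTorch training loop, the deep copy would typically be replaced by writing `model.state_dict()` to disk with `torch.save` whenever the validation accuracy improves.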
To compute test accuracy and macro F1 score, run the corresponding evaluation notebooks located in results/trained_models/.
Ensure that the paths to the trained model checkpoints are updated appropriately before running the notebooks.
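For reference, the two reported metrics can be computed from true and predicted labels as in the sketch below. This is a plain-Python illustration, not the code in the evaluation notebooks; the function names and the toy labels are hypothetical. Macro-averaged F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many test spectra it has.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero because
        # each class appears at least once in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)

# Toy labels for a three-class problem.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```

If scikit-learn is available, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity and is the more common choice in practice.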
1. Berlanga, Genesis, Quentin Williams, and Nathan Temiquel. "Convolutional neural networks as a tool for Raman spectral mineral classification under low signal, dusty Mars conditions." Earth and Space Science 9.10 (2022): e2021EA002125.
2. Ho, Chi-Sing, et al. "Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning." Nature Communications 10.1 (2019): 4927.
3. Flanagan, Aaron R., and Frank G. Glavin. "Open-source Raman spectra of chemical compounds for active pharmaceutical ingredient development." Scientific Data 12.1 (2025): 498.
4. Liu, Jinchao, et al. "Deep convolutional neural networks for Raman spectrum recognition: a unified solution." Analyst 142.21 (2017): 4067-4074.
5. Deng, Lin, et al. "Scale-adaptive deep model for bacterial Raman spectra identification." IEEE Journal of Biomedical and Health Informatics 26.1 (2021): 369-378.
6. Ibtehaz, Nabil, et al. "RamanNet: a generalized neural network architecture for Raman spectrum analysis." Neural Computing and Applications 35.25 (2023): 18719-18735.
7. Liu, Bo, et al. "Classification of deep-sea cold seep bacteria by transformer combined with Raman spectroscopy." Scientific Reports 13.1 (2023): 3240.
8. Koyun, Onur Can, et al. "RamanFormer: A transformer-based quantification approach for Raman mixture components." ACS Omega 9.22 (2024): 23241-23251.