Bird species recognition (BSR) is a critical tool for biodiversity monitoring and ecological health assessment. This study proposes human auditory representation learning (HARL), a novel approach that integrates gammatone- and Mel-spectrogram features with deep learning architectures, including ResNet50 and multi-head attention (MHA) mechanisms, to address the challenge of recognizing species across geographic regions. Experiments demonstrate that HARL significantly outperforms baseline methods. The combination of gammatone- and Mel-spectrogram features proves particularly effective, with MHA further enhancing generalization across regions. These results highlight the potential of HARL for ecological monitoring and conservation, offering a scalable and accurate solution for automated BSR in diverse geographic contexts. Our work bridges human auditory science and machine learning, providing a foundation for future research in bioacoustics and biodiversity conservation.
The DB3V and S1–S2 datasets were used for all experiments in this work.
- Jing, X., Zhang, L., Xie, J., Gebhard, A., Baird, A., & Schuller, B. (2024). DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition. INTERSPEECH 2024, pp. 127-131, Kos, Greece. Zenodo. https://doi.org/10.5281/zenodo.11544734
- Morgan, M. M., & Braasch, J. (2022). Open set classification strategies for long-term environmental field recordings for bird species recognition. The Journal of the Acoustical Society of America, 151(6), 4028-4038. Zenodo. https://zenodo.org/records/6456604
To set up and visualize the Mel and gammatone spectrograms, follow these steps:
- Set up the environment:
  - Run `addpath('your_own_specified_path/features/gammatonegram')` in MATLAB 2024b to include the required files.
  - Replace `your_own_specified_path` with the actual path to the `features/gammatonegram` folder on your device.
- Visualize spectrograms:
  - Run `visualization_demo.m` to generate and display Mel and gammatone spectrograms along with their delta variants.
- Extract and store spectrograms:
  - Run `listFilesAndFolders.m` to extract, process, and store Mel and gammatone spectrograms.
- Access pre-extracted features:
  - Download pre-extracted spectrogram features in `.mat` format from: Zipped Features (OneDrive).
  - Unzip the files to `your_own_specified_path/features/` before running `listFilesAndFolders.m`.
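
The two spectrogram types differ mainly in how their frequency bands are spaced: Mel filterbanks follow the Mel scale, while gammatone filterbanks follow the ERB-rate scale. The sketch below compares band center frequencies on both scales in plain Python. It assumes the common HTK-style Mel formula and the Glasberg and Moore ERB-rate formula; the MATLAB scripts in this repository may use different variants, band counts, or frequency ranges, and all function names and parameter values here are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style Mel scale (assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_erb_rate(f_hz):
    """ERB-rate scale after Glasberg and Moore (assumed variant)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def center_frequencies(f_min, f_max, n_bands, to_scale, from_scale):
    """Return n_bands frequencies (n_bands >= 2) equally spaced on the
    given perceptual scale, including both f_min and f_max."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_bands - 1)
    return [from_scale(lo + i * step) for i in range(n_bands)]


# Illustrative range and band count, not taken from the repository.
mel_centers = center_frequencies(50.0, 8000.0, 8, hz_to_mel, mel_to_hz)
erb_centers = center_frequencies(50.0, 8000.0, 8, hz_to_erb_rate, erb_rate_to_hz)

for m, g in zip(mel_centers, erb_centers):
    print(f"Mel: {m:8.1f} Hz    gammatone (ERB-rate): {g:8.1f} Hz")
```

Printing the two columns side by side shows how differently the two scales distribute the same number of bands over the same range, which is one reason the representations can carry complementary information.
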
Resources:
- MATLAB 2024b Documentation for environment setup and scripting.
- Download MATLAB 2024b to install the required software.
- GitHub Markdown Guide for formatting this README.
Reminder:
- Ensure all folder paths (e.g., `your_own_specified_path`) are updated to match your local setup.
- Verify that the OneDrive link is accessible; contact the repository owner if access is restricted.
To train and evaluate the models in Python, follow these steps:
- Set up the environment:
  - Install the following package versions:
    - numpy==1.24.4
    - scipy==1.10.1
    - torch==2.0.1
    - torchaudio==2.0.2
    - torchvision==0.15.2
    - scikit-learn==1.3.2
    - matplotlib==3.7.2
    - umap-learn==0.5.3
  - Update folder paths in all scripts to match your local setup.
- Train Mel spectrogram models:
  - Run `D1D2_mel_wo_atten.py` for training with D1 and testing with D2, without attention.
  - Run `D1D2_mel_wi_atten.py` for training with D1 and testing with D2, with attention.
- Train Gammatone spectrogram models:
  - Run `D1D2_gamma_wo_atten.py` for training with D1 and testing with D2, without attention.
  - Run `D1D2_gamma_wi_atten.py` for training with D1 and testing with D2, with attention.
- Train combined Mel and Gammatone models:
  - Run `D1D2_mel_plus_gamma_wo_atten.py` for combined training with D1 and testing with D2, without attention.
  - Run `D1D2_mel_plus_gamma_wi_atten.py` for combined training with D1 and testing with D2, with attention.
- Visualize results:
  - Run `ROC.py` and `UMAPs.py` to generate ROC curves and UMAP visualizations.
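
The six training scripts follow one naming pattern (dataset pair, feature type, attention setting), so they can be enumerated and launched in sequence. The sketch below is a hypothetical helper, not part of the repository: `script_names` and `run_all` are illustrative names, and it assumes the scripts sit in the current working directory and take no command-line arguments. Names for pairs other than `D1D2` are extrapolated from the pattern and may not correspond to existing files.

```python
import itertools
import subprocess
import sys

FEATURES = ("mel", "gamma", "mel_plus_gamma")
ATTENTION = ("wo", "wi")  # "wo" = without attention, "wi" = with attention


def script_names(pair="D1D2"):
    """Build the training script file names for one train/test dataset pair."""
    return [
        f"{pair}_{feature}_{attention}_atten.py"
        for feature, attention in itertools.product(FEATURES, ATTENTION)
    ]


def run_all(pair="D1D2", dry_run=True):
    """Run every training script in order; with dry_run=True only print names."""
    for name in script_names(pair):
        if dry_run:
            print("would run:", name)
        else:
            # check=True stops the sequence at the first failing script.
            subprocess.run([sys.executable, name], check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would execute the scripts one after another with the current Python interpreter; the dry run is the default so that nothing is launched by accident.
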
Resources:
- Python 3 Documentation for Python environment setup.
Reminder:
- Update all folder paths (e.g., `your_own_specified_path`) to match your local setup for MATLAB and Python.
- For Python scripts, adjust paths for other `DmDn` cases (e.g., D1D3) as needed.
- Install Python dependencies at the versions listed above (e.g., `pip install numpy==1.24.4 scipy==1.10.1 torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2 scikit-learn==1.3.2 matplotlib==3.7.2 umap-learn==0.5.3`).
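
After installation, it can be useful to confirm that the environment matches the pinned versions before training. The following is a minimal sketch using only the standard library; `PINNED` copies the versions listed in this README, while `check_versions` is a hypothetical helper that is not part of the repository. Note that some builds report a local suffix (for example a CUDA tag on `torch`), which this exact comparison would flag as a mismatch.

```python
from importlib import metadata

# Versions copied from the dependency list in this README.
PINNED = {
    "numpy": "1.24.4",
    "scipy": "1.10.1",
    "torch": "2.0.1",
    "torchaudio": "2.0.2",
    "torchvision": "0.15.2",
    "scikit-learn": "1.3.2",
    "matplotlib": "3.7.2",
    "umap-learn": "0.5.3",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every package whose
    installed version differs from the pinned one (None = not installed)."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    mismatches = check_versions(PINNED)
    for name, found in mismatches.items():
        status = "not installed" if found is None else f"found {found}"
        print(f"{name}: expected {PINNED[name]}, {status}")
    if not mismatches:
        print("All pinned packages match.")
```
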
