Multi-Task Deep Learning with Over-Sampling and Style Randomization for Improved Cross-Regional Bird Vocalization Recognition
Cross-regional bird vocalization recognition (BVR) poses significant challenges due to spectral overlap, class imbalance, and domain shifts caused by ecological and acoustic variability. This paper proposes HARL-MOS, a multi-task deep learning framework that integrates class-balanced over-sampling and style randomization into a dual-branch ResNet50 architecture to improve generalization across diverse regions. HARL-MOS extends a prior auditory representation model by introducing a vocalization type auxiliary classification task to promote style-invariant representations, while over-sampling addresses the long-tail distribution of rare species. A style randomization module perturbs acoustic feature statistics during training to enhance robustness to domain-specific variations. HARL-MOS was evaluated on the DB3V dataset spanning three ecologically distinct regions in the contiguous United States under six cross-region train–test protocols, and on a separate two-site soundscape dataset with overlapping species. Experimental results demonstrated consistent improvements over a standard baseline and the prior HARL framework; in the most challenging D2D1 setting, HARL-MOS improved the macro F1-score by 4.33 percentage points over HARL and by 27.80 percentage points over the baseline. In fine-grained cross-site BVR, HARL-MOS maintained stable gains, with accuracy (ACC) reaching 90.36% and 91.71% for S1S2 and S2S1, respectively, indicating reduced sensitivity to domain shift. These results demonstrate that HARL-MOS is a reliable framework for automated BVR, offering practical benefits for biodiversity monitoring and conservation efforts.
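To make the phrase "perturbs acoustic feature statistics" concrete, the following is a minimal NumPy sketch of one common form of style randomization: each sample's per-channel mean and standard deviation are mixed with those of another sample in the batch. This is an illustration only. The function name `randomize_style`, the Beta-distributed mixing weight, and the parameters `alpha` and `eps` are assumptions, not the module implemented in the HARL-MOS scripts.

```python
import numpy as np


def randomize_style(x, alpha=0.1, eps=1e-6, rng=None):
    """Mix per-instance channel statistics between samples in a batch.

    x has shape (batch, channels, freq, time). The normalized content of each
    sample is kept; only its channel-wise mean and std are perturbed.
    All names and defaults here are hypothetical (illustrative sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Per-sample, per-channel statistics over the freq and time axes.
    mu = x.mean(axis=(2, 3), keepdims=True)
    sigma = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    normalized = (x - mu) / sigma
    # Pair every sample with a randomly chosen partner from the same batch.
    perm = rng.permutation(x.shape[0])
    # One mixing weight per sample, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(x.shape[0], 1, 1, 1))
    mixed_mu = lam * mu + (1 - lam) * mu[perm]
    mixed_sigma = lam * sigma + (1 - lam) * sigma[perm]
    return normalized * mixed_sigma + mixed_mu


rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 16, 32))  # synthetic stand-in for feature maps
styled = randomize_style(features, rng=rng)
```

A perturbation of this kind would normally be applied only during training and switched off at test time.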
The DB3V and S1-S2 datasets were used for all the experiments in this work.
- Jing, X., Zhang, L., Xie, J., Gebhard, A., Baird, A., & Schuller, B. (2024). DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition, INTERSPEECH 2024, pp. 127-131, Kos, Greece. Zenodo. https://doi.org/10.5281/zenodo.11544734
| Species (Common name) | Code | D1 (Western Cordillera) | D2 (Interior Plains) | D3 (Eastern Highlands) | Sound Type | Freq. (kHz) |
|---|---|---|---|---|---|---|
| Agelaius phoeniceus (Red-winged Blackbird) | 0 | 1,295 | 54 | 839 | Song | 2.8–5.7 |
| Cardinalis cardinalis (Northern Cardinal) | 1 | 778 | 166 | 1,299 | Song | 3.5–4.0 |
| Certhia americana (Brown Creeper) | 2 | 345 | 12 | 132 | Call | 3.7–8.0 |
| Corvus brachyrhynchos (American Crow) | 3 | 645 | 123 | 435 | Call | 0.5–1.8 |
| Molothrus ater (Brown-headed Cowbird) | 4 | 392 | 50 | 96 | Call | 0.5–12.0 |
| Setophaga aestiva (American Yellow Warbler) | 5 | 730 | 9 | 297 | Song | 3.0–8.0 |
| Setophaga ruticilla (American Redstart) | 6 | 199 | 107 | 579 | Song | 3.0–8.0 |
| Spinus tristis (American Goldfinch) | 7 | 223 | 94 | 283 | Song | 1.6–6.7 |
| Tringa semipalmata (Willet) | 8 | 138 | 29 | 106 | Call | 1.5–2.5 |
| Turdus migratorius (American Robin) | 9 | 1,038 | 187 | 791 | Song | 1.8–3.7 |
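The D2 column above shows why over-sampling matters: the most common class has 187 recordings and the rarest has 9. The sketch below illustrates class-balanced over-sampling with inverse-frequency weights, using only the D2 counts from the table. The helper names `class_balanced_weights` and `oversample` are hypothetical, and the sampler used in the HARL-MOS scripts may differ.

```python
import random
from collections import Counter

# Recordings per species code in D2 (Interior Plains), copied from the table above.
D2_COUNTS = {0: 54, 1: 166, 2: 12, 3: 123, 4: 50, 5: 9, 6: 107, 7: 94, 8: 29, 9: 187}


def class_balanced_weights(counts):
    """Return a per-class sampling weight proportional to 1 / class count."""
    return {label: 1.0 / n for label, n in counts.items()}


def oversample(labels, k, seed=0):
    """Draw k indices with replacement so that every class is equally likely."""
    weights = class_balanced_weights(Counter(labels))
    per_sample = [weights[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=per_sample, k=k)


# One label per recording, e.g. 54 zeros, 166 ones, and so on.
labels = [code for code, n in D2_COUNTS.items() for _ in range(n)]
indices = oversample(labels, k=10_000)
resampled = Counter(labels[i] for i in indices)

imbalance_before = max(D2_COUNTS.values()) / min(D2_COUNTS.values())
imbalance_after = max(resampled.values()) / min(resampled.values())
print(f"max/min class ratio before: {imbalance_before:.1f}, after: {imbalance_after:.2f}")
```

In PyTorch, the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler` so that each training epoch draws rare species about as often as common ones.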
- Morgan MM, Braasch J. Open set classification strategies for long-term environmental field recordings for bird species recognition. The Journal of the Acoustical Society of America. 2022 Jun 1;151(6):4028-38. Zenodo. https://zenodo.org/records/6456604
| Species (Common name) | Code | S1 | S2 | Sound Type |
|---|---|---|---|---|
| Eastern chipmunk “chuck” (Tamias striatus) | ECMK | 537 | 1,161 | Call |
| Fall field cricket (Gryllus pennsylvanicus) | FFCR | 5,296 | 6,101 | Song |
| Eastern chipmunk “chirp” (Tamias striatus) | ECMC | 101 | 2,305 | Call |
| American robin (Turdus migratorius) | AMRO | 1,411 | 5,554 | Song |
| American crow (Corvus brachyrhynchos) | AMCR | 221 | 1,477 | Call |
| Blue jay (Cyanocitta cristata) | BLJA | 770 | 1,002 | Call |
To set up and visualize Mel and Gamma spectrograms, follow these steps:
- Set up the environment:
  - Run `addpath('your_own_specified_path/features/gammatonegram')` in MATLAB 2024b to include the required files.
  - Replace `your_own_specified_path` with the actual path to the `features/gammatonegram` folder on your device.
- Visualize spectrograms:
  - Run `visualization_demo.m` to generate and display Mel and Gamma spectrograms along with their delta variants.
- Extract and store spectrograms:
  - Run `listFilesAndFolders.m` to extract, process, and store Mel and Gamma spectrograms.
- Access pre-extracted features:
  - Download pre-extracted spectrogram features in `.mat` format from: Zipped Features (OneDrive).
  - Unzip the files to `your_own_specified_path/features/` before running `listFilesAndFolders.m`.
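For readers on the Python side, the "delta variants" mentioned in the steps above can be understood as a local time derivative of each spectrogram. The sketch below uses the common regression-based delta formula with edge padding. It is an assumption about what a delta feature typically is, not a port of the MATLAB scripts, and the function name `delta` and the `width` parameter are hypothetical.

```python
import numpy as np


def delta(spec, width=2):
    """Regression-based delta along the time axis of a (freq, time) spectrogram.

    d[t] = sum_{n=1..width} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..width} n^2)
    Frames beyond the edges are replaced by the nearest edge frame.
    """
    n_frames = spec.shape[1]
    padded = np.pad(spec, ((0, 0), (width, width)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(spec, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[:, width + n : width + n + n_frames]   # c[t + n]
        behind = padded[:, width - n : width - n + n_frames]  # c[t - n]
        out += n * (ahead - behind)
    return out / denom


# Synthetic check: a spectrogram that grows by 1 per frame has slope 1.
ramp = np.tile(np.arange(10.0), (3, 1))
ramp_delta = delta(ramp)
```

Stacking a spectrogram with its delta gives the network both the spectral shape and how quickly it changes over time.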
Resources:
- MATLAB 2024b Documentation for environment setup and scripting.
- Download MATLAB 2024b to install the required software.
- GitHub Markdown Guide for formatting this README.
Reminder:
- Ensure all folder paths (e.g., `your_own_specified_path`) are updated to match your local setup.
- Verify that the OneDrive link is accessible; contact the repository owner if access is restricted.
- Set up the environment:
  - numpy==1.24.4
  - scipy==1.10.1
  - torch==2.0.1
  - torchaudio==2.0.2
  - torchvision==0.15.2
  - scikit-learn==1.3.2
  - matplotlib==3.7.2
  - umap-learn==0.5.3
  - Update folder paths in all scripts to match your local setup.
- Train DB3V models, ablation studies:
  - Run `HARL-MOS-D2D1.py` to train on D2 and test on D1 with the HARL-MOS framework.
  - Run `HARL-OS-D2D1.py` to train on D2 and test on D1 with the HARL-OS framework.
  - Run `HARL-MS-D2D1.py` to train on D2 and test on D1 with the HARL-MS framework.
  - Run `HARL-MO-D2D1.py` to train on D2 and test on D1 with the HARL-MO framework.
- Train S1 and S2 subset models, ablation studies:
  - Run `HARL-MOS-S1S2.py` to train on S1 and test on S2 with the HARL-MOS framework.
  - Run `HARL-OS-S1S2.py` to train on S1 and test on S2 with the HARL-OS framework.
- Visualize results:
  - See the `HARL_MOS_visualizations/` folder.
Resources:
- Python 3 Documentation for Python environment setup.
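Before training, it can help to confirm that the installed packages match the pinned versions listed in the environment step. The following is a small sketch using only the standard library; the helper name `find_mismatches` and its `lookup` parameter are hypothetical and are not part of this repository.

```python
from importlib import metadata

# Pinned versions copied from the environment step of this README.
PINNED = {
    "numpy": "1.24.4",
    "scipy": "1.10.1",
    "torch": "2.0.1",
    "torchaudio": "2.0.2",
    "torchvision": "0.15.2",
    "scikit-learn": "1.3.2",
    "matplotlib": "3.7.2",
    "umap-learn": "0.5.3",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (wanted, found)} for missing or differently versioned packages."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means the environment matches the pinned list. Local builds that carry a suffix (for example a CUDA tag on `torch`) are reported as mismatches and should be checked by eye.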
Reminder:
- Update all folder paths (e.g., `your_own_specified_path`) to match your local setup for MATLAB and Python.
- For Python scripts, adjust paths for other `DmDn` cases (e.g., D1D3) as needed.
- Install Python dependencies (e.g., `pip install numpy scipy torch torchaudio torchvision scikit-learn matplotlib umap-learn`).
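The six cross-region `DmDn` protocols are the ordered train/test pairs of the three DB3V regions. The sketch below enumerates them and builds script names that follow the `HARL-MOS-D2D1.py` pattern. Only the D2D1 scripts are named in this README, so the other file names are hypothetical and would need to be created by copying a script and adjusting its paths.

```python
from itertools import permutations

REGIONS = ("D1", "D2", "D3")


def cross_region_protocols(regions=REGIONS):
    """Return every ordered (train, test) pair of distinct regions."""
    return list(permutations(regions, 2))


def script_name(framework, train, test):
    """Build a script name following the pattern used in this README."""
    return f"{framework}-{train}{test}.py"


protocols = cross_region_protocols()
names = [script_name("HARL-MOS", train, test) for train, test in protocols]

for (train, test), name in zip(protocols, names):
    print(f"train on {train}, test on {test}: {name}")
```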
