A Light-weight Model for Speech Emotion Recognition based on Pattern Learning Block

Paper (Korean)

01. Data preprocessing

Feature : MFCC
- signal_len < 100,000 : zero-pad to 100,000
- otherwise : cut to 100,000
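
The length rule above can be sketched as follows. This is only an illustration, not the repository's code: the function name `fix_length` is hypothetical, and padding at the end and keeping the leading part when cutting are assumptions, since the text does not say which side is padded or cut. Plain Python lists are used to keep the sketch self-contained.

```python
TARGET_LEN = 100_000  # length threshold stated in the preprocessing rule


def fix_length(signal, target_len=TARGET_LEN):
    """Return a copy of `signal` with exactly `target_len` values.

    Assumptions (not confirmed by the source): zeros are appended at the
    end, and long signals keep their first `target_len` values.
    """
    signal = list(signal)
    if len(signal) < target_len:
        # Short signal: zero-padding up to the target length.
        return signal + [0.0] * (target_len - len(signal))
    # Long (or equal-length) signal: cut to the target length.
    return signal[:target_len]


short = fix_length([0.1, 0.2, 0.3], target_len=5)
long_ = fix_length([0.1] * 8, target_len=5)
print(short)       # [0.1, 0.2, 0.3, 0.0, 0.0]
print(len(long_))  # 5
```

The fixed-length signal would then be passed to the MFCC extraction step.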

02. Model architecture

Pattern Learning Block(PLB)

Proposed Model

Number of parameters per module

03. Experiments + Ablation studies

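Meaning of row names
- proposed : proposed model
- others : proposed model with the named module removed
  - DWSCNN : Depth-wise separable Convolution
  - CBAM : Convolutional Block Attention Module
  - Spa : Spatial-attention
  - SA : Self-attention
  - routing : Dynamic Routing
- Δ : proposed minus the best value among the ablated models, per column
- WA : weighted accuracy

EMO-DB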

Name	#Params	max_WA(%)	min_WA(%)	avg_WA(%)	code
proposed	95,288	90.76	85.11	88.18	Link
DWSCNN	69,688	92.11	74.03	82.17	Link
CBAM	90,994	94.49	78.79	85.88	Link
Spa	93,770	92.11	82.38	87.24	Link
SA	95,160	92.93	79.53	84.51	Link
routing	70,712	87.42	70.83	78.66	Link
Δ		-3.73	+2.73	+0.94
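
The Δ row is consistent with "proposed minus the best ablated variant" in each column. A minimal sketch that reproduces it from the EMO-DB numbers above (the helper name `delta_row` is hypothetical, not part of the repository):

```python
# (max_WA, min_WA, avg_WA) per model, copied from the EMO-DB table.
emodb = {
    "proposed": (90.76, 85.11, 88.18),
    "DWSCNN": (92.11, 74.03, 82.17),
    "CBAM": (94.49, 78.79, 85.88),
    "Spa": (92.11, 82.38, 87.24),
    "SA": (92.93, 79.53, 84.51),
    "routing": (87.42, 70.83, 78.66),
}


def delta_row(results, baseline="proposed"):
    """Baseline value minus the best ablated value, column by column."""
    base = results[baseline]
    others = [vals for name, vals in results.items() if name != baseline]
    # zip(*others) regroups the rows into columns (max, min, avg).
    return tuple(round(b - max(col), 2) for b, col in zip(base, zip(*others)))


print(delta_row(emodb))  # (-3.73, 2.73, 0.94)
```

A negative Δ means at least one ablated variant scored higher than the proposed model in that column.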

RAVDESS

Name	#Params	max_WA(%)	min_WA(%)	avg_WA(%)	code
proposed	95,353	87.50	83.75	85.56	Link
DWSCNN	69,753	78.12	65.62	72.68	Link
CBAM	91,059	88.12	81.25	84.87	Link
Spa	93,835	85.00	77.50	80.68	Link
SA	95,225	82.50	74.37	78.50	Link
routing	70,777	70.62	65.00	67.62	Link
Δ		-0.62	+2.50	+0.69

IEMOCAP

Name	#Params	max_WA(%)	min_WA(%)	avg_WA(%)	code
proposed	95,093	66.20	63.17	65.20	Link
DWSCNN	69,493	65.77	59.52	62.72	Link
CBAM	90,799	69.00	63.47	65.07	Link
Spa	93,575	69.96	65.69	67.40	Link
SA	94,965	67.18	60.91	64.56	Link
routing	70,517	66.66	62.21	64.30	Link
Δ		-3.76	-2.52	-2.20

04. Real-time Inference

Setting
- batch_size = 1
- Inference time per wav = total inference time over the test set / number of test samples
  - i.e. the average

Inference time / wav (sec)

H/W	EMO-DB	RAVDESS	IEMOCAP
RTX 3080TI	0.04371	0.03033	0.03416
i7-12700K	0.05000	0.04545	0.04510
RTX 2080TI	0.07182	0.06225	0.04953
i7-8700	0.07622	0.07257	0.06538
Raspberry Pi	1.42443	1.35941	1.22835
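
The averaging used for this table can be sketched as below. It is an illustration under assumptions: `predict` stands for any callable that runs the model on one sample (batch_size = 1), and `average_inference_time` is a hypothetical helper, not the repository's measurement code.

```python
import time


def average_inference_time(predict, samples):
    """Total wall-clock inference time divided by the number of samples."""
    samples = list(samples)
    start = time.perf_counter()
    for sample in samples:
        predict(sample)  # one sample per call, i.e. batch_size = 1
    total = time.perf_counter() - start
    return total / len(samples)


# Usage with a stand-in for the model's forward pass.
def dummy_predict(sample):
    return sum(sample)


avg_sec = average_inference_time(dummy_predict, [[0.0] * 1000] * 20)
print(f"{avg_sec:.5f} sec / wav")
```

With a real model, a warm-up call before timing is usually worthwhile, since the first inference often includes one-time setup cost.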

05. Memory usage

GPU peak memory usage
- Maximum GPU memory in use up to the time of measurement
- via tf.config.experimental.get_memory_info('GPU:0')

Model size
- size of the saved model weights

Model	Num.Params	Peak memory usage(GB)	Model size(MB)
Proposed	95K	0.000627	0.433616
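
A minimal sketch of how the two numbers could be obtained. `tf.config.experimental.get_memory_info` returns a dict with `current` and `peak` values in bytes; the helper names and the decimal unit conversions (1 GB = 1e9 bytes, 1 MB = 1e6 bytes) are assumptions, since the source does not state which convention was used.

```python
import os


def bytes_to_gb(n_bytes):
    """Convert bytes to GB (assumes the decimal convention, 1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


def file_size_mb(path):
    """Size of a saved weights file in MB (assumes 1 MB = 1e6 bytes)."""
    return os.path.getsize(path) / 1e6


def gpu_peak_gb(device="GPU:0"):
    """Peak GPU memory reported by TensorFlow, in GB.

    TensorFlow is imported here so the helpers above work without it.
    """
    import tensorflow as tf

    info = tf.config.experimental.get_memory_info(device)
    return bytes_to_gb(info["peak"])


print(bytes_to_gb(627_000))  # 0.000627
```

`gpu_peak_gb()` would be called after running inference, and `file_size_mb()` on the path of the saved weights file.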

License

Non-commercial only
