😀 😡 😐 😱 EMO SUPERB 🙄 🤢 😭 😯

Introduction

Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to enhance open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase that leverages 15 state-of-the-art speech self-supervised learning models (SSLMs) for exhaustive evaluation across six open-source SER datasets. EMO-SUPERB also streamlines result sharing via an online leaderboard, fostering collaboration within a community-driven benchmark and thereby advancing the development of SER. On average, 2.58% of annotations are written in natural language. Because SER relies on classification models that cannot process natural language, these valuable annotations are usually discarded. We prompt ChatGPT to mimic annotators, comprehend the natural language annotations, and re-label the data accordingly. Using the labels generated by ChatGPT, we consistently achieve an average relative gain of 3.08% across all settings.

Fundamental Problems in SER

🚨 Issue1: Typed Description

🔍 Issue: Annotators prefer using typed descriptions (e.g., "Slightly Angry, calm") to annotate emotional speech. However, current SER models solely utilize hard labels for classification instead of natural language.

🛠️ Approach: We employ ChatGPT to mimic annotators, comprehend typed descriptions, and relabel the emotional speech data. 👉 Typed Description
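As a rough illustration of the relabeling idea, the sketch below builds a prompt from a typed description and parses a comma-separated reply back into hard labels. It is a minimal sketch under stated assumptions, not the prompt or pipeline used by EMO-SUPERB: the label set `EMOTIONS`, the prompt wording, and the helper names `build_relabel_prompt` and `parse_relabel_reply` are all hypothetical, and no ChatGPT API call is made.

```python
from typing import List

# Hypothetical label set; the real set depends on the dataset being relabeled.
EMOTIONS = ["angry", "sad", "happy", "neutral", "fear", "disgust", "surprise"]


def build_relabel_prompt(typed_description: str, emotions: List[str] = EMOTIONS) -> str:
    """Build a prompt asking a chat model to map free text onto known labels."""
    options = ", ".join(emotions)
    return (
        "You are annotating emotional speech. "
        f'An annotator described the clip as: "{typed_description}". '
        f"Choose every label that applies from: {options}. "
        "Answer with a comma-separated list of labels only."
    )


def parse_relabel_reply(reply: str, emotions: List[str] = EMOTIONS) -> List[str]:
    """Keep only labels from the allowed set, in the order they appear."""
    allowed = {e.lower() for e in emotions}
    picked: List[str] = []
    for token in reply.split(","):
        label = token.strip().strip(".").lower()
        if label in allowed and label not in picked:
            picked.append(label)
    return picked


if __name__ == "__main__":
    print(build_relabel_prompt("Slightly Angry, calm"))
    print(parse_relabel_reply("Angry, Neutral."))
```

Filtering the reply against the allowed set means that any label the model invents is dropped rather than passed on to the classifier.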

🚨 Issue2: Reproducibility

🔍 Issue: More than 80% of SER papers do not release their code, so their results may not be reproducible.

🛠️ Approach: We develop a codebase supporting 15 self-supervised learning models for SER. We release the source code for model training and evaluation. 👉 Code

🚨 Issue3: Non-standard Data Partition

🔍 Issue: Most SER datasets have no standard data partitions, which creates a high risk of data leakage. Studies that use partitions with data leakage tend to report performance improvements of more than 4%.

🛠️ Approach: We created standardized dataset partitions for 6 open-source SER datasets and addressed the potential data leakage issues. 👉 Standard Dataset Partition
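One common form of leakage in speech datasets is a speaker appearing in more than one split. The sketch below checks a partition for that case. It assumes, for illustration only, that each utterance is a `(utterance_id, speaker_id)` pair and that leakage means shared speakers; the function name `find_speaker_leakage` and the data layout are hypothetical and are not taken from the EMO-SUPERB codebase.

```python
from typing import Dict, List, Set, Tuple

# Assumed layout: split name -> list of (utterance_id, speaker_id) pairs.
Partition = Dict[str, List[Tuple[str, str]]]


def find_speaker_leakage(partitions: Partition) -> Dict[Tuple[str, str], Set[str]]:
    """Return the speakers shared by each pair of splits (empty dict if none)."""
    speakers = {
        name: {speaker for _, speaker in items}
        for name, items in partitions.items()
    }
    names = sorted(speakers)
    leaks: Dict[Tuple[str, str], Set[str]] = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            shared = speakers[first] & speakers[second]
            if shared:
                leaks[(first, second)] = shared
    return leaks


if __name__ == "__main__":
    example = {
        "train": [("utt1", "spk1"), ("utt2", "spk2")],
        "dev": [("utt3", "spk3")],
        "test": [("utt4", "spk2")],
    }
    # spk2 appears in both train and test, so this partition leaks.
    print(find_speaker_leakage(example))
```

A check like this can be run before training to confirm that a partition is speaker-independent.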

Leaderboard

Upstream	#Params (M)	Average	IMPROV (P)	CREMA-D	POD (P)	B-POD (P)	IEMOCAP	NNIME	IMPROV (S)	POD (S)	B-POD (S)
XLS-R-1B	965	0.384	0.552	0.676	0.331	0.266	0.329	0.209	0.422	0.384	0.283
WavLM	317	0.383	0.559	0.673	0.350	0.252	0.336	0.209	0.430	0.369	0.272
Hubert	317	0.383	0.553	0.675	0.342	0.262	0.337	0.197	0.427	0.383	0.274
W2V2 R	317	0.379	0.555	0.672	0.331	0.251	0.339	0.196	0.433	0.363	0.269
Data2Vec-A	313	0.373	0.536	0.659	0.329	0.254	0.331	0.188	0.414	0.378	0.270
DeCoAR 2	90	0.362	0.512	0.646	0.308	0.256	0.320	0.187	0.405	0.353	0.274
W2V2	317	0.359	0.469	0.669	0.321	0.255	0.306	0.178	0.396	0.353	0.281
APC	4	0.350	0.497	0.608	0.298	0.249	0.316	0.186	0.389	0.340	0.266
VQ-APC	5	0.346	0.497	0.603	0.296	0.246	0.312	0.181	0.389	0.331	0.259
TERA	21	0.345	0.493	0.596	0.295	0.253	0.308	0.193	0.385	0.337	0.249
W2V	33	0.342	0.448	0.612	0.300	0.246	0.304	0.188	0.387	0.336	0.258
Mockingjay	85	0.336	0.485	0.576	0.275	0.244	0.308	0.185	0.379	0.318	0.253
NPC	19	0.331	0.470	0.570	0.274	0.240	0.304	0.172	0.364	0.333	0.256
VQ-W2V	34	0.331	0.442	0.605	0.292	0.246	0.294	0.156	0.361	0.325	0.260
M CPC	2	0.315	0.453	0.529	0.265	0.228	0.285	0.175	0.337	0.318	0.246
FBANK	0	0.191	0.305	0.144	0.186	0.199	0.242	0.120	0.184	0.170	0.168
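The Average column appears to be the unweighted mean of the nine per-dataset scores in each row. This is inferred from the numbers in the table, not stated on the page. The sketch below reproduces it for three rows; the helper name `average_score` is hypothetical.

```python
from typing import List


def average_score(scores: List[float]) -> float:
    """Unweighted mean of the per-dataset scores, rounded to three decimals."""
    return round(sum(scores) / len(scores), 3)


# Per-dataset scores copied from the leaderboard rows above, in column order.
ROWS = {
    "XLS-R-1B": [0.552, 0.676, 0.331, 0.266, 0.329, 0.209, 0.422, 0.384, 0.283],
    "WavLM": [0.559, 0.673, 0.350, 0.252, 0.336, 0.209, 0.430, 0.369, 0.272],
    "FBANK": [0.305, 0.144, 0.186, 0.199, 0.242, 0.120, 0.184, 0.170, 0.168],
}

if __name__ == "__main__":
    for upstream, scores in ROWS.items():
        print(upstream, average_score(scores))
```

The same helper can be used to compute the Average entry for a new submission before opening a pull request.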

Selected Models Radar Plot

We provide scripts for drawing the radar plot in our EMO-SUPERB Codebase.

Make Contribution 🤠

Submit Your Model's Evaluation Result to The Leaderboard

You can submit your model's evaluation result to the leaderboard. Please follow the instructions:
1. Follow the instructions in our EMO-SUPERB Codebase on GitHub.
2. Run the evaluation script.
3. Make a Pull Request (PR) on GitHub.

Contribute Your Data Labeling Method

You can contribute your data labeling method to the leaderboard. Please follow the instructions:
1. Make a Pull Request (PR) on GitHub.