😀 😡 😐 😱 EMO SUPERB 🙄 🤢 😭 😯

EMOtion Speech Universal PERformance Benchmark

An In-depth Look at Speech Emotion Recognition:

Fundamental Problems / Benchmark / Open-source


  • EMO-SUPERB
  • Typed Description
  • Standard Dataset Partition
  • Code

Introduction

Speech emotion recognition (SER) is a pivotal technology for human-computer interaction systems. However, 80.77% of SER papers yield results that cannot be reproduced. We develop EMO-SUPERB, short for EMOtion Speech Universal PERformance Benchmark, which aims to strengthen open-source initiatives for SER. EMO-SUPERB includes a user-friendly codebase that leverages 15 state-of-the-art speech self-supervised learning models (SSLMs) for exhaustive evaluation across six open-source SER datasets. EMO-SUPERB streamlines result sharing via an online leaderboard, fostering collaboration within a community-driven benchmark and thereby advancing the development of SER. On average, 2.58% of annotations are given in natural language. Because SER systems rely on classification models that cannot process natural language, these valuable annotations are typically discarded. We prompt ChatGPT to mimic annotators, comprehend the natural language annotations, and relabel the data. Using the labels generated by ChatGPT, we consistently achieve an average relative gain of 3.08% across all settings.
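For reference, "relative gain" here means the improvement divided by the baseline score. A minimal sketch (the two scores below are made up for illustration, not taken from our results):

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0

# Hypothetical evaluation scores, for illustration only.
print(round(relative_gain(0.350, 0.3608), 2))  # about a 3% relative gain
```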

Fundamental Problems in SER

🚨 Issue1: Typed Description


🔍 Issue: Annotators prefer using typed descriptions (e.g., "Slightly Angry, calm") to annotate emotional speech. However, current SER models rely solely on hard labels for classification and cannot process natural language.

🛠️ Approach: We employ ChatGPT to mimic annotators, comprehend typed descriptions, and relabel the emotional speech data. 👉 Typed Description
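As a concrete sketch of this idea, the snippet below builds an annotator-style prompt and filters the model's reply against a closed label set. The emotion classes, prompt wording, and helper names are illustrative assumptions, not the exact EMO-SUPERB prompt:

```python
# Sketch: ask an LLM to act as an annotator and map a typed description
# onto a closed set of emotion classes. The class list below is an
# assumption for illustration, not the EMO-SUPERB taxonomy.
EMOTIONS = ["angry", "happy", "sad", "neutral", "fear", "surprise", "disgust", "contempt"]

def build_prompt(typed_description: str) -> str:
    """Compose an annotator-style instruction for the LLM."""
    return (
        "You are an emotion annotator. Given the free-form annotation below, "
        f"choose all applicable labels from: {', '.join(EMOTIONS)}.\n"
        f'Annotation: "{typed_description}"\n'
        "Answer with a comma-separated list of labels."
    )

def parse_labels(llm_reply: str) -> list[str]:
    """Keep only reply tokens that match the closed label set."""
    tokens = [t.strip().lower() for t in llm_reply.split(",")]
    return [t for t in tokens if t in EMOTIONS]

prompt = build_prompt("Slightly Angry, calm")   # sent to the LLM
print(parse_labels("angry, neutral"))           # parsed from the LLM's reply
```

The parsed labels can then replace or augment the original hard labels during training.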

🚨 Issue2: Reproducibility


🔍 Issue: More than 80% of SER papers do not release code, so their results may not be reproducible.

🛠️ Approach: We develop a codebase supporting 15 self-supervised learning models for SER. We release the source code for model training and evaluation. 👉 Code
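In the usual benchmark recipe, the SSLM is frozen and a lightweight downstream head is trained on its frame-level features. The sketch below shows only the mean-pooling step, with random stand-in features in place of real SSLM outputs (dimensions and names are illustrative):

```python
import random

def mean_pool(frames: list[list[float]]) -> list[float]:
    """Collapse frame-level features (T x D) into one utterance-level vector (D)."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

# Stand-in for a frozen upstream: 50 frames of 768-dim features.
random.seed(0)
frames = [[random.random() for _ in range(768)] for _ in range(50)]
utterance_vec = mean_pool(frames)  # this vector feeds a small downstream classifier
print(len(utterance_vec))  # 768
```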

🚨 Issue3: Non-standard Data Partition


🔍 Issue: Most SER datasets lack a standard data partition, creating a high risk of data leakage. Studies that employ partitions with data leakage tend to report more than 4% higher performance.

🛠️ Approach: We create standardized dataset partitions for six open-source SER datasets and address the potential data leakage issues. 👉 Standard Dataset Partition
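One common way to avoid such leakage is a speaker-independent split: whole speakers, not individual utterances, are assigned to each partition, so no speaker's voice appears in both training and test data. A minimal sketch (the speaker IDs and split ratios are illustrative, not the released EMO-SUPERB partitions):

```python
import random
from collections import defaultdict

def speaker_independent_split(utterances, seed=0, train=0.8, dev=0.1):
    """Assign whole speakers to train/dev/test so no speaker crosses partitions.

    `utterances` is a list of (utterance_id, speaker_id) pairs.
    """
    by_speaker = defaultdict(list)
    for utt_id, speaker in utterances:
        by_speaker[speaker].append(utt_id)
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)  # seeded for reproducibility
    n = len(speakers)
    cut1, cut2 = int(n * train), int(n * (train + dev))
    split = {"train": speakers[:cut1], "dev": speakers[cut1:cut2], "test": speakers[cut2:]}
    return {part: [u for s in spks for u in by_speaker[s]] for part, spks in split.items()}

# Toy corpus: 100 utterances from 10 speakers.
data = [(f"utt{i}", f"spk{i % 10}") for i in range(100)]
parts = speaker_independent_split(data)
# By construction, no speaker appears in more than one partition.
```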

Leaderboard

| Rank | Upstream   | #Params (M) | Average | IMPROV (P) | CREMA-D | POD (P) | B-POD (P) | IEMOCAP | NNIME | IMPROV (S) | POD (S) | B-POD (S) |
|------|------------|-------------|---------|------------|---------|---------|-----------|---------|-------|------------|---------|-----------|
| 1    | XLS-R-1B   | 965         | 0.384   | 0.552      | 0.676   | 0.331   | 0.266     | 0.329   | 0.209 | 0.422      | 0.384   | 0.283     |
| 2    | WavLM      | 317         | 0.383   | 0.559      | 0.673   | 0.350   | 0.252     | 0.336   | 0.209 | 0.430      | 0.369   | 0.272     |
| 3    | Hubert     | 317         | 0.383   | 0.553      | 0.675   | 0.342   | 0.262     | 0.337   | 0.197 | 0.427      | 0.383   | 0.274     |
| 4    | W2V2 R     | 317         | 0.379   | 0.555      | 0.672   | 0.331   | 0.251     | 0.339   | 0.196 | 0.433      | 0.363   | 0.269     |
| 5    | Data2Vec-A | 313         | 0.373   | 0.536      | 0.659   | 0.329   | 0.254     | 0.331   | 0.188 | 0.414      | 0.378   | 0.270     |
| 6    | DeCoAR 2   | 90          | 0.362   | 0.512      | 0.646   | 0.308   | 0.256     | 0.320   | 0.187 | 0.405      | 0.353   | 0.274     |
| 7    | W2V2       | 317         | 0.359   | 0.469      | 0.669   | 0.321   | 0.255     | 0.306   | 0.178 | 0.396      | 0.353   | 0.281     |
| 8    | APC        | 4           | 0.350   | 0.497      | 0.608   | 0.298   | 0.249     | 0.316   | 0.186 | 0.389      | 0.340   | 0.266     |
| 9    | VQ-APC     | 5           | 0.346   | 0.497      | 0.603   | 0.296   | 0.246     | 0.312   | 0.181 | 0.389      | 0.331   | 0.259     |
| 10   | TERA       | 21          | 0.345   | 0.493      | 0.596   | 0.295   | 0.253     | 0.308   | 0.193 | 0.385      | 0.337   | 0.249     |
| 11   | W2V        | 33          | 0.342   | 0.448      | 0.612   | 0.300   | 0.246     | 0.304   | 0.188 | 0.387      | 0.336   | 0.258     |
| 12   | Mockingjay | 85          | 0.336   | 0.485      | 0.576   | 0.275   | 0.244     | 0.308   | 0.185 | 0.379      | 0.318   | 0.253     |
| 13   | NPC        | 19          | 0.331   | 0.470      | 0.570   | 0.274   | 0.240     | 0.304   | 0.172 | 0.364      | 0.333   | 0.256     |
| 14   | VQ-W2V     | 34          | 0.331   | 0.442      | 0.605   | 0.292   | 0.246     | 0.294   | 0.156 | 0.361      | 0.325   | 0.260     |
| 15   | M CPC      | 2           | 0.315   | 0.453      | 0.529   | 0.265   | 0.228     | 0.285   | 0.175 | 0.337      | 0.318   | 0.246     |
| 16   | FBANK      | 0           | 0.191   | 0.305      | 0.144   | 0.186   | 0.199     | 0.242   | 0.120 | 0.184      | 0.170   | 0.168     |

Selected Models Radar Plot

We provide the scripts for plotting the radar plot in our EMO-SUPERB Codebase.

Make Contribution 🤠


Submit Your Model's Evaluation Result to The Leaderboard

You can submit your model's evaluation result to the leaderboard. Please follow the instructions:
1. Follow our EMO-SUPERB Codebase on GitHub
2. Run the evaluation script
3. Make a Pull Request (PR) on GitHub.


Contribute Your Data Labeling Method

You can contribute your data labeling method to the leaderboard. Please follow the instructions:
1. Make a Pull Request (PR) on GitHub