Mar 16, 2026

arXiv:2603.15120

How Attention Shapes Emotion: A Comparative Study of Attention Mechanisms for Speech Emotion Recognition

Marc Casals-Salvador, Federico Costa, Rodolfo Zevallos, Javier Hernando

AI Summary

This paper benchmarks five optimized attention mechanisms (RetNet, LightNet, GSA, FoX, and KDA) for Speech Emotion Recognition (SER) against standard self-attention. The study evaluates them on both versions of the MSP-Podcast benchmark, focusing on the trade-off between recognition accuracy and computational efficiency. Standard self-attention yields the strongest recognition performance across test sets, while the optimized mechanisms reduce inference latency and memory usage by up to an order of magnitude, which makes SER systems easier to scale.

Key Contribution

Optimized attention mechanisms can reduce the inference latency and memory usage of Speech Emotion Recognition by up to an order of magnitude, at the cost of some accuracy compared to standard self-attention.

Abstract

Speech Emotion Recognition (SER) plays a key role in advancing human-computer interaction. Attention mechanisms have become the dominant approach for modeling emotional speech due to their ability to capture long-range dependencies and emphasize salient information. However, standard self-attention suffers from quadratic computational and memory complexity, limiting its scalability. In this work, we present a systematic benchmark of optimized attention mechanisms for SER, including RetNet, LightNet, GSA, FoX, and KDA. Experiments on both MSP-Podcast benchmark versions show that while standard self-attention achieves the strongest recognition performance across test sets, efficient attention variants dramatically improve scalability, reducing inference latency and memory usage by up to an order of magnitude. These results highlight a critical trade-off between accuracy and efficiency, providing practical insights for designing scalable SER systems.
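The quadratic cost mentioned in the abstract, and the way recurrent-style attention variants avoid it, can be illustrated with a minimal sketch. It is a generic decayed linear-attention toy, not the implementation of RetNet or of any other mechanism benchmarked in the paper. The function names, the fixed scalar decay `gamma`, and the toy dimensions are all assumptions made for illustration. The first function scores every pair of frames, so its work grows with the square of the sequence length. The second keeps a single fixed-size state and should produce the same outputs.

```python
import random


def decayed_attention_quadratic(q, k, v, gamma):
    """Parallel form: frame t scores every frame s <= t, so O(T^2) pairs."""
    d_k, d_v = len(q[0]), len(v[0])
    out = []
    for t in range(len(q)):
        row = [0.0] * d_v
        for s in range(t + 1):
            # Dot-product score, damped by how far back frame s lies.
            score = gamma ** (t - s) * sum(q[t][i] * k[s][i] for i in range(d_k))
            for j in range(d_v):
                row[j] += score * v[s][j]
        out.append(row)
    return out


def decayed_attention_recurrent(q, k, v, gamma):
    """Recurrent form: one d_k x d_v state, updated once per frame."""
    d_k, d_v = len(q[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # S_t = gamma * S_{t-1} + outer(k_t, v_t)
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = gamma * state[i][j] + k_t[i] * v_t[j]
        # o_t = q_t @ S_t
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out


# Toy inputs (hypothetical sizes): T frames with d features each.
rng = random.Random(0)
T, d = 6, 4
q, k, v = ([[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
           for _ in range(3))

a = decayed_attention_quadratic(q, k, v, gamma=0.9)
b = decayed_attention_recurrent(q, k, v, gamma=0.9)
max_diff = max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
print(max_diff)
```

The recurrent form stores only the `d_k x d_v` state regardless of how many frames have been processed, which is the kind of property that lets efficient attention variants scale to long utterances. Standard softmax self-attention has no exact recurrence of this form, so the efficient variants change the mechanism rather than merely re-implementing it.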

Architecture Design (Transformers, SSMs, MoE)

Natural Language Processing

Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
