This paper addresses the limitations of categorical and dimensional labels in Speech Emotion Recognition (SER) by proposing to represent emotions using color attributes (hue, saturation, value). The authors annotated an emotional speech corpus with color attributes via crowdsourcing and built regression models using machine learning and deep learning techniques. The results demonstrate the relationship between color attributes and emotions in speech, and show that multitask learning of color attribute regression and emotion classification improves performance.
Ditch emotion categories: Representing speech emotions with color attributes unlocks continuous, interpretable scores and boosts recognition accuracy via multitask learning.
Speech emotion recognition (SER) has traditionally relied on categorical or dimensional labels. However, these labels are limited both in the diversity of emotions they can represent and in their interpretability. To overcome this limitation, we focus on color attributes (hue, saturation, and value) to represent emotions as continuous and interpretable scores. We annotated an emotional speech corpus with color attributes via crowdsourcing and analyzed the resulting annotations. Moreover, we built regression models for color attributes in SER using machine learning and deep learning, and explored multitask learning of color attribute regression and emotion classification. As a result, we demonstrated the relationship between color attributes and emotions in speech, and successfully developed color attribute regression models for SER. We also showed that multitask learning improved the performance of each task.
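As a rough illustration of the multitask setup described in the abstract, the sketch below combines a regression loss over (hue, saturation, value) with a classification loss over emotion categories. It is not the authors' implementation. The function names, the choice of mean squared error and cross-entropy, the circular treatment of hue, the scaling of all three attributes to [0, 1], and the `weight` parameter are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Classification loss for one utterance's emotion label."""
    return -math.log(softmax(logits)[target_index])


def hsv_mse(pred_hsv, true_hsv):
    """Mean squared error over (hue, saturation, value), all assumed in [0, 1].

    Assumption: hue is treated as circular, so 0.95 and 0.05 are 0.1 apart.
    """
    dh = abs(pred_hsv[0] - true_hsv[0]) % 1.0
    dh = min(dh, 1.0 - dh)
    ds = pred_hsv[1] - true_hsv[1]
    dv = pred_hsv[2] - true_hsv[2]
    return (dh * dh + ds * ds + dv * dv) / 3.0


def multitask_loss(pred_hsv, true_hsv, logits, target_index, weight=0.5):
    """Weighted sum of the regression and classification losses.

    `weight` is a hypothetical hyperparameter balancing the two tasks.
    """
    regression = hsv_mse(pred_hsv, true_hsv)
    classification = cross_entropy(logits, target_index)
    return weight * regression + (1.0 - weight) * classification
```

In a trained model, both `pred_hsv` and `logits` would come from output heads on a shared speech encoder, so gradients from either task update the shared representation. That sharing is the usual reason multitask learning is expected to help each task.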