This paper investigates the alignment between stated motivations and actual research practices in Speech Emotion Recognition (SER). Through a systematic survey of SER literature, the authors identify a disconnect between the appealing goals (e.g., healthcare applications) and the characteristics of commonly used datasets. The study reveals that these datasets often fail to reflect the proposed real-world deployment contexts, highlighting a significant gap that raises ethical concerns.
SER's noble aspirations of voice-activated healthcare are undermined by datasets that bear little resemblance to real-world emotional expression.
Critical analyses of emotion recognition technology have raised ethical concerns around task validity and potential downstream impacts, urging researchers to ensure alignment between their stated motivations and practice. However, these discussions have not adequately influenced or drawn from research on speech emotion recognition (SER). We address this gap by conducting a systematic survey of SER research to uncover what stated motivations drive this work and whether they align with the datasets and emotions studied. We find that while SER research identifies appealing goals, such as well-situated voice-activated systems or healthcare applications, commonly used datasets do not reflect these proposed deployment contexts, thus presenting a gap between motivations and research practices. We argue that such gaps engender ethical concerns, and that SER research should reassert itself with concrete use cases to prevent misinterpretations, misuse, and downstream harms.