This paper investigates the impact of speaker identity leakage on speech-based depression detection using the DAIC-WOZ dataset. The authors introduce a data-splitting strategy that controls speaker overlap between training and test sets while holding the training size constant, and evaluate three models under it. Results show that the models rely heavily on speaker identity cues: performance drops sharply on unseen speakers, and adversarial training does not close the gap.
Depression detection models may be learning *who* is speaking, not *how* depression manifests in speech, inflating reported accuracy.
This study investigates whether speech-based depression detection models learn depression-related acoustic biomarkers or instead rely on speaker identity cues. Using the DAIC-WOZ dataset, we propose a data-splitting strategy that controls speaker overlap between training and test sets while keeping the training size constant, and evaluate three models of varying complexity. Results show that speaker overlap significantly boosts performance, whereas accuracy drops sharply on unseen speakers. Even with a Domain-Adversarial Neural Network, a substantial performance gap remains. These findings indicate that depression-related features extracted by current speech models are highly entangled with speaker identity. Conventional evaluation protocols may therefore overestimate generalization and clinical utility, highlighting the need for strictly speaker-independent evaluation.
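The splitting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' protocol, whose details are not given here; it only assumes that each recording is cut into segments tagged with a speaker ID. The function name `split_with_overlap` and the parameters `overlap`, `train_size`, and `test_speakers` are hypothetical. The test set is fixed, and only the fraction of training segments drawn from test speakers changes, so the training size stays constant across conditions.

```python
import random
from collections import defaultdict

def split_with_overlap(segments, test_speakers, overlap, train_size, seed=0):
    """Hypothetical helper: build a fixed test set and a fixed-size training set.

    segments:      list of (speaker_id, segment_id) pairs
    test_speakers: set of speaker IDs evaluated at test time
    overlap:       fraction of training segments taken from test speakers
                   (0.0 = strictly speaker-independent)
    train_size:    number of training segments, constant across conditions
    """
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for spk, seg in segments:
        by_speaker[spk].append((spk, seg))

    test, seen_pool, unseen_pool = [], [], []
    for spk, items in sorted(by_speaker.items()):
        items = items[:]
        rng.shuffle(items)
        if spk in test_speakers:
            half = len(items) // 2
            test.extend(items[:half])       # held-out segments for evaluation
            seen_pool.extend(items[half:])  # same speakers, different segments
        else:
            unseen_pool.extend(items)       # speakers never seen at test time

    n_seen = round(overlap * train_size)
    n_unseen = train_size - n_seen
    if n_seen > len(seen_pool) or n_unseen > len(unseen_pool):
        raise ValueError("not enough segments for the requested split")

    train = rng.sample(seen_pool, n_seen) + rng.sample(unseen_pool, n_unseen)
    return train, test

# Toy data: 10 speakers with 20 segments each (assumed, for illustration only).
segments = [(spk, seg) for spk in range(10) for seg in range(20)]
train_0, test_0 = split_with_overlap(segments, {0, 1, 2}, overlap=0.0, train_size=40)
train_50, test_50 = split_with_overlap(segments, {0, 1, 2}, overlap=0.5, train_size=40)
```

Because the seed and the per-speaker shuffles are the same for every `overlap` value, the test set is identical across conditions, so any change in accuracy can be attributed to speaker overlap rather than to a different test set or a larger training set.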