This paper introduces a symptom-guided cross-attention mechanism to estimate depression severity from speech by aligning PHQ-8 questionnaire items with emotion-aware speech representations. The approach incorporates a learnable symptom-specific parameter to adaptively control the sharpness of attention distributions, accounting for variations in symptom expression over time. Experiments on the EDAIC dataset demonstrate improved performance compared to existing methods and reveal that the model attends to utterances containing cues related to multiple depressive symptoms, enhancing interpretability.
By explicitly modeling symptom-specific information within speech, this approach surpasses existing depression detection methods and offers clinically relevant, interpretable insights into which speech segments correlate with specific depressive symptoms.
Depression manifests through a diverse set of symptoms such as sleep disturbance, loss of interest, and concentration difficulties. However, most existing works treat depression prediction as either a binary label or an overall severity score, without explicitly modeling symptom-specific information. This limits their ability to provide the symptom-level analysis relevant to clinical screening. To address this, we propose a symptom-specific, clinically inspired framework for depression severity estimation from speech. Our approach uses a symptom-guided cross-attention mechanism that aligns PHQ-8 questionnaire items with emotion-aware speech representations to identify which segments of a participant's speech are most relevant to each symptom. To account for differences in how symptoms are expressed over time, we introduce a learnable symptom-specific parameter that adaptively controls the sharpness of the attention distributions. Our results on EDAIC, a standard clinical-style dataset, show that the approach outperforms prior work. Further, analysis of the attention distributions shows that higher attention is assigned to utterances containing cues related to multiple depressive symptoms, highlighting the interpretability of our approach. These findings underline the importance of symptom-guided and emotion-aware modeling for speech-based depression screening.
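The symptom-guided cross-attention described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each PHQ-8 item is represented by a query vector, that each utterance is represented by an emotion-aware feature vector, and that the learnable symptom-specific parameter acts as a per-symptom softmax temperature (stored here as `log_tau` so that it stays positive). The function name `symptom_guided_attention`, the variable names, and the random inputs are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def symptom_guided_attention(symptom_queries, utterance_feats, log_tau):
    """Attend from each symptom query to all utterances.

    symptom_queries: S vectors of size d (one per questionnaire item).
    utterance_feats: T vectors of size d (one per utterance).
    log_tau:         S floats; exp(log_tau[s]) is the assumed temperature
                     for symptom s. A smaller temperature gives a sharper
                     attention distribution.
    Returns (contexts, weights): S context vectors and S attention rows.
    """
    d = len(symptom_queries[0])
    contexts, weights = [], []
    for query, lt in zip(symptom_queries, log_tau):
        tau = math.exp(lt)
        # Scaled dot-product score per utterance, divided by the temperature.
        logits = [
            sum(q * u for q, u in zip(query, utt)) / (math.sqrt(d) * tau)
            for utt in utterance_feats
        ]
        row = softmax(logits)
        # Attention-weighted sum of utterance features for this symptom.
        context = [
            sum(w * utt[j] for w, utt in zip(row, utterance_feats))
            for j in range(d)
        ]
        contexts.append(context)
        weights.append(row)
    return contexts, weights


# Toy inputs: 8 symptom queries (PHQ-8 has 8 items), 20 utterances, dimension 16.
random.seed(0)
S, T, d = 8, 20, 16
queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(S)]
utterances = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]

contexts, weights = symptom_guided_attention(queries, utterances, [0.0] * S)
_, sharp_weights = symptom_guided_attention(queries, utterances, [-2.0] * S)
```

Each row of `weights` is a distribution over utterances for one symptom, which is the quantity one would inspect for interpretability. Under the temperature assumption, lowering `log_tau` for a symptom concentrates its attention on fewer utterances, while raising it spreads attention more evenly; in a trained model this value would be learned per symptom rather than set by hand.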