The paper introduces TherapyProbe, a methodology leveraging adversarial multi-agent simulation to identify relational safety failures in mental health chatbots, focusing on interaction patterns across conversations. TherapyProbe uses open-source models to generate conversation trajectories and surfaces failures like validation spirals and empathy fatigue. The authors translate these failures into a Safety Pattern Library of 23 failure archetypes, providing design recommendations for developers, clinicians, and policymakers.
Uncovered: mental health chatbots can fall into dangerous "validation spirals" or "empathy fatigue" patterns, revealing critical relational safety flaws missed by current single-turn evaluations.
As mental health chatbots proliferate to address the global treatment gap, a critical question emerges: How do we design for relational safety, the quality of interaction patterns that unfold across conversations rather than the correctness of individual responses? Current safety evaluations assess single-turn crisis responses, missing the therapeutic dynamics that determine whether chatbots help or harm over time. We introduce TherapyProbe, a design probe methodology that generates actionable design knowledge by systematically exploring chatbot conversation trajectories through adversarial multi-agent simulation. Using open-source models, TherapyProbe surfaces relational safety failures: interaction patterns like "validation spirals," where chatbots progressively reinforce hopelessness, or "empathy fatigue," where responses become mechanical over turns. Our contribution is translating these failures into a Safety Pattern Library of 23 failure archetypes with corresponding design recommendations. We contribute: (1) a replicable methodology requiring no API costs, (2) a clinically-grounded failure taxonomy, and (3) design implications for developers, clinicians, and policymakers.