This paper introduces a theory-driven framework for evaluating counselor responses to client resistance in text-based therapy, decomposing responses into four communication mechanisms. The authors create and release an expert-annotated dataset of real-world counseling excerpts with ratings and rationales. By fine-tuning Llama-3.1-8B-Instruct on this dataset, they outperform GPT-4o and Claude-3.5-Sonnet at evaluating communication mechanisms and generating explanations, and they show that counselor performance improves with AI-generated feedback.
Llama-3.1-8B-Instruct fine-tuned on a new dataset of counseling interactions outperforms GPT-4o and Claude-3.5-Sonnet at evaluating and explaining counselor responses to client resistance.
Effectively addressing client resistance is a sophisticated clinical skill in psychological counseling, yet practitioners often lack timely and scalable supervisory feedback to refine their approaches. Although current NLP research has examined overall counseling quality and general therapeutic skills, it fails to provide granular evaluations of high-stakes moments where clients exhibit resistance. In this work, we present a comprehensive pipeline for the multi-dimensional evaluation of human counselors' interventions specifically targeting client resistance in text-based therapy. We introduce a theory-driven framework that decomposes counselor responses into four distinct communication mechanisms. Leveraging this framework, we curate and share an expert-annotated dataset of real-world counseling excerpts, pairing counselor-client interactions with professional ratings and explanatory rationales. Using this data, we perform full-parameter instruction tuning on a Llama-3.1-8B-Instruct backbone to model fine-grained evaluative judgments of response quality and to generate the explanations underlying those judgments. Experimental results show that our approach can effectively distinguish the quality of different communication mechanisms (77-81% F1), substantially outperforming GPT-4o and Claude-3.5-Sonnet (45-59% F1). Moreover, the model produces high-quality explanations that closely align with expert references and receive near-ceiling ratings from human experts (2.8-2.9/3.0). A controlled experiment with 43 counselors further confirms that receiving this AI-generated feedback significantly improves counselors' ability to respond effectively to client resistance.