This paper introduces ComPASS, a framework that equips interactive agents with external tools so they can provide personalized social support, addressing the limited response forms and content of empathetic dialogue generation. The authors design a dozen user-centric tools, build ComPASS-Bench, which they describe as the first personalized social support benchmark for LLM-based agents, and fine-tune Qwen3-8B on synthesized tool-use records to obtain ComPASS-Qwen. In their evaluations, tool-augmented responses achieve better overall performance than purely conversational empathy, and ComPASS-Qwen improves substantially over its base model, reaching performance comparable to several large-scale models.
Tool-augmented responses can outperform traditional empathetic dialogue, transforming how agents provide personalized social support.
Developing compassionate interactive systems requires agents to not only understand user emotions but also provide diverse, substantive support. While recent works explore empathetic dialogue generation, they remain limited in response form and content, struggling to satisfy diverse needs across users and contexts. To address this, we explore empowering agents with external tools to execute diverse actions. Grounded in the psychological concept of "social support", this paradigm delivers substantive, human-like companionship. Specifically, we first design a dozen user-centric tools simulating various multimedia applications, which can cover different types of social support behaviors in human-agent interaction scenarios. We then construct ComPASS-Bench, the first personalized social support benchmark for LLM-based agents, via multi-step automated synthesis and manual refinement. Based on ComPASS-Bench, we further synthesize tool use records to fine-tune the Qwen3-8B model, yielding a task-specific ComPASS-Qwen. Comprehensive evaluations across two settings reveal that while the evaluated LLMs can generate valid tool-calling requests with high success rates, significant gaps remain in final response quality. Moreover, tool-augmented responses achieve better overall performance than directly producing conversational empathy. Notably, our trained ComPASS-Qwen demonstrates substantial improvements over its base model, achieving comparable performance to several large-scale models. Our code and data are available at https://github.com/hzp3517/ComPASS.
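To make the notion of a "valid tool-calling request" concrete, the following is a minimal sketch of how such a check and a success rate might be computed. It is not the ComPASS implementation: the tool names (`play_music`, `set_reminder`), their parameters, the JSON call layout, and the helpers `is_valid_call` and `success_rate` are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical tool schemas. The names and parameters are illustrative
# assumptions and are not taken from ComPASS or ComPASS-Bench.
TOOLS = {
    "play_music": {"required": {"mood": str}, "optional": {"duration_min": int}},
    "set_reminder": {"required": {"text": str, "time": str}, "optional": {}},
}


def is_valid_call(raw: str) -> bool:
    """Return True if `raw` is a JSON tool call that matches a registered schema."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    args = call.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        return False
    schema = TOOLS.get(name)
    if schema is None:
        return False
    allowed = {**schema["required"], **schema["optional"]}
    # Every required argument must be present.
    if any(key not in args for key in schema["required"]):
        return False
    # No unknown arguments, and each value must have the declared type.
    return all(
        key in allowed and isinstance(value, allowed[key])
        for key, value in args.items()
    )


def success_rate(raw_calls: list[str]) -> float:
    """Fraction of raw model outputs that parse into valid tool calls."""
    if not raw_calls:
        return 0.0
    return sum(is_valid_call(raw) for raw in raw_calls) / len(raw_calls)
```

A check of this kind only measures whether a request is well formed. It says nothing about whether the final response is helpful, which is the gap the abstract points to between tool-calling success and response quality.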