This paper introduces CollabSim, a configurable simulation framework for assessing the collaborative competence of large language model (LLM) agents in multi-agent systems (MAS). It combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states, which allows analysis of how agents establish common ground and maintain shared understanding during tasks. Experiments with four LLMs show that CollabSim captures the effects of interaction conditions, separates model performance patterns, and reveals task-dependent effects of agent design, underscoring that collaborative skill matters beyond individual task-solving ability.
LLM agents often struggle not due to a lack of reasoning skills, but because they fail to collaborate effectively, as revealed by the new CollabSim framework.
Multi-agent systems (MAS) built on large language models have shown growing promise, with their effectiveness resting on agents' ability to coordinate through text-based channels much as human teams do. Yet recent work suggests that MAS often falter not because agents lack individual task-solving ability, but because they lack collaborative competence: the capacity to establish common ground, maintain shared task understanding, balance individual and collective incentives, and repair misalignment as interaction unfolds. Decades of research in Computer-Supported Cooperative Work have characterized these requirements for human teams coordinating under constrained communication, but existing MAS evaluations focus mainly on task outcomes or single-agent proficiency in reasoning, planning, and tool use. To enable a systematic analysis of agents' collaborative competence in MAS, we introduce CollabSim, a configurable simulation framework that combines a theory-grounded definition of collaborative capabilities, controlled manipulation of interaction conditions, and action-level probing of agents' internal states. Experiments across four LLMs show that CollabSim can capture condition effects, separate model performance patterns, and reveal task-dependent effects of agent design.