UMich | Apr 22, 2026 | arXiv:2604.20658

Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows

Shivani Kumar, Adarsh Bharathwaj, David Jurgens

AI Summary

This paper benchmarks 35 open-weight LLMs on six behavioral economics games to assess their cooperative tendencies. It finds that performance in these games strongly predicts the effectiveness of LLM teams in collaborative AI-for-Science tasks, such as data analysis and scientific report generation under budget constraints. The study demonstrates that models exhibiting cooperative strategies in the games produce higher-quality scientific reports, suggesting that cooperative disposition is a distinct and measurable property of LLMs.

Key Contribution

LLMs that play nice in behavioral economics games make better AI scientists, suggesting cooperation isn't just about general smarts.

Abstract

Multi-agent systems built from teams of large language models (LLMs) are increasingly deployed for collaborative scientific reasoning and problem-solving. These systems require agents to coordinate under shared constraints, such as GPUs or credit balances, where cooperative behavior matters. Behavioral economics provides a rich toolkit of games that isolate distinct cooperation mechanisms, yet it remains unknown whether a model's behavior in these stylized settings predicts its performance in realistic collaborative tasks. Here, we benchmark 35 open-weight LLMs across six behavioral economics games and show that game-derived cooperative profiles robustly predict downstream performance in AI-for-Science tasks, where teams of LLM agents collaboratively analyze data, build models, and produce scientific reports under shared budget constraints. Models that effectively coordinate in games and invest in multiplicative team production (rather than greedy strategies) produce better scientific reports across three outcomes: accuracy, quality, and completion. These associations hold after controlling for multiple factors, indicating that cooperative disposition is a distinct, measurable property of LLMs not reducible to general ability. Our behavioral games framework thus offers a fast and inexpensive diagnostic for screening cooperative fitness before costly multi-agent deployment.
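To make the contrast between cooperative investment and greedy strategies concrete, here is a minimal sketch of a linear public goods game, a standard behavioral economics game. The abstract does not name the six games used, so this game, the function name `public_goods_payoffs`, and the endowment and multiplier values are illustrative assumptions rather than the paper's setup.

```python
# Illustrative sketch only: the game choice and all parameter values are
# assumptions, not taken from the paper.

def public_goods_payoffs(contributions, endowment=10.0, multiplier=2.0):
    """Return each player's payoff in a one-shot linear public goods game.

    Each player keeps (endowment - contribution) and receives an equal share
    of the pooled contributions after the pool is scaled by `multiplier`.
    """
    if any(c < 0 or c > endowment for c in contributions):
        raise ValueError("each contribution must lie in [0, endowment]")
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


# Four players, three scenarios.
all_cooperate = public_goods_payoffs([10.0, 10.0, 10.0, 10.0])  # 20.0 each
one_free_rider = public_goods_payoffs([0.0, 10.0, 10.0, 10.0])  # 25.0, then 15.0 each
all_defect = public_goods_payoffs([0.0, 0.0, 0.0, 0.0])         # 10.0 each
```

With a multiplier between 1 and the number of players, a single free rider earns more than the contributors, yet everyone is better off under full cooperation than under full defection. That tension is the kind of trade-off a game-derived cooperative profile would capture.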

Eval Frameworks & Benchmarks | Scientific Discovery & Drug Design | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
