Google Research | SFI | Feb 18, 2026 | arXiv:2602.16301

Multi-agent cooperation through in-context co-player inference

Marissa A. Weis, Maciej Wołczyk, Rajai Nasser, Rif A. Saurous, Blaise Agüera y Arcas, João Sacramento, Alexander Meulemans

AI Summary

This paper investigates multi-agent cooperation using sequence model agents trained with decentralized reinforcement learning against a diverse set of co-players. It demonstrates that in-context learning enables agents to develop best-response strategies and adapt to co-player behavior within an episode, effectively acting as a learning algorithm on a fast timescale. The key finding is that this setup induces a cooperative mechanism driven by vulnerability to extortion: agents mutually shape each other's in-context learning dynamics, and cooperative behavior emerges.

Key Contribution

Sequence models can learn to cooperate in multi-agent settings simply by training against diverse partners, with no explicit meta-learning required.

Abstract

Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing approaches typically rely on hardcoded, often inconsistent, assumptions about co-player learning rules or enforce a strict separation between "naive learners" updating on fast timescales and "meta-learners" observing these updates. Here, we demonstrate that the in-context learning capabilities of sequence models allow for co-player learning awareness without requiring hardcoded assumptions or explicit timescale separation. We show that training sequence model agents against a diverse distribution of co-players naturally induces in-context best-response strategies, effectively functioning as learning algorithms on the fast intra-episode timescale. We find that the cooperative mechanism identified in prior work, where vulnerability to extortion drives mutual shaping, emerges naturally in this setting: in-context adaptation renders agents vulnerable to extortion, and the resulting mutual pressure to shape the opponent's in-context learning dynamics resolves into the learning of cooperative behavior. Our results suggest that standard decentralized reinforcement learning on sequence models combined with co-player diversity provides a scalable path to learning cooperative behaviors.
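
To make the idea of an in-context best response concrete, the sketch below infers a co-player's type from the actions seen so far in an episode and responds to the most likely type. It is a toy illustration, not the paper's method. The iterated prisoner's dilemma, the three co-player types, the noise level `EPS`, and the `BEST_RESPONSE` table are all assumptions made here. An explicit Bayesian update stands in for the adaptation that, according to the abstract, a trained sequence model performs implicitly. The extortion and mutual-shaping dynamics are not modeled.

```python
# Illustrative sketch only: explicit Bayesian co-player inference in an
# iterated prisoner's dilemma. The game, the co-player types, and the noise
# level are assumptions for illustration, not details taken from the paper.

C, D = 0, 1  # cooperate, defect

# Hypothetical co-player pool: each policy maps the agent's previous action
# (None on the first step) to the co-player's intended action.
CO_PLAYERS = {
    "always_cooperate": lambda prev: C,
    "always_defect": lambda prev: D,
    "tit_for_tat": lambda prev: C if prev is None else prev,
}

# Assumed best response to each type in a long repeated game.
BEST_RESPONSE = {"always_cooperate": D, "always_defect": D, "tit_for_tat": C}

EPS = 0.05  # assumed probability that a co-player deviates from its policy


def update_posterior(posterior, my_prev_action, observed_action):
    """One Bayes step: reweight each type by the likelihood of the observation."""
    new = {}
    for name, policy in CO_PLAYERS.items():
        intended = policy(my_prev_action)
        likelihood = 1 - EPS if intended == observed_action else EPS
        new[name] = posterior[name] * likelihood
    total = sum(new.values())
    return {name: weight / total for name, weight in new.items()}


def run_episode(true_type, steps=10):
    """Play against a deterministic co-player, adapting within the episode."""
    posterior = {name: 1 / len(CO_PLAYERS) for name in CO_PLAYERS}
    my_prev = None
    actions = []
    for _ in range(steps):
        # Act on the currently most likely co-player type.
        guess = max(posterior, key=posterior.get)
        my_action = BEST_RESPONSE[guess]
        # The co-player reacts to the agent's previous action.
        their_action = CO_PLAYERS[true_type](my_prev)
        posterior = update_posterior(posterior, my_prev, their_action)
        actions.append(my_action)
        my_prev = my_action
    return posterior, actions
```

Calling `run_episode("tit_for_tat")` returns the final posterior over co-player types and the agent's action sequence. In this toy setting the agent starts by defecting and switches to cooperation once the history identifies the co-player as tit-for-tat, which is the kind of within-episode adaptation the abstract attributes to in-context learning.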

RLHF & Preference Learning | Tool Use & Agents | World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 42

Year: 2026

Venue: N/A
