May 6, 2026 | arXiv:2605.04906

Strat-Reasoner: Reinforcing Strategic Reasoning of LLMs in Multi-Agent Games

Yidong He, Yutao Lai, Pengxu Yang, Jiarui Gan, Jiexin Wang, Yi Cai, Mengchen Zhao

AI Summary

Strat-Reasoner is an RL-based framework that strengthens LLMs' strategic reasoning in multi-agent games by having each agent's reasoning recursively incorporate the reasoning processes of other agents. A centralized Chain-of-Thought comparison module supplies reward signals for intermediate reasoning steps, and a group-relative RL method with a hybrid advantage optimizes the LLM policy. Experiments report a 22.1% average performance improvement across various multi-agent games.

Key Contribution

LLMs play multi-agent games better when they recursively model the reasoning of other players; the paper reports a 22.1% average performance improvement.

Abstract

While Large Language Models (LLMs) excel in certain reasoning tasks, they struggle in multi-agent games where the final outcome depends on the joint strategies of all agents. In multi-agent games, the non-stationarity of other agents poses significant challenges for evaluating the reasoning process and for assigning credit over multiple reasoning steps. Existing single-agent reinforcement learning (RL) approaches and their multi-agent extensions fail to address these challenges because they do not incorporate other agents in the reasoning process. In this work, we propose Strat-Reasoner, a novel RL-based framework that improves LLMs' strategic reasoning ability in multi-agent games. We introduce a novel recursive reasoning paradigm where an agent's reasoning also integrates other agents' reasoning processes. To provide effective reward signals for the intermediate reasoning sequences, we employ a centralized Chain-of-Thought (CoT) comparison module to evaluate the reasoning quality. Finally, we compute an accurate hybrid advantage and develop a group-relative RL approach to optimize the LLM policy. Experimental results show that Strat-Reasoner substantially improves the strategic abilities of the underlying LLMs, achieving a 22.1% average performance improvement across various multi-agent games.
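The abstract names a hybrid advantage and a group-relative RL approach but does not give their formulas. As a rough illustration only, the sketch below assumes the common group-relative normalization (each sampled response's reward is standardized against the other responses in its sampling group) and assumes the hybrid advantage is a weighted mix of an outcome-based term and a reasoning-quality term. The function names, the `weight` parameter, and the mixing rule are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its own sampling group.

    This is the usual group-relative normalization; whether Strat-Reasoner
    uses exactly this form is an assumption.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def hybrid_advantages(outcome_rewards, reasoning_rewards, weight=0.5):
    """Hypothetical hybrid advantage: a convex mix of two normalized signals.

    outcome_rewards:   game-outcome reward for each sampled response
    reasoning_rewards: score for each response's intermediate reasoning
                       (for example, from a CoT comparison module)
    weight:            assumed mixing coefficient, not from the paper
    """
    if len(outcome_rewards) != len(reasoning_rewards):
        raise ValueError("both reward lists must cover the same group")
    a_outcome = group_relative_advantages(outcome_rewards)
    a_reasoning = group_relative_advantages(reasoning_rewards)
    return [
        (1.0 - weight) * o + weight * c
        for o, c in zip(a_outcome, a_reasoning)
    ]


if __name__ == "__main__":
    # Four sampled responses for the same game state (made-up numbers).
    outcome = [1.0, 0.0, 0.0, 1.0]
    reasoning = [0.2, 0.8, 0.4, 0.6]
    print(hybrid_advantages(outcome, reasoning, weight=0.5))
```

Under these assumptions, a response that wins the game but reasons poorly about its opponents receives a smaller advantage than one that wins with reasoning the comparison module rates highly, which is one way intermediate reasoning could receive credit separately from the final outcome.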

Reasoning & Chain-of-Thought | RLHF & Preference Learning | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
