Mar 29, 2026 · arXiv:2603.27813

MuSEAgent: A Multimodal Reasoning Agent with Stateful Experiences

Shijian Wang, Jiarui Jin, Runhao Fu, Zexuan Yan, Xingjian Wang, Mengkang Hu, Eric Wang, Xiaoxi Li, Kangning Zhang, Li Yao, Lixu Yao, Wenxiang Jiao, Xuelian Cheng, Yuan Lu, Zongyuan Ge

AI Summary

MuSEAgent is a multimodal reasoning agent that learns from stateful experiences. It abstracts interaction data into atomic decision experiences through hindsight reasoning, stores them in a quality-filtered experience bank, and retrieves them at inference time through policy-driven retrieval. In the reported experiments, MuSEAgent outperforms trajectory-level experience retrieval baselines on fine-grained visual perception and complex multimodal reasoning tasks, which the authors take as evidence for the value of stateful experience modeling.

Key Contribution

Instead of retrieving whole trajectories, MuSEAgent distills past interactions into reusable, state-aware decision experiences and retrieves them to guide reasoning at inference time.

Abstract

Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic viewpoints. Extensive experiments demonstrate that MuSEAgent consistently outperforms strong trajectory-level experience retrieval baselines on both fine-grained visual perception and complex multimodal reasoning tasks. These results validate the effectiveness of stateful experience modeling in improving multimodal agent reasoning.
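The pipeline described in the abstract (atomic experiences, a quality-filtered bank, and wide versus deep retrieval) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify data formats, scoring, or how the wide- and deep-search strategies work. The `Experience` fields, the token-overlap `similarity` function, the `min_quality` threshold, and the readings of "wide" (one query per semantic view, results merged) and "deep" (iteratively refining one query with retrieved guidance) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    """One atomic decision experience (hypothetical schema)."""
    state: str      # description of the decision state
    guidance: str   # lesson derived in hindsight
    quality: float  # assumed quality score in [0, 1]


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens; a stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


class ExperienceBank:
    """Stores only experiences whose quality meets an assumed threshold."""

    def __init__(self, min_quality: float = 0.5) -> None:
        self.min_quality = min_quality
        self.items: List[Experience] = []

    def add(self, exp: Experience) -> bool:
        # Quality filter: reject low-scoring experiences.
        if exp.quality < self.min_quality:
            return False
        self.items.append(exp)
        return True

    def retrieve(self, query: str, k: int = 1) -> List[Experience]:
        # Rank stored experiences by similarity of their state to the query.
        ranked = sorted(self.items,
                        key=lambda e: similarity(query, e.state),
                        reverse=True)
        return ranked[:k]


def wide_search(bank: ExperienceBank, views: List[str], k: int = 1) -> List[Experience]:
    """Assumed 'wide' strategy: query once per view and merge unique hits."""
    merged: List[Experience] = []
    for view in views:
        for exp in bank.retrieve(view, k):
            if exp not in merged:
                merged.append(exp)
    return merged


def deep_search(bank: ExperienceBank, query: str, depth: int = 2) -> List[Experience]:
    """Assumed 'deep' strategy: extend the query with each retrieved guidance."""
    chain: List[Experience] = []
    current = query
    for _ in range(depth):
        candidates = [e for e in bank.retrieve(current, k=len(bank.items))
                      if e not in chain]
        if not candidates:
            break
        chain.append(candidates[0])
        current = current + " " + candidates[0].guidance
    return chain
```

In this sketch the quality filter is applied at insertion time, so retrieval never sees rejected experiences. A real system would likely replace the token-overlap score with multimodal embeddings and let the agent's policy choose between the two search strategies.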

Multimodal Models · Reasoning & Chain-of-Thought · Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 38

Year: 2026

Venue: N/A
