Search papers, labs, and topics across Lattice.
MuSEAgent is introduced, a multimodal reasoning agent that learns from stateful experiences by abstracting interaction data into atomic decision experiences using hindsight reasoning. These experiences are stored in a quality-filtered bank and retrieved at inference time using policy-driven retrieval. Experiments show MuSEAgent outperforms trajectory-level retrieval baselines on visual perception and multimodal reasoning tasks, demonstrating the value of stateful experience modeling.
Forget trajectory-level rollouts: MuSEAgent learns faster and reasons better by distilling past interactions into reusable, state-aware decision experiences.
Research agents have recently achieved significant progress in information seeking and synthesis across heterogeneous textual and visual sources. In this paper, we introduce MuSEAgent, a multimodal reasoning agent that enhances decision-making by extending the capabilities of research agents to discover and leverage stateful experiences. Rather than relying on trajectory-level retrieval, we propose a stateful experience learning paradigm that abstracts interaction data into atomic decision experiences through hindsight reasoning. These experiences are organized into a quality-filtered experience bank that supports policy-driven experience retrieval at inference time. Specifically, MuSEAgent enables adaptive experience exploitation through complementary wide- and deep-search strategies, allowing the agent to dynamically retrieve multimodal guidance across diverse compositional semantic viewpoints. Extensive experiments demonstrate that MuSEAgent consistently outperforms strong trajectory-level experience retrieval baselines on both fine-grained visual perception and complex multimodal reasoning tasks. These results validate the effectiveness of stateful experience modeling in improving multimodal agent reasoning.