Mar 9, 2026 · arXiv:2603.07853

SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans

Hansi Zeng, Z. Li, Zoey Li, Chenwei Zhang, Xiaoman Pan, Tao Yang, Fengran Mo, Jiacheng Lin, Xian Li, Jingbo Shang

AI Summary

The paper introduces SynPlanResearch-R1, a framework that synthesizes tool-use trajectories to encourage deeper exploration during the cold-start supervised fine-tuning of research agents. This approach addresses poor exploration behaviors, such as premature termination and biased tool usage, that limit the effectiveness of reinforcement learning with verifiable rewards (RLVR). By providing a strong initialization for RL, SynPlanResearch-R1 improves performance over state-of-the-art baselines by up to 6.0% on the Qwen3-8B backbone and 5.8% on the Qwen3-4B backbone across seven multi-hop and open-web benchmarks.
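The two failure modes named above can be made concrete with simple trajectory statistics. The following is a hypothetical illustration, not the paper's analysis code. It assumes each trajectory is a list of tool-call names (`search` and `visit` are placeholder tool names) and reports two numbers: the average number of calls per trajectory, where a low value hints at premature termination, and the normalized entropy of the pooled tool-usage distribution, where a value near 0 hints at biased tool usage.

```python
import math
from collections import Counter


def exploration_stats(trajectories: list[list[str]], tools: list[str]) -> dict[str, float]:
    """Summarize exploration behavior over a non-empty batch of trajectories.

    Each trajectory is a (hypothetical) list of tool names, one per call.
    """
    # Average number of tool calls: a low value hints at premature termination.
    avg_calls = sum(len(t) for t in trajectories) / len(trajectories)

    # Tool-usage distribution pooled across all trajectories.
    counts = Counter(name for t in trajectories for name in t)
    total = sum(counts.values())
    entropy = 0.0
    for tool in tools:
        p = counts[tool] / total if total else 0.0
        if p > 0:
            entropy -= p * math.log(p)

    # Normalize by log(#tools) so 1.0 means perfectly balanced usage.
    balance = entropy / math.log(len(tools)) if len(tools) > 1 else 0.0
    return {"avg_calls": avg_calls, "tool_balance": balance}


if __name__ == "__main__":
    batch = [["search"], ["search", "search"], ["search", "visit", "search"]]
    print(exploration_stats(batch, tools=["search", "visit"]))
```

Normalized entropy is used here only because it gives a single bounded score for "how evenly are tools used"; the paper's own tool-use analyses may rely on different measures.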

Key Contribution

Jumpstart your research agent: synthetic tool-use plans overcome exploration bottlenecks and boost performance by up to 6% on multi-hop and open-web benchmarks.

Abstract

Research agents enable models to gather information from the web using tools to answer user queries, which requires them to dynamically interleave internal reasoning with tool use. While such capabilities can in principle be learned via reinforcement learning with verifiable rewards (RLVR), we observe that agents often exhibit poor exploration behaviors, including premature termination and biased tool usage. As a result, RLVR alone yields limited improvements. We propose SynPlanResearch-R1, a framework that synthesizes tool-use trajectories encouraging deeper exploration and uses them to shape exploration during cold-start supervised fine-tuning, providing a strong initialization for subsequent RL. Across seven multi-hop and open-web benchmarks, SynPlanResearch-R1 improves performance over SOTA baselines by up to 6.0% on the Qwen3-8B backbone and 5.8% on the Qwen3-4B backbone. Further analyses of tool-use patterns and training dynamics, compared against baselines, shed light on the factors underlying these gains. Our code is publicly available at https://github.com/HansiZeng/syn-plan-research.
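To illustrate what synthesizing exploration-encouraging plans might look like, here is a minimal sketch under stated assumptions; it is not the authors' pipeline (the linked repository has that). It assumes a two-tool setting with placeholder names `search` and `visit`, and the helpers `sample_plan` and `plan_to_hint` and their parameters `min_steps`, `max_steps`, and `max_share` are all hypothetical. A plan is forced to have at least `min_steps` calls (targeting premature termination) and caps how often any single tool may appear (targeting biased tool usage); the rendered hint could then condition a model that generates cold-start SFT trajectories.

```python
import math
import random


def sample_plan(tools: list[str], min_steps: int, max_steps: int,
                max_share: float, rng: random.Random) -> list[str]:
    """Sample a hypothetical tool-call plan with min_steps..max_steps calls
    in which no single tool dominates."""
    n = rng.randint(min_steps, max_steps)  # inclusive on both ends
    # Per-tool cap; raised to ceil(n / len(tools)) when max_share is too small
    # to fill n steps, so the plan always contains exactly n calls.
    cap = max(math.ceil(n / len(tools)), int(max_share * n))
    counts = {t: 0 for t in tools}
    plan: list[str] = []
    for _ in range(n):
        # Only tools still under their cap are eligible at this step.
        eligible = [t for t in tools if counts[t] < cap]
        tool = rng.choice(eligible)
        counts[tool] += 1
        plan.append(tool)
    return plan


def plan_to_hint(plan: list[str]) -> str:
    """Render a plan as a natural-language hint for a trajectory generator."""
    steps = [f"Step {i}: call `{tool}`" for i, tool in enumerate(plan, start=1)]
    return "\n".join(steps + ["Then give the final answer."])


if __name__ == "__main__":
    rng = random.Random(0)
    print(plan_to_hint(sample_plan(["search", "visit"], 3, 6, 0.6, rng)))
```

Restricting each step to under-cap tools is one simple way to enforce balance; the synthesis procedure actually used by SynPlanResearch-R1 may differ substantially.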

Eval Frameworks & Benchmarks · RLHF & Preference Learning · Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 46
Year: 2026
Venue: N/A
