Macquarie | Feb 26, 2026 | arXiv:2602.22675

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Qianben Chen, Tianrui Qin, King Zhu, Qiexiang Wang, Chengjun Yu, Cheng Yu, Shu Xu, Shunmiao Xu, Jiaqi Wu, Jiayu Zhang, Xinpeng Liu, Xin Gui, Jingyi Cao, Piaohong Wang, Dingfeng Shi, He Zhu, Tiannan Wang, Yuqing Wang, Maojia Song, Tianyu Zheng, Ge Zhang, Jian Yang, Jiaheng Liu, Minghao Liu, Y. Jiang, Yuchen Eleanor Jiang, Wangchunshu Zhou

AI Summary

The paper introduces Search More, Think Less (SMTL), a framework for long-horizon agentic search that prioritizes parallel evidence acquisition over deep sequential reasoning to improve efficiency and generalization. SMTL uses a unified data synthesis pipeline to train an end-to-end agent across diverse search tasks with both supervised fine-tuning and reinforcement learning. The resulting agent achieves strong and often state-of-the-art performance on BrowseComp, GAIA, Xbench, and DeepResearch Bench, while cutting the average number of reasoning steps on BrowseComp by 70.7% relative to Mirothinker-v1.0.

Key Contribution

Ditch the deep thought: by prioritizing parallel evidence gathering, this agentic search framework cuts average reasoning steps on BrowseComp by 70.7% relative to Mirothinker-v1.0 while improving accuracy.

Abstract

Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to Mirothinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7% while improving accuracy.
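The abstract does not describe how parallel evidence acquisition is implemented, so the following is only a toy sketch of the general idea: independent queries are issued together in one interaction step instead of one query per step. Everything here is an assumption made for illustration, including the `search` stub, the `FAKE_INDEX` data, and the way steps are counted; none of it is taken from SMTL itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a web search tool; a real agent would call a search API.
FAKE_INDEX = {
    "author of paper X": "Alice",
    "venue of paper X": "Conf Y",
    "year of paper X": "2020",
}


def search(query: str) -> str:
    """Return a canned snippet for the query (illustrative stub)."""
    return FAKE_INDEX.get(query, "no result")


def gather_sequential(queries):
    """Issue one query per interaction step, so steps grow with the query count."""
    evidence, steps = {}, 0
    for q in queries:
        evidence[q] = search(q)
        steps += 1
    return evidence, steps


def gather_parallel(queries, max_workers=8):
    """Issue all independent queries together and count them as one step."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input queries.
        results = list(pool.map(search, queries))
    return dict(zip(queries, results)), 1


queries = list(FAKE_INDEX)
seq_evidence, seq_steps = gather_sequential(queries)
par_evidence, par_steps = gather_parallel(queries)
print(seq_steps, par_steps)  # prints "3 1"
```

Both functions collect the same evidence; the difference in this sketch is only how many interaction steps are spent collecting it, which is the quantity the abstract reports as reduced.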

Reasoning & Chain-of-Thought | Recommendation & Information Retrieval | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 60

Year: 2026

Venue: N/A
