Mar 30, 2026 · arXiv:2603.28376

Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

Bin Zhu, Qianghuai Jia, Tian Lan, Junyang Ren, Feng Gu, Feihu Jiang, Longyue Wang, Zhao Xu, Weihua Luo

AI Summary

Marco DeepResearch improves deep research agent performance by adding verification mechanisms at three stages: QA data synthesis, trajectory construction, and test-time scaling. The authors check that synthesized QA pairs have unique, correct answers, inject explicit verification patterns into training trajectories, and use the agent itself as a verifier during inference. In experiments, the 8B-parameter Marco DeepResearch outperforms other 8B-scale agents on challenging benchmarks such as BrowseComp and BrowseComp-ZH and, under a maximum budget of 600 tool calls, surpasses or approaches several 30B-parameter agents.
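To make "ensuring answer uniqueness and correctness" concrete, here is a minimal sketch of one way such a check could work for graph-based QA synthesis. It is not the authors' procedure: the toy knowledge graph (entity mapped to a set of `(relation, object)` facts), the representation of a question as a set of constraints, and the function names `matching_entities` and `verify_qa` are all hypothetical. A synthesized question is kept only if exactly one entity satisfies every constraint and that entity is the intended answer.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: each fact is a (relation, object) pair.
Fact = Tuple[str, str]
Graph = Dict[str, Set[Fact]]


def matching_entities(graph: Graph, constraints: Set[Fact]) -> Set[str]:
    """Return every entity whose facts include all of the question's constraints."""
    return {entity for entity, facts in graph.items() if constraints <= facts}


def verify_qa(graph: Graph, constraints: Set[Fact], intended_answer: str) -> bool:
    """Keep a synthesized question only if its answer is both unique and correct."""
    return matching_entities(graph, constraints) == {intended_answer}


toy_graph: Graph = {
    "Alice": {("born_in", "Paris"), ("field", "chemistry")},
    "Bob": {("born_in", "Paris"), ("field", "physics")},
    "Carol": {("born_in", "Rome"), ("field", "chemistry")},
}

# Ambiguous: both Alice and Bob were born in Paris, so the question is rejected.
print(verify_qa(toy_graph, {("born_in", "Paris")}, "Alice"))  # False
# Adding a second constraint makes the answer unique and correct.
print(verify_qa(toy_graph, {("born_in", "Paris"), ("field", "chemistry")}, "Alice"))  # True
```

In this toy setup, adding constraints is also a crude difficulty knob: each extra constraint narrows the candidate set but requires one more fact to be retrieved and combined.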

Key Contribution

Verification is the key lever: with explicit verification built into data synthesis, trajectory construction, and inference, an 8B-parameter research agent can surpass or approach several 30B-parameter agents, such as Tongyi DeepResearch-30B, with under a third of the parameters.

Abstract

Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bottleneck in existing paradigms stems from the lack of explicit verification mechanisms in QA data synthesis, trajectory construction, and test-time scaling. Errors introduced at each stage propagate downstream and degrade the overall agent performance. To address this, we present Marco DeepResearch, a deep research agent optimized with a verification-centric framework design at three levels: (1) QA Data Synthesis: We introduce verification mechanisms to graph-based and agent-based QA synthesis to control question difficulty while ensuring answers are unique and correct; (2) Trajectory Construction: We design a verification-driven trajectory synthesis method that injects explicit verification patterns into training trajectories; and (3) Test-Time Scaling: We use Marco DeepResearch itself as a verifier at inference time and effectively improve performance on challenging questions. Extensive experimental results demonstrate that our proposed Marco DeepResearch agent significantly outperforms 8B-scale deep research agents on most challenging benchmarks, such as BrowseComp and BrowseComp-ZH. Crucially, under a maximum budget of 600 tool calls, Marco DeepResearch even surpasses or approaches several 30B-scale agents, like Tongyi DeepResearch-30B.
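The abstract says only that Marco DeepResearch serves as its own verifier at inference time; it does not spell out the selection rule. The sketch below shows one common pattern under that description, verifier-guided best-of-N selection, purely as an illustration. The callables `solve` (one agent run) and `verify` (the same model prompted to score an answer), the number of candidates, the score scale, and the majority-vote tie-break are all assumptions, and the stubs in the usage lines stand in for real agent runs.

```python
from collections import Counter
from typing import Callable, Dict


def verify_and_select(
    question: str,
    solve: Callable[[str, int], str],     # hypothetical: one agent run, seeded
    verify: Callable[[str, str], float],  # hypothetical: verifier score for an answer
    num_candidates: int = 4,
) -> str:
    """Verifier-guided best-of-N: sample answers, score each, keep the best."""
    # 1. Sample several independent candidate answers.
    candidates = [solve(question, seed) for seed in range(num_candidates)]
    votes = Counter(candidates)

    # 2. Score each distinct answer once, in order of first appearance.
    scores: Dict[str, float] = {}
    for answer in candidates:
        if answer not in scores:
            scores[answer] = verify(question, answer)

    # 3. Highest verifier score wins; ties fall back to how often the answer appeared.
    return max(scores, key=lambda a: (scores[a], votes[a]))


# Toy stubs standing in for real agent and verifier calls.
sampled = ["1912", "1913", "1912", "1921"]
solve_stub = lambda q, seed: sampled[seed]
print(verify_and_select("toy question", solve_stub, lambda q, a: 0.8 if a == "1921" else 0.1))  # 1921
print(verify_and_select("toy question", solve_stub, lambda q, a: 0.5))  # 1912
```

The first call shows the verifier overriding the majority answer; the second shows that when the verifier cannot distinguish candidates, the rule degrades to plain majority voting.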

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Tool Use & Agents

Citation Metrics

Citations: 0
Influential citations: 0
References: 44
Year: 2026
Venue: N/A
