NJU, SCU, USTC | Dec 29, 2025 | arXiv:2512.23611

Close the Loop: Synthesizing Infinite Tool-Use Data via Multi-Agent Role-Playing

Yuwen Li, Wei Zhang, Ze-Jun Huang, Mason Yang, Jiajun Wu, Shawn Guo, Huahao Hu, Lingyi Sun, Jian Yang, Mingjie Tang, Byran Dai

AI Summary

The paper introduces InfTool, a multi-agent framework comprising a User Simulator, Tool-Calling Assistant, and MCP Server, designed to autonomously generate tool-use trajectories from raw API specifications. InfTool closes the loop by training a model using Group Relative Policy Optimization (GRPO) with gated rewards on the synthesized data, iteratively improving the model's ability to generate higher-quality training data. Experiments on the Berkeley Function-Calling Leaderboard (BFCL) show that InfTool significantly improves a 32B model's accuracy from 19.8% to 70.9%, surpassing larger models and rivaling Claude-Opus, using only synthetic data.

Key Contribution

A 32B model trained entirely on synthetic data from InfTool outperforms models 10x larger on tool use, rivaling even Claude-Opus.

Abstract

Enabling Large Language Models (LLMs) to reliably invoke external tools remains a critical bottleneck for autonomous agents. Existing approaches suffer from three fundamental challenges: expensive human annotation for high-quality trajectories, poor generalization to unseen tools, and quality ceilings inherent in single-model synthesis that perpetuate biases and coverage gaps. We introduce InfTool, a fully autonomous framework that breaks these barriers through self-evolving multi-agent synthesis. Given only raw API specifications, InfTool orchestrates three collaborative agents (User Simulator, Tool-Calling Assistant, and MCP Server) to generate diverse, verified trajectories ranging from single-turn calls to complex multi-step workflows. The framework establishes a closed loop: synthesized data trains the model via Group Relative Policy Optimization (GRPO) with gated rewards, the improved model generates higher-quality data targeting capability gaps, and this cycle iterates without human intervention. Experiments on the Berkeley Function-Calling Leaderboard (BFCL) demonstrate that InfTool raises a base 32B model from 19.8% to 70.9% accuracy (a relative gain of +258%), surpassing models 10x larger and rivaling Claude-Opus, entirely from synthetic data without human annotation.
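The abstract names GRPO with gated rewards but does not spell out the reward definition, so the following is only a minimal sketch under stated assumptions. The gate (a response earns correctness credit only if its tool call is well-formed), the reward values, and the function names `gated_reward` and `group_relative_advantages` are hypothetical and not taken from the paper. The group-relative step follows the usual GRPO idea of normalizing each sampled response's reward against the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def gated_reward(format_ok: bool, call_correct: bool) -> float:
    """Hypothetical gate: correctness counts only if the tool call parses."""
    if not format_ok:
        return 0.0  # gate closed: malformed output earns nothing
    return 1.0 if call_correct else 0.1  # assumed values, for illustration


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt: (format_ok, call_correct)
samples = [(True, True), (True, False), (False, True), (True, True)]
rewards = [gated_reward(f, c) for f, c in samples]
advantages = group_relative_advantages(rewards)

for sample, r, a in zip(samples, rewards, advantages):
    print(sample, r, round(a, 3))
```

In this sketch the third response is correct in content but malformed, so the gate zeroes its reward and it receives the lowest advantage in the group. A real training setup would feed these advantages into the policy-gradient update.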

Data Curation & Synthetic Data | RLHF & Preference Learning | Tool Use & Agents

Citation Metrics

Citations: 1
Influential citations: 0
References: 26
Year: 2025
Venue: arXiv.org
