The paper introduces Magnet, a framework for synthesizing multi-turn tool-use training data for LLMs by automatically translating function signature paths into sequences of queries and executable function calls. It models function interactions as a graph and uses node operations to build signature paths, then employs context distillation with positive hints (reference function call sequences) and negative hints (contrastive, incorrect function calls) to guide trajectory generation. Fine-tuning a 14B model on Magnet-synthesized data achieves state-of-the-art performance on BFCL-v3 and ToolQuery, outperforming Gemini-1.5-pro-002.
Forget hand-annotated data: Magnet distills multi-turn tool-use skills into LLMs by automatically generating training trajectories, yielding models that outperform even Gemini 1.5 Pro.
Large language models (LLMs) have exhibited the ability to effectively utilize external tools to address user queries. However, their performance may be limited in complex, multi-turn interactions involving users and multiple tools. To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans. The framework is based on automatic and iterative translations from a function signature path to a sequence of queries and executable function calls. We model the complicated function interactions in multi-turn cases with a graph and design novel node operations to build reliable signature paths. Motivated by context distillation, we guide the generation of positive and negative trajectories with a teacher model by providing reference function call sequences as positive hints in context and contrastive, incorrect function calls as negative hints. Experiments show that, when trained on the positive trajectories with supervised fine-tuning and preference-optimized against the negative trajectories, our 14B model, Magnet-14B-mDPO, obtains 68.01 on BFCL-v3 and 73.30 on ToolQuery, surpassing the teacher model Gemini-1.5-pro-002 by a large margin in function calling.
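As a rough illustration of the pipeline the abstract describes, the sketch below builds a toy function graph (edges connect functions whose outputs can feed another's inputs), samples a signature path, applies one simple node operation, and derives a positive hint (reference call sequence) and a contrastive negative hint (the same sequence with one corrupted call) that a teacher model could be prompted with. Everything here is an assumption for illustration: the class and function names (`FunctionGraph`, `insert_node`, `build_hints`), the example tool signatures, and the corruption strategy are hypothetical placeholders, not the paper's actual graph construction, node operations, or teacher prompting.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import random

@dataclass
class FunctionSig:
    """A tool's signature: its name, required arguments, and return value."""
    name: str
    args: List[str]
    returns: str

@dataclass
class FunctionGraph:
    """Directed graph over function signatures: an edge u -> v means the
    output of u can serve as an input argument of v."""
    nodes: Dict[str, FunctionSig] = field(default_factory=dict)
    edges: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, sig: FunctionSig) -> None:
        self.nodes[sig.name] = sig
        self.edges.setdefault(sig.name, [])
        # Connect to/from existing nodes whose inputs and outputs are compatible.
        for other in self.nodes.values():
            if other.name == sig.name:
                continue
            if sig.returns in other.args:
                self.edges[sig.name].append(other.name)
            if other.returns in sig.args:
                self.edges.setdefault(other.name, []).append(sig.name)

    def sample_signature_path(self, start: str, length: int, rng: random.Random) -> List[str]:
        """Walk the graph to produce a signature path: an ordered list of
        function names whose calls can be chained across turns."""
        path = [start]
        while len(path) < length and self.edges.get(path[-1]):
            path.append(rng.choice(self.edges[path[-1]]))
        return path


def insert_node(path: List[str], name: str, pos: int) -> List[str]:
    """One illustrative node operation: splice an extra function into the path."""
    return path[:pos] + [name] + path[pos:]


def build_hints(path: List[str], graph: FunctionGraph, rng: random.Random):
    """Turn a signature path into a positive hint (reference call sequence) and a
    contrastive negative hint (the same sequence with one incorrect call)."""
    positive = [f"{n}({', '.join(graph.nodes[n].args)})" for n in path]
    negative = list(positive)
    i = rng.randrange(len(negative))
    negative[i] = f"{path[i]}()"  # drop required arguments to form an incorrect call
    return positive, negative


if __name__ == "__main__":
    rng = random.Random(0)
    g = FunctionGraph()
    g.add(FunctionSig("search_flights", ["origin", "destination", "date"], "flight_id"))
    g.add(FunctionSig("book_flight", ["flight_id", "passenger"], "booking_id"))
    g.add(FunctionSig("email_confirmation", ["booking_id", "address"], "status"))

    path = g.sample_signature_path("search_flights", length=2, rng=rng)
    path = insert_node(path, "email_confirmation", pos=len(path))  # extend the path by one node
    pos_hint, neg_hint = build_hints(path, g, rng)
    print("signature path:", path)
    print("positive hint :", pos_hint)
    print("negative hint :", neg_hint)
```

In a full system, each step along the path would also be translated into a natural-language user query, and the positive/negative hints would condition a teacher model to emit correct and incorrect trajectories for supervised fine-tuning and preference optimization, respectively.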