The paper introduces TacTool, a novel framework designed to enhance tool selection and invocation in AI agents powered by large language models (LLMs). TacTool leverages multiple LLMs to improve the reliability of tool use, addressing a known limitation even in advanced models like GPT-5. Experimental results on the Nestful and BFCL v3 benchmarks demonstrate that TacTool outperforms GPT-4o, achieving improvements of 27% and 3%, respectively.
TacTool significantly boosts LLM-based agent performance on tool-use benchmarks, outperforming even GPT-4o by a substantial margin on complex tasks.
Large language models (LLMs) are becoming the centerpiece in the design and deployment of agentic artificial intelligence (AI) systems. AI agents typically have (a) reasoning ability to analyze and think through the given task, (b) context/memory to remember things in the short term and long term, and (c) tools at their disposal to interact with the outside world. While solving the given task, an agent must decide whether tool use is required; if so, it must then select the appropriate tool and invoke it with the correct parameters. Although LLMs have advanced considerably in recent years, their tool-use capabilities remain limited. Even OpenAI's most capable model to date, GPT-5, continues to struggle with reliable tool usage. In this paper, we propose TacTool, which empowers AI agents with improved tool selection and tool call formulation using different LLMs. We conduct experiments on the Nestful and Berkeley Function Calling Leaderboard version 3 (BFCL v3) benchmarks and show that TacTool achieves $\sim 27\%$ and $\sim 3\%$ improvement over GPT-4o on the Nestful and BFCL v3 datasets, respectively.
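The abstract describes a generic three-step tool-use loop: decide whether a tool is needed, select the appropriate tool, and invoke it with the correct parameters. A minimal illustrative sketch of that loop follows; this is not TacTool's actual method, and the tool registry and the decision heuristic (a toy stand-in for an LLM) are hypothetical:

```python
# Sketch of the decide -> select -> invoke tool-use loop described above.
# The tools, their names, and the routing heuristic are all illustrative
# placeholders, not part of the paper's framework.

from typing import Callable, Dict

# A tiny registry of available tools (hypothetical examples).
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "echo": lambda text: text,
}

def run_agent_step(task: str) -> str:
    """One agent step: (1) decide, (2) select, (3) invoke."""
    # (1) Decide whether tool use is required (toy heuristic in place of an LLM).
    needs_tool = any(ch.isdigit() for ch in task)
    if not needs_tool:
        return f"answered directly: {task}"
    # (2) Select the appropriate tool from the registry.
    tool_name = "calculator"
    # (3) Invoke the tool with the correct parameters.
    return TOOLS[tool_name](task)

print(run_agent_step("2+3"))        # arithmetic task -> routed to a tool
print(run_agent_step("say hello"))  # no tool needed -> answered directly
```

In a real agent each of the three steps would be mediated by an LLM, and (per the abstract) errors at any step degrade reliability, which is the failure mode TacTool targets.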