This paper introduces Capability-Aligned Hierarchical Learning (CAHL), which jointly optimizes the high-level and low-level policies of tool-augmented LLMs to address the planner-executor misalignment that arises when the two policies are optimized separately. CAHL uses reinforcement learning with verifiable rewards (RLVR) to align the high-level planner with the low-level executor. Experiments on the constrained tool-use benchmarks API-Bank and BFCL and on the open-ended environment Bamboogle demonstrate the effectiveness of the approach.
Jointly optimizing the high-level and low-level policies mitigates planner-executor misalignment and improves LLM performance on tool-use tasks.
Tool learning enables LLMs to invoke external tools to accomplish tasks. Prior studies have demonstrated the effectiveness of a hierarchical structure: a high-level policy handles global planning and decomposes tasks into manageable sub-tasks, and a low-level policy focuses on invoking tools to solve these sub-tasks. However, these works typically optimize the high-level and low-level policies separately, leading to planner-executor misalignment and limiting LLM performance on tool-use tasks. In this paper, we propose a method called Capability-Aligned Hierarchical Learning (CAHL), which leverages RLVR to jointly optimize both policies, enabling better alignment between the high-level planner and the low-level executor. Experiments on constrained tool-use benchmarks (API-Bank and BFCL) and an open-ended environment (Bamboogle) demonstrate the effectiveness of CAHL.
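The planner-executor structure described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical and is not the paper's implementation: the tool set, the `SubTask` type, the hard-coded `high_level_plan`, and the choice to give both policies the same outcome reward are assumptions made only to show how a single verifiable reward could couple the two levels. No learning takes place here; in a real system both functions would be LLM policies updated by an RL algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Arg = Union[int, str]  # a literal number, or "$k" meaning "result of step k"

# Hypothetical toy tools (assumption, not from the paper).
TOOLS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


@dataclass
class SubTask:
    tool: str              # name of the tool the executor should invoke
    args: Tuple[Arg, Arg]  # literals or references to earlier results


def high_level_plan(task: Tuple[int, int, int]) -> List[SubTask]:
    """Stand-in for the high-level policy: decompose (a + b) * c into sub-tasks."""
    a, b, c = task
    return [SubTask("add", (a, b)), SubTask("mul", ("$0", c))]


def low_level_execute(subtask: SubTask, results: List[int]) -> int:
    """Stand-in for the low-level policy: resolve references and call the tool."""
    resolved = [results[int(x[1:])] if isinstance(x, str) else x
                for x in subtask.args]
    return TOOLS[subtask.tool](*resolved)


def rollout(task: Tuple[int, int, int]) -> int:
    """Run the full hierarchy: plan once, then execute each sub-task in order."""
    results: List[int] = []
    for subtask in high_level_plan(task):
        results.append(low_level_execute(subtask, results))
    return results[-1]


def verifiable_reward(answer: int, gold: int) -> float:
    """Outcome reward that can be checked automatically (exact match)."""
    return 1.0 if answer == gold else 0.0


def joint_reward(task: Tuple[int, int, int], gold: int) -> Dict[str, float]:
    """Assumed coupling: both policies receive the same outcome reward."""
    r = verifiable_reward(rollout(task), gold)
    return {"planner": r, "executor": r}


if __name__ == "__main__":
    print(joint_reward((2, 3, 4), gold=20))
```

Because the reward depends on the final answer, a plan that the executor cannot carry out earns nothing for either level, which is the kind of shared signal that separate optimization lacks.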