The paper introduces TopoCurate, a framework that improves tool-use agent training by modeling interaction topology from multi-trial rollouts. It projects trajectories into a semantic quotient topology that captures the divergence between effective strategies and failure modes. TopoCurate uses this topology to select trajectories for SFT based on reflective recovery, semantic efficiency, and strategic diversity, and to select tasks for RL based on error branch ratios and strategic heterogeneity. Experiments on BFCLv3 and Tau2 Bench show that TopoCurate achieves consistent performance gains over state-of-the-art baselines for both SFT and RL.
Forget outcome-based filtering: TopoCurate uses interaction topology to surface informative tool-use trajectories and tasks, boosting SFT and RL performance by up to 6.9%.
Training tool-use agents typically relies on outcome-based filtering: Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. However, this paradigm ignores interaction dynamics: successful trajectories may lack error recovery or exhibit redundancy, while pass rates fail to distinguish structurally informative tasks from trivial ones. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. By merging equivalent action-observation states, this projection transforms scattered linear trajectories into a structured manifold that explicitly captures how tool invocations and environmental responses drive the divergence between effective strategies and failure modes. Leveraging this representation, we introduce a dual-selection mechanism: for SFT, we prioritize trajectories demonstrating reflective recovery, semantic efficiency, and strategic diversity to mitigate covariate shift and mode collapse; for RL, we select tasks with high error branch ratios and strategic heterogeneity, maximizing gradient Signal-to-Noise Ratio to address vanishing signals in sparse-reward settings. Evaluations on BFCLv3 and Tau2 Bench show that TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines. We will release the code and data soon for further investigations.
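To make the idea of merging equivalent action-observation states more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not define the equivalence relation or the error branch ratio, so the `normalize` key (lowercased tool name plus observation label), the graph layout, and the ratio formula below are all hypothetical stand-ins chosen for illustration.

```python
from collections import defaultdict

START = "START"  # hypothetical root node shared by all rollouts of a task


def normalize(step):
    """Assumed equivalence key: trimmed, lowercased (tool, observation) pair."""
    tool, observation = step
    return (tool.strip().lower(), observation.strip().lower())


def build_quotient_graph(rollouts):
    """Merge equivalent states across multi-trial rollouts of one task.

    rollouts: list of (steps, success), where steps is a list of
    (tool, observation) pairs and success is a bool.
    Returns (edges, success_nodes): a node -> successor-set mapping and the
    set of nodes visited by at least one successful rollout.
    """
    edges = defaultdict(set)
    success_nodes = set()
    for steps, success in rollouts:
        prev = START
        if success:
            success_nodes.add(prev)
        for step in steps:
            node = normalize(step)
            edges[prev].add(node)  # equivalent states collapse into one node
            if success:
                success_nodes.add(node)
            prev = node
    return edges, success_nodes


def error_branch_ratio(edges, success_nodes):
    """Assumed definition: share of edges whose target node never appears
    in a successful rollout."""
    total = sum(len(successors) for successors in edges.values())
    errors = sum(
        1
        for successors in edges.values()
        for node in successors
        if node not in success_nodes
    )
    return errors / total if total else 0.0


# Three toy rollouts of the same task: one success, two distinct failures.
rollouts = [
    ([("search", "ok"), ("book", "ok")], True),
    ([("Search ", "OK"), ("book", "error")], False),
    ([("search", "ok"), ("cancel", "ok")], False),
]
edges, success_nodes = build_quotient_graph(rollouts)
ratio = error_branch_ratio(edges, success_nodes)
print(ratio)  # 0.5: two of the four merged edges lead to failure-only states
```

In this toy case the three linear rollouts share a single merged `("search", "ok")` node, and the divergence between the successful and failing strategies shows up as branching out of that node. Under the assumed definition, a task with a higher ratio exposes more failure branches relative to its total structure, which is the kind of signal the abstract describes using for RL task selection.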