This paper explores the "Less-Is-More" hypothesis for training LLM agents, finding that training on fewer, higher-quality trajectories can outperform training on larger, noisier datasets. The authors introduce STITCH, a method that filters low-value tokens from training trajectories and retains decision-critical information. Experiments across multiple agent frameworks, model scales, and programming languages show significant performance improvements with STITCH, supporting the "Less-Is-More" paradigm for agentic tasks.
Data quality over data quantity: fewer, higher-quality training trajectories can beat larger, noisier datasets on agentic coding tasks.
Training effective software engineering agents requires large volumes of task-specific trajectories, incurring substantial data construction costs. Inspired by the "Less-Is-More" hypothesis in mathematical reasoning, we investigate its extension to agentic scenarios and propose an end-to-end training framework that achieves superior agentic capabilities with fewer but higher-quality training trajectories. This is achieved via STITCH (Sliding-memory Trajectory Inference and Task Chunking Heuristic), a coarse-to-fine mechanism that filters low-value noise and retains decision-critical tokens to maximize training signal quality. We conduct experiments across multiple agent frameworks (e.g., mini-SWE-agent, MSWE-agent), model scales (30B to 355B), and multilingual settings (Python, Java, and ArkTS). On SWE-bench Verified, models trained with STITCH achieve up to 63.16% relative improvement over base models. On Multi-SWE-bench (Java), MiniMax-M2.5-STITCH achieves 43.75% with our CodeArts Agent scaffold (+16.67%). On HarmonyOS (ArkTS), GLM-4.7-STITCH improves the compilation pass rate to 61.31% (+43.34%) with fewer than 1K training trajectories. Our results confirm that the "Less-Is-More" paradigm generalizes effectively to complex agentic tasks across diverse languages and model scales.
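The abstract reports one gain as a "relative improvement" and others as bare "+" deltas, so it helps to keep the two readings apart. The following is a minimal sketch of the standard definitions, not code from the paper. The function names and the example scores are hypothetical and chosen only for illustration; they are not results reported by the authors.

```python
def absolute_gain(base: float, tuned: float) -> float:
    """Gain in percentage points: tuned score minus base score."""
    return tuned - base


def relative_improvement(base: float, tuned: float) -> float:
    """Gain expressed as a percentage of the base score."""
    if base <= 0:
        raise ValueError("base score must be positive")
    return (tuned - base) / base * 100.0


# Hypothetical scores (in percent), NOT taken from the paper.
base_score = 20.0
tuned_score = 30.0

print(f"absolute gain: {absolute_gain(base_score, tuned_score):.2f} points")
print(f"relative improvement: {relative_improvement(base_score, tuned_score):.2f}%")
```

With these illustrative inputs, the same change reads as 10 percentage points in absolute terms and 50% in relative terms, which is why a reported delta needs its convention stated.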