Artificial Intelligence Laboratory, ECNU, SYSU | Apr 1, 2026 | arXiv:2604.00824

Yet Even Less Is Even Better For Agentic, Reasoning, and Coding LLMs

Yang Ye, Jingyuan Tan, Tianyue Jiang, Ruizhe Ye, Qiankun He, Jiarui Yang, Sicong Liang, Chongjian Yue, Peibai Xu, Lufan Lu, Taotao Qian, Junbao Hu, Yuechan Hao, Ensheng Shi, Yi Hao, Na Fan, Xin Tan, Shuai Yao, Zhiwei Shen, Zongchen Li, Yanlin Wang, Chong Chen

AI Summary

This paper explores the "Less-Is-More" hypothesis for training LLM agents, finding that fewer, higher-quality training trajectories can outperform larger, noisier datasets. The authors introduce STITCH, a method that filters low-value tokens and retains decision-critical information in training trajectories. Experiments across multiple agent frameworks, model scales, and programming languages show performance improvements with STITCH (up to 63.16% relative improvement over base models on SWE-bench Verified), supporting the "Less-Is-More" paradigm for agentic tasks.

Key Contribution

Fewer but higher-quality training trajectories, filtered with STITCH, improve agentic coding performance over base models across model scales from 30B to 355B.

Abstract

Training effective software engineering agents requires large volumes of task-specific trajectories, incurring substantial data construction costs. Inspired by the "Less-Is-More" hypothesis in mathematical reasoning, we investigate its extension to agentic scenarios and propose an end-to-end training framework that achieves superior agentic capabilities with fewer but higher-quality training trajectories. This is achieved via STITCH (Sliding-memory Trajectory Inference and Task Chunking Heuristic), a coarse-to-fine mechanism that filters low-value noise and retains decision-critical tokens to maximize training signal quality. We conduct experiments across multiple agent frameworks (e.g., mini-SWE-agent, MSWE-agent), model scales (30B to 355B), and multilingual settings (Python, Java, and ArkTS). On SWE-bench Verified, models trained with STITCH achieve up to 63.16% relative improvement over base models. On Multi-SWE-bench (Java), MiniMax-M2.5-STITCH achieves 43.75% with our CodeArts Agent scaffold (+16.67%). On HarmonyOS (ArkTS), GLM-4.7-STITCH improves the compilation pass rate to 61.31% (+43.34%) with fewer than 1K training trajectories. Our results confirm that the "Less-Is-More" paradigm generalizes effectively to complex agentic tasks across diverse languages and model scales.
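
The abstract reports one result as a relative improvement and gives other gains in parentheses without saying whether they are relative or absolute. As a reading aid, the sketch below shows the standard arithmetic for both. The function names and the example scores are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
def absolute_gain(base_score: float, new_score: float) -> float:
    """Difference in percentage points between two scores given in percent."""
    return new_score - base_score


def relative_improvement(base_score: float, new_score: float) -> float:
    """Gain expressed as a percentage of the base score."""
    if base_score == 0:
        raise ValueError("relative improvement is undefined for a zero base score")
    return (new_score - base_score) / base_score * 100.0


# Hypothetical scores (NOT from the paper): a base model resolving 20% of
# tasks and a fine-tuned model resolving 30%.
base, tuned = 20.0, 30.0
print(f"absolute gain: {absolute_gain(base, tuned):.2f} points")
print(f"relative improvement: {relative_improvement(base, tuned):.2f}%")
```

The same 10-point gain reads as a 50% relative improvement here, which is why the two conventions should not be compared directly.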

Code Generation & Program Synthesis, Tool Use & Agents, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A