This paper introduces LoopCoder-v2, a family of 7B Parallel Loop Transformer (PLT) coders trained with different loop counts, and studies loop-count selection through a gain-cost analysis. Training from scratch on 18 trillion tokens, the authors find that a two-loop configuration improves performance across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, while variants with three or more loops regress, showing diminishing, oscillatory updates and reduced representational diversity. The relationship between loop count and performance is therefore non-monotonic: refinement gains have to be weighed against the positional mismatch that cross-loop position offsets introduce at each loop boundary.
A two-loop configuration in LoopCoder-v2 raises SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points over a non-looped baseline, while variants with three or more loops regress.
Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel Loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.
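The gain-cost argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' diagnostic. It assumes, purely for illustration, that the refinement gain of each extra loop decays geometrically while the CLP offset cost per loop boundary stays fixed. The function names and the values of `first_gain`, `decay`, and `offset_cost` are hypothetical and are not taken from the paper.

```python
def net_gain(num_loops, first_gain=1.0, decay=0.3, offset_cost=0.5):
    """Toy net benefit of running `num_loops` loops instead of one.

    Assumption (illustrative only): loop k >= 2 adds a refinement gain of
    first_gain * decay ** (k - 2) and pays a fixed offset cost at its
    loop boundary. All parameter values are made up, not measured.
    """
    total = 0.0
    for loop in range(2, num_loops + 1):
        gain = first_gain * decay ** (loop - 2)  # shrinking refinement gain
        total += gain - offset_cost              # fixed cost per boundary
    return total


def best_loop_count(max_loops, **model_params):
    """Return the loop count in 1..max_loops with the highest toy net gain."""
    return max(range(1, max_loops + 1),
               key=lambda n: net_gain(n, **model_params))


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, round(net_gain(n), 3))
    print("best loop count under these assumed values:", best_loop_count(6))
```

Under these assumed values the net benefit rises from one loop to two and then falls, because the third loop's gain is already smaller than the fixed boundary cost. That is the qualitative shape the abstract describes. Where the peak lands depends entirely on the chosen parameters.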