This paper analyzes the "unreasonable effectiveness" of AI scaling laws, arguing that their predictive power stems from abstracting away implementation details and focusing on "logical compute." It posits that the observed transferability of scaling laws across model families and training regimes arises from this abstraction. The paper further suggests that diminishing returns in scaling are counteracted by continuous efficiency improvements in hardware, algorithms, and systems, framing progress as a race to achieve Moore-like efficiency doublings.
Scaling laws work so well because they are stated in terms of implementation-agnostic logical compute rather than the specifics of any implementation, and that same abstraction drives a persistent efficiency race in hardware, algorithms, and systems.
Classical AI scaling laws, especially for pre-training, describe how training loss decreases with compute in a power-law form. Their effectiveness has a basic and very practical sense: they make progress predictable, albeit at a declining rate. Yet their effectiveness is also unreasonable in two further senses. First, these laws are largely empirical and observational, but they appear repeatedly across model families and increasingly across training-adjacent regimes. Second, despite the diminishing returns they predict, progress in practice has often continued through rapidly improving efficiency, visible for example in falling cost per token. This paper argues that both features arise from the same source: scaling laws are unusually effective because they abstract away from many realization details. The compute variable is best understood as logical compute, an implementation-agnostic notion of model-side work, while the practical burden of scaling depends on how efficiently real resources are converted into that compute. This abstraction helps explain both why the laws travel so well across settings and why they give rise to a persistent efficiency game in hardware, algorithms, and systems. Once efficiency is made explicit, the main practical question becomes how many efficiency doublings are required to keep scaling productive despite diminishing returns. Under that view, diminishing returns are not only a geometric flattening of the loss curve, but also rising pressure for cost reduction, system-level innovation, and the breakthroughs needed to sustain Moore-like efficiency doublings.
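To make the question of "how many efficiency doublings" concrete, here is a minimal sketch under assumptions that are not taken from the paper: a pure power law L(C) = a * C^(-alpha) with no irreducible loss term, a fixed real-resource budget so that all extra logical compute must come from efficiency gains, and an illustrative exponent alpha = 0.05. The function names are hypothetical. Under these assumptions, scaling the loss by a factor r < 1 requires multiplying logical compute by r^(-1/alpha), which is log2(1/r) / alpha efficiency doublings.

```python
import math


def efficiency_doublings(loss_ratio: float, alpha: float) -> float:
    """Doublings of logical compute needed to scale loss by `loss_ratio`.

    Assumes a pure power law L(C) = a * C**(-alpha) with no irreducible
    loss floor (an illustrative simplification, not the paper's model).
    At a fixed real-resource budget, each doubling of logical compute
    must come from a doubling of efficiency.
    """
    if not 0.0 < loss_ratio < 1.0:
        raise ValueError("loss_ratio must lie strictly between 0 and 1")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    # L(m * C) / L(C) = m**(-alpha) = loss_ratio  =>  log2(m) = -log2(loss_ratio) / alpha
    return -math.log2(loss_ratio) / alpha


def compute_multiplier(loss_ratio: float, alpha: float) -> float:
    """Factor by which logical compute must grow for the same loss change."""
    return 2.0 ** efficiency_doublings(loss_ratio, alpha)


if __name__ == "__main__":
    alpha = 0.05  # hypothetical exponent, chosen only for illustration
    for ratio in (0.9, 0.75, 0.5):
        d = efficiency_doublings(ratio, alpha)
        print(f"loss x{ratio}: {d:.1f} doublings of logical compute")
```

Because the number of doublings scales as 1/alpha, a shallow exponent turns even modest loss reductions into many doublings. This is one way to read the claim that diminishing returns translate into rising pressure for efficiency gains.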