16 papers published across 2 labs.
Forget buying new GPUs – clever context-length routing can boost your LLM inference energy efficiency by 2.5x, dwarfing the 1.7x gain from upgrading to a B200.
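A minimal sketch of the routing idea, assuming two hypothetical serving pools with made-up context limits and per-token energy costs (the paper's actual policy and hardware numbers are not reproduced here): each request goes to the cheapest pool that can fit its context length.

```python
# Sketch of context-length routing. Pool names, limits, and energy costs
# are illustrative assumptions, not measured data from the paper.
from dataclasses import dataclass

@dataclass
class Pool:
    name: str                 # hypothetical serving pool
    max_context: int          # longest prompt this pool serves efficiently
    joules_per_token: float   # illustrative energy cost per generated token

POOLS = [
    Pool("short-context-pool", max_context=2_048, joules_per_token=0.4),
    Pool("long-context-pool", max_context=32_768, joules_per_token=1.1),
]

def route(prompt_tokens: int) -> Pool:
    """Send the request to the cheapest pool whose context window fits it."""
    eligible = [p for p in POOLS if prompt_tokens <= p.max_context]
    return min(eligible, key=lambda p: p.joules_per_token)

if __name__ == "__main__":
    for n in (512, 8_000):
        print(n, "->", route(n).name)
```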
Optimizing multilingual training? Shapley values reveal the hidden cross-lingual transfer effects that current scaling laws miss, leading to better language mixture ratios.
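As an illustration of the Shapley-value idea only (not the paper's setup), the toy sketch below computes exact Shapley attributions over a hypothetical three-language mixture; the utility function is a made-up stand-in for trained-model performance, with a synthetic transfer bonus between two languages.

```python
# Toy exact Shapley attribution over language subsets. The utility values
# and the en/de synergy term are invented for illustration; a real study
# would train models on each mixture and measure downstream performance.
from itertools import combinations
from math import factorial

LANGS = ["en", "de", "zh"]

def utility(subset: frozenset) -> float:
    """Hypothetical score of a model trained on the given language subset."""
    base = {"en": 0.50, "de": 0.20, "zh": 0.25}
    score = sum(base[lang] for lang in subset)
    if {"en", "de"} <= subset:
        score += 0.10  # illustrative cross-lingual transfer bonus
    return score

def shapley(lang: str) -> float:
    """Exact Shapley value of one language under the toy utility."""
    others = [l for l in LANGS if l != lang]
    n = len(LANGS)
    total = 0.0
    for k in range(len(others) + 1):
        for combo in combinations(others, k):
            s = frozenset(combo)
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            total += weight * (utility(s | {lang}) - utility(s))
    return total

for lang in LANGS:
    print(lang, round(shapley(lang), 3))
```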
Forget quadratic attention: FEAT achieves state-of-the-art performance on structured data with linear complexity and 40x faster inference.
Masked diffusion language models can now achieve 21.8x better compute efficiency than autoregressive models, thanks to binary encoding and index shuffling.
Mamba-3 delivers a 1.8 point accuracy boost over competing models in downstream language tasks, proving that SSM-inspired techniques can unlock substantial performance gains without sacrificing inference efficiency.
LLMs' true power lies in the "unexplainable" – capabilities that exceed rule-based systems, challenging the pursuit of full interpretability.
Forget trial-and-error: this paper derives hyperparameter scaling laws for modern optimizers directly from convergence bounds, potentially automating hyperparameter tuning.
Forget scaling laws: smaller, domain-adapted AI systems can mathematically outperform massive generalist models in real-world institutional settings, thanks to a non-monotonic relationship between model size and "institutional fitness."
Forget simple scaling laws: the compute-optimal number of parallel rollouts in LLM RL plateaus, revealing distinct mechanisms for easy vs. hard problems.
Re-training LLMs on their own generated content can fundamentally limit what they can learn, but only under specific, theoretically defined conditions related to generation quality.
Forget brute-force scaling: the secret to better educational AI agents lies in carefully structuring their roles, skills, and tools.
Nanofilaments can paradoxically aggregate due to entropic forces, defying the conventional wisdom that entropy always favors disaggregation at the nanoscale.
Language models seem to prefer truth not because they're seeking it, but because correct information is often easier to compress and more internally consistent.
RAG with small language models (<8B parameters) can be a net negative, as they often ignore retrieved context and even "forget" existing knowledge.
Prompt-based jailbreak attacks aren't just effective; they're shockingly efficient, outperforming optimization-based methods by navigating the prompt space more effectively.
AI electricity demand won't necessarily explode as AI scales: whether it does hinges on sustained efficiency improvements outpacing income-driven demand growth.