12 papers published across 2 labs.
Prompt-based jailbreak attacks aren't just effective; they're strikingly efficient, outperforming optimization-based methods because they navigate the prompt space far better.
AI electricity demand won't necessarily explode as AI scales: the outcome hinges on whether sustained efficiency improvements can outpace income-driven demand.
Row-normalized optimizers can match Muon's performance on large language models while being faster in large-token and low-loss regimes, offering a practical alternative for pre-training.
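A minimal sketch of what a row-normalized update could look like, assuming the simplest variant: each row of a weight matrix's gradient is rescaled to unit L2 norm before the step. The update rule, `lr`, and `eps` below are illustrative assumptions, not the paper's exact recipe.

```python
import torch

def row_normalized_step(param: torch.Tensor, lr: float = 0.02, eps: float = 1e-8) -> None:
    """Illustrative update: normalize each row of the gradient to unit L2 norm.

    Unlike Muon, which orthogonalizes the whole gradient matrix (e.g. via
    Newton-Schulz iterations), this only rescales rows -- one norm and one
    divide per row instead of repeated matrix multiplies.
    """
    grad = param.grad  # shape (out_features, in_features); assumes .backward() was called
    row_norms = grad.norm(dim=1, keepdim=True)  # L2 norm of each row
    param.data.add_(grad / (row_norms + eps), alpha=-lr)
```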
Forget parameter counts: the true memorization capacity of deep ReLU networks is governed by the product of squared width and squared depth, $W^2L^2$, and the number of samples they can fit scales linearly with that quantity.
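Stated as a bound (with my own hedging on constants and log factors, which the one-line summary doesn't give):

```latex
% Reading the claim as a capacity bound: a ReLU network of width W and
% depth L can memorize N points in general position when, up to constants
% and log factors (my hedging, not the paper's statement),
\[
  N = \Theta\!\left(W^2 L^2\right),
\]
% i.e. capacity grows linearly in W^2 L^2 rather than in the raw
% parameter count, which is only \Theta(W^2 L) for such a network.
```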
Language models often disregard provided context, choosing instead to rely on potentially outdated or conflicting information learned during pre-training, revealing a critical flaw in their knowledge integration.
Chasing marginal MSE/MAE improvements on leaderboards may be blinding researchers to the real goal of time series forecasting: capturing temporal structure and supporting downstream decisions.
Forget elegant compression and unifying principles: AGI might just be a vast, brittle archipelago of specialized modules, mirroring how human experts actually operate.
FineRMoE achieves 6x higher parameter efficiency, 281x lower prefill latency, and 136x higher decoding throughput compared to strong baselines, demonstrating a significant leap in MoE performance.
Protein language models finally scale predictably: Reverse Distillation unlocks consistent gains by distilling large models into nested, Matryoshka-style embeddings guided by smaller, capacity-constrained models.
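A hedged sketch of what nested, Matryoshka-style distillation might look like: the student is trained so that every prefix of its embedding matches the corresponding teacher prefix. The prefix lengths, cosine objective, and function name are my assumptions, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def matryoshka_distill_loss(student_emb: torch.Tensor,
                            teacher_emb: torch.Tensor,
                            dims=(64, 128, 256, 512)) -> torch.Tensor:
    """Average distillation loss over nested embedding prefixes.

    student_emb, teacher_emb: (batch, max(dims)). Each student prefix is
    pushed toward the matching teacher prefix, so truncated embeddings
    remain usable on their own (the Matryoshka property).
    """
    loss = 0.0
    for d in dims:
        s = F.normalize(student_emb[:, :d], dim=-1)
        t = F.normalize(teacher_emb[:, :d], dim=-1)
        loss = loss + (1.0 - (s * t).sum(dim=-1)).mean()  # cosine distance
    return loss / len(dims)
```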
Multi-task learning's generalization boost comes from implicit regularization, effectively postponing the dreaded double descent.
You can accurately predict the NDCG of a 1B-parameter reranking model by training only models up to 400M parameters, unlocking massive compute savings.
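One way such an extrapolation could work, as a sketch: fit a saturating power law $\mathrm{NDCG}(P) = a - b\,P^{-c}$ to small-model results and evaluate it at 1B parameters. The functional form and all numbers below are illustrative, not the paper's measurements.

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative (made-up) measurements: parameter count -> NDCG@10.
params = np.array([25e6, 50e6, 100e6, 200e6, 400e6])
ndcg   = np.array([0.312, 0.349, 0.376, 0.396, 0.410])

def saturating_power_law(p, a, b, c):
    # NDCG(P) = a - b * P^{-c}: approaches the ceiling `a` as P grows.
    return a - b * p**(-c)

(a, b, c), _ = curve_fit(saturating_power_law, params, ndcg,
                         p0=(0.5, 100.0, 0.4), maxfev=20000)
print(f"Predicted NDCG@10 at 1B params: {saturating_power_law(1e9, a, b, c):.3f}")
```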
By strategically warming up residual connections layer-by-layer, ProRes unlocks faster and more stable pretraining for language models.
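A minimal sketch of one plausible reading of layer-wise residual warmup: each layer's residual branch is gated by a coefficient $\alpha_\ell(t)$ ramped from 0 to 1 on a staggered schedule. The gating form, schedule, and names (`ResidualWarmupBlock`, `warmup_steps`, `stagger`) are illustrative assumptions, not ProRes's actual method.

```python
import torch
import torch.nn as nn

class ResidualWarmupBlock(nn.Module):
    """Residual block whose f(x) contribution is gated by a per-layer
    warmup coefficient alpha in [0, 1] (illustrative reading)."""

    def __init__(self, f: nn.Module, layer_idx: int,
                 warmup_steps: int = 1000, stagger: int = 200):
        super().__init__()
        self.f = f
        # Deeper layers start warming up later (staggered schedule).
        self.start = layer_idx * stagger
        self.warmup_steps = warmup_steps

    def alpha(self, step: int) -> float:
        # Linear ramp from 0 to 1 once this layer's warmup window begins.
        return min(max((step - self.start) / self.warmup_steps, 0.0), 1.0)

    def forward(self, x: torch.Tensor, step: int) -> torch.Tensor:
        return x + self.alpha(step) * self.f(x)
```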