Yelong Shen

Microsoft

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Architecture Design (Transformers, SSMs, MoE) (3) · Training Efficiency & Optimization (3) · Scaling Laws & Emergent Abilities (2) · Recommendation & Information Retrieval (1)

Frequent co-authors

LiLiang Ren (3) · Weizhu Chen (2) · Zeyi Huang (1) · Xuehai He (1)

Papers (3)

May 26, 2026

Microsoft Research · 2w ago · also UW, CUHK, HKUST, UW-Madison

Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior

Recurrent memory can be added to transformers at scale with minimal parameter overhead and no performance penalty by reusing existing hidden states and training with interleaved parallel updates.

Zeyi Huang, Xuehai He, LiLiang Ren +7

Architecture Design (Transformers, SSMs, MoE) · Scaling Laws & Emergent Abilities · Training Efficiency & Optimization

Apr 15, 2026

Zichong Li +5 · Apr 15, 2026 · also Microsoft Research

Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation

LLMs can be made far more robust to the position of information in long contexts by simply shuffling the context during fine-tuning.

Zichong Li, Chen Liang, LiLiang Ren +3

Architecture Design (Transformers, SSMs, MoE) · Recommendation & Information Retrieval · Training Efficiency & Optimization

Mar 30, 2026

LiLiang Ren +2 · Mar 30, 2026 · also Microsoft Research

Rethinking Language Model Scaling under Transferable Hypersphere Optimization

Forget painstaking hyperparameter tuning: this hypersphere parameterization lets you transfer a single learning rate across model sizes, depths, and even MoE architectures, slashing compute costs by 1.58x.

LiLiang Ren, Yelong Shen, Weizhu Chen