Weiran He

Moonshot AI

Research focus

Architecture Design (Transformers, SSMs, MoE) (2) · Distributed Systems & Hardware (1) · Inference & Quantization (1) · Training Efficiency & Optimization (1)

Frequent co-authors

Xinran Xu (2) · Ruoyu Qin (1) · Yaoyu Wang (1) · Zheming Li (1)

Papers (2)

Apr 16, 2026 · also Tsinghua AI

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

A system that combines KV-efficient models with intelligent request scheduling makes it practical to decouple LLM prefill and decode across datacenters, so that each stage can scale independently and elastically.

Ruoyu Qin, Weiran He, Yaoyu Wang, and 5 others

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

Mar 16, 2026 · also Cohere, Moonshot, UCSD, Xidian

Attention Residuals

Instead of fixed residual connections, Attention Residuals let each layer selectively attend to the outputs of earlier layers, improving performance and gradient flow in deep LLMs.

Kimi Team, Jianlin Su, Weixin Xu, and 28 others

Architecture Design (Transformers, SSMs, MoE) · Training Efficiency & Optimization