Train massive MoEs on Hopper GPUs faster and with less memory, even without native FP4 support, by cleverly quantizing activations and communication.
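A minimal sketch of the general idea, not the paper's implementation: simulate FP4 (E2M1) quantization of an activation tensor with per-block scaling, the kind of software trick that cuts memory and communication volume on GPUs without native FP4 support. The block size, the E2M1 value grid, and all function names are assumptions for illustration.

```python
import numpy as np

# Representable non-negative magnitudes of a 4-bit E2M1 float.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4(x: np.ndarray, block: int = 32):
    """Quantize a 1-D activation tensor to simulated FP4 with per-block scales."""
    x = x.reshape(-1, block)
    # One scale per block so each block maps into the FP4 range [-6, 6].
    scale = np.abs(x).max(axis=1, keepdims=True) / E2M1_GRID[-1] + 1e-12
    scaled = x / scale
    # Round each magnitude to the nearest representable FP4 value.
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_GRID).argmin(-1)
    q = np.sign(scaled) * E2M1_GRID[idx]
    return q, scale  # in practice: store 4-bit codes plus one fp16 scale per block

def dequantize_fp4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).reshape(-1)

acts = np.random.randn(1024).astype(np.float32)
q, s = quantize_fp4(acts)
print("max abs error:", np.abs(dequantize_fp4(q, s) - acts).max())
```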
By strategically amplifying updates along flat directions in the loss landscape, LITE unlocks faster LLM pre-training with existing matrix-based optimizers like Muon and SOAP.
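A hedged toy illustration of the underlying idea rather than the LITE algorithm itself: use a cheap per-coordinate curvature proxy (here an EMA of squared gradients) and give a larger multiplier to update components along flat, low-curvature directions. The boosting rule and all names below are assumptions for illustration only.

```python
import numpy as np

def amplify_flat_directions(update, curvature_ema, boost=2.0, eps=1e-8):
    """Scale an optimizer update so low-curvature (flat) coordinates move further.

    update:        base step from a matrix-based optimizer (e.g. Muon or SOAP)
    curvature_ema: running per-coordinate curvature proxy (e.g. EMA of grad**2)
    boost:         maximum extra amplification applied to the flattest directions
    """
    # Normalize curvature to [0, 1]; flat directions sit near 0.
    c = curvature_ema / (curvature_ema.max() + eps)
    # Flat coordinates (c ~ 0) get multiplier ~ boost; sharp ones (c ~ 1) get ~ 1.
    multiplier = 1.0 + (boost - 1.0) * (1.0 - c)
    return update * multiplier

rng = np.random.default_rng(0)
g = rng.normal(size=(4, 4))
curv = rng.uniform(size=(4, 4))  # stand-in for an EMA of squared gradients
print(amplify_flat_directions(g, curv))
```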