Pre-normalization in Transformers is the root cause of the puzzling link between massive activation outliers and attention sinks; decoupling the two reveals that they serve distinct functions: global parameterization versus local attention modulation.
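For context, "pre-normalization" (Pre-LN) places the LayerNorm before each sublayer, so the residual stream itself is never renormalized; this is the structural detail the claim turns on, since values a sublayer writes into the stream can persist and grow across layers unchecked. Below is a minimal PyTorch sketch of a Pre-LN block; the module names and sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Illustrative Pre-LN Transformer block (not the paper's code).

    Each sublayer normalizes its *input*, but the residual stream is
    added back without renormalization, which is why large activations
    can accumulate in it. (Post-LN, by contrast, applies LayerNorm to
    the residual sum itself after each sublayer.)
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize only the sublayer input; the residual path bypasses the norm.
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.ln2(x))  # residual stream is never renormalized
        return x
```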