Transformers waste up to 56% of their MLP compute on layers that behave near-linearly, and selectively replacing those layers with purely linear ones can actually *improve* performance.
Certain transformer attention heads act like surprisingly robust Bloom filters: they remember which tokens appeared earlier in the context with high accuracy, and the behavior generalizes beyond just repeated names.