Hongwu Peng

Research focus

Architecture Design (Transformers, SSMs, MoE) (1) · Distributed Systems & Hardware (1) · Inference & Quantization (1)

Frequent co-authors

Songtao Liu (1), Zhiwei Zhang (1), Zhengyu Chen (1), Yue Guo (1)

Papers (1)

Mar 2, 2026 · also DeepSeek, Perplexity

Multi-Head Low-Rank Attention

MLRA achieves 2.8× faster LLM decoding by enabling efficient tensor parallelism for latent attention, avoiding the memory-traffic bottlenecks that limit existing methods.

Songtao Liu, Hongwu Peng, Zhiwei Zhang +2

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization
