LLM inference can be sped up by 21% without retraining, thanks to a new sparsity method that prunes activations based on the importance of the weights they interact with.
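The core idea can be sketched as follows: score each activation by combining its own magnitude with the size of the weight rows it multiplies, then zero out the low-scoring ones before the matmul. This is an illustrative sketch of the general technique, not the paper's exact criterion; the `|x| * ||W_row||` score and the `keep_ratio` parameter are assumptions for demonstration.

```python
import numpy as np

def prune_activations(x, W, keep_ratio=0.5):
    """Zero out the least important activations before computing x @ W.

    Importance of x[i] is taken as |x[i]| * ||W[i, :]||_2: an activation
    matters more when the weight rows it feeds into are large.
    (Hypothetical scoring rule, chosen to illustrate weight-aware pruning.)
    """
    importance = np.abs(x) * np.linalg.norm(W, axis=1)
    k = max(1, int(keep_ratio * x.size))
    keep = np.argsort(importance)[-k:]  # indices of the top-k activations
    mask = np.zeros_like(x)
    mask[keep] = 1.0
    return x * mask

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W = rng.standard_normal((8, 4))
x_sparse = prune_activations(x, W, keep_ratio=0.5)
y = x_sparse @ W  # only half the activations contribute to the matmul
```

In a real deployment the zeroed activations let the kernel skip the corresponding weight rows entirely, which is where the inference-time speedup comes from.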