Suvinay Subramanian

Google

Google Research

Papers on Lattice

Total citations

Topics

Publication activity (papers/week, last 8 weeks)

Research focus

Architecture Design (Transformers, SSMs, MoE) (1)
Distributed Systems & Hardware (1)
Inference & Quantization (1)

Frequent co-authors

Hanjiang Wu (1)
Abhimanyu Rajeshkumar Bambhaniya (1)
Sarbartha Banerjee (1)
Tuhin Khare (1)

Papers (1)

May 27, 2026 · also DeepMind, Google Research, AMD Research and Advanced Development, Intel Labs

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Splitting attention and feedforward networks onto separate GPUs can unlock 4x higher MoE LLM throughput, but only if you carefully tune the GPU partitioning strategy based on the workload.

Hanjiang Wu, Abhimanyu Rajeshkumar Bambhaniya, Sarbartha Banerjee +9

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization
