Souvik Kundu

Intel

Papers on Lattice

Publication activity (papers/week, last 8 weeks)

Research focus

Architecture Design (Transformers, SSMs, MoE) (1) · Distributed Systems & Hardware (1) · Inference & Quantization (1) · Eval Frameworks & Benchmarks (1)

Frequent co-authors

Hanjiang Wu (1) · Abhimanyu Rajeshkumar Bambhaniya (1) · Sarbartha Banerjee (1) · Tuhin Khare (1)

Papers (2)

May 27, 2026 · also DeepMind, Google Research, AMD Research and Advanced Development, Intel Labs

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving

Splitting attention and feedforward networks onto separate GPUs can unlock 4x higher MoE LLM throughput, but only if you carefully tune the GPU partitioning strategy based on the workload.

Hanjiang Wu, Abhimanyu Rajeshkumar Bambhaniya, Sarbartha Banerjee +9

Architecture Design (Transformers, SSMs, MoE) · Distributed Systems & Hardware · Inference & Quantization

May 25, 2026 · also Intel Labs

Agentic AI Workload Characteristics

Agentic workloads aren't just long prompts; they're decode-bound beasts with a tool-use personality arc, demanding a rethink of LLM serving infrastructure.

Yichao Yuan, Ankita Nayak, Souvik Kundu +1

Eval Frameworks & Benchmarks · Reasoning & Chain-of-Thought · Tool Use & Agents
