Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Hybrid Mamba-Transformer models can achieve 4x faster time to first token and 1.4x higher throughput by disaggregating the prefill and decode phases onto specialized accelerator packages.
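A minimal scheduling sketch of the idea behind prefill/decode disaggregation, not the paper's actual system: the compute-bound prefill phase and the memory-bound decode phase of each request are routed to separate worker pools, with the KV cache handed off between them. All class and field names (`Request`, `PrefillWorker`, `DecodeWorker`, `kv_cache`) are illustrative assumptions.

```python
# Illustrative sketch of prefill/decode disaggregation (not the paper's system).
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)   # produced during prefill
    tokens: list = field(default_factory=list)     # produced during decode

class PrefillWorker:
    """Runs on the compute-optimized package."""
    def run(self, req: Request) -> Request:
        # Stand-in for one batched forward pass over the full prompt.
        req.kv_cache = [hash(tok) for tok in req.prompt.split()]
        return req

class DecodeWorker:
    """Runs on the memory-bandwidth-optimized package."""
    def step(self, req: Request) -> Request:
        # Stand-in for one incremental token step that reuses the KV cache.
        req.tokens.append(len(req.kv_cache) + len(req.tokens))
        return req

# Disaggregated pipeline: the prefill pool fills KV caches, the decode pool consumes them.
prefill_pool, decode_pool = [PrefillWorker()], [DecodeWorker()]
req = prefill_pool[0].run(Request(prompt="an example prompt"))
for _ in range(4):                                  # generate four tokens
    req = decode_pool[0].step(req)
print(req.tokens)
```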
LLMs can run up to 35% faster on chiplet architectures thanks to a new lossless exponent compression technique that slashes inter-chiplet communication overhead.
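A rough sketch of the general idea of lossless exponent compression, not the paper's scheme: the exponent field of each FP16 value is split off and compressed with a generic lossless coder (zlib here, purely as a stand-in) before an inter-chiplet transfer, while sign and mantissa bits are sent raw; reconstruction is bit-exact. The function names `pack_fp16` and `unpack_fp16` are assumptions for illustration.

```python
# Illustrative sketch of lossless exponent compression for an fp16 tensor.
import zlib
import numpy as np

def pack_fp16(t: np.ndarray) -> tuple[bytes, bytes]:
    """Split an fp16 tensor into (compressed exponent bytes, raw sign+mantissa bits)."""
    bits = t.view(np.uint16)
    exponent = ((bits >> 10) & 0x1F).astype(np.uint8)   # 5-bit exponent field
    rest = bits & 0x83FF                                 # sign bit + 10-bit mantissa
    return zlib.compress(exponent.tobytes()), rest.tobytes()

def unpack_fp16(exp_blob: bytes, rest_blob: bytes) -> np.ndarray:
    """Bit-exact reconstruction of the original fp16 tensor."""
    exponent = np.frombuffer(zlib.decompress(exp_blob), dtype=np.uint8).astype(np.uint16)
    rest = np.frombuffer(rest_blob, dtype=np.uint16)
    return ((exponent << 10) | rest).view(np.float16)

if __name__ == "__main__":
    x = np.random.randn(1 << 16).astype(np.float16)      # stand-in activation tile
    exp_blob, rest_blob = pack_fp16(x)
    assert np.array_equal(unpack_fp16(exp_blob, rest_blob), x)
    print(f"exponent bytes: {x.size} -> {len(exp_blob)} after compression")
```

Exponents of nearby activations and weights cluster in a narrow range, which is why a lossless coder can shrink that field substantially even though no information is discarded.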