Department of Electrical and Computer Engineering, University of Wisconsin-Madison
Hybrid Mamba-Transformer models can achieve 4x faster time to first token and 1.4x higher throughput by disaggregating the prefill and decode phases onto specialized accelerator packages.
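A minimal scheduling sketch of the idea behind prefill/decode disaggregation, not the paper's actual system: the compute-bound prefill phase and the memory-bound decode phase of each request are routed to separate worker pools, with the KV cache handed off between them. All class and field names (`Request`, `PrefillWorker`, `DecodeWorker`, `kv_cache`) are illustrative assumptions.

```python
# Illustrative sketch of prefill/decode disaggregation (not the paper's system).
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)   # produced during prefill
    tokens: list = field(default_factory=list)     # produced during decode

class PrefillWorker:
    """Runs on the compute-optimized package."""
    def run(self, req: Request) -> Request:
        # Stand-in for one batched forward pass over the full prompt.
        req.kv_cache = [hash(tok) for tok in req.prompt.split()]
        return req

class DecodeWorker:
    """Runs on the memory-bandwidth-optimized package."""
    def step(self, req: Request) -> Request:
        # Stand-in for one incremental token step that reuses the KV cache.
        req.tokens.append(len(req.kv_cache) + len(req.tokens))
        return req

# Disaggregated pipeline: the prefill pool fills KV caches, the decode pool consumes them.
prefill_pool, decode_pool = [PrefillWorker()], [DecodeWorker()]
req = prefill_pool[0].run(Request(prompt="an example prompt"))
for _ in range(4):                                  # generate four tokens
    req = decode_pool[0].step(req)
print(req.tokens)
```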
LLMs can run up to 35% faster on chiplet architectures thanks to a new lossless exponent compression technique that slashes inter-chiplet communication overhead.
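A rough sketch of the general idea of lossless exponent compression, not the paper's scheme: the exponent field of each FP16 value is split off and compressed with a generic lossless coder (zlib here, purely as a stand-in) before an inter-chiplet transfer, while sign and mantissa bits are sent raw; reconstruction is bit-exact. The function names `pack_fp16` and `unpack_fp16` are assumptions for illustration.

```python
# Illustrative sketch of lossless exponent compression for an fp16 tensor.
import zlib
import numpy as np

def pack_fp16(t: np.ndarray) -> tuple[bytes, bytes]:
    """Split an fp16 tensor into (compressed exponent bytes, raw sign+mantissa bits)."""
    bits = t.view(np.uint16)
    exponent = ((bits >> 10) & 0x1F).astype(np.uint8)   # 5-bit exponent field
    rest = bits & 0x83FF                                 # sign bit + 10-bit mantissa
    return zlib.compress(exponent.tobytes()), rest.tobytes()

def unpack_fp16(exp_blob: bytes, rest_blob: bytes) -> np.ndarray:
    """Bit-exact reconstruction of the original fp16 tensor."""
    exponent = np.frombuffer(zlib.decompress(exp_blob), dtype=np.uint8).astype(np.uint16)
    rest = np.frombuffer(rest_blob, dtype=np.uint16)
    return ((exponent << 10) | rest).view(np.float16)

if __name__ == "__main__":
    x = np.random.randn(1 << 16).astype(np.float16)      # stand-in activation tile
    exp_blob, rest_blob = pack_fp16(x)
    assert np.array_equal(unpack_fp16(exp_blob, rest_blob), x)
    print(f"exponent bytes: {x.size} -> {len(exp_blob)} after compression")
```

Exponents of nearby activations and weights cluster in a narrow range, which is why a lossless coder can shrink that field substantially even though no information is discarded.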