Hybrid Mamba-Transformer models can achieve up to 4x faster time-to-first-token and 1.4x higher throughput by disaggregating the prefill and decode phases onto specialized accelerator packages.
LLMs can run up to 35% faster on chiplet architectures thanks to a new lossless exponent-compression technique that slashes inter-chiplet communication overhead.
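The intuition behind lossless exponent compression is that floating-point exponent bits are highly redundant: model weights and activations cluster in a narrow dynamic range, so only a handful of the possible exponent values ever occur, and an entropy coder can shrink them without losing information. A minimal Python sketch (the value distribution and the exponent-histogram approach are illustrative assumptions, not taken from the referenced work):

```python
import struct
from collections import Counter

def fp16_exponent(x: float) -> int:
    """Pack x as IEEE 754 half precision and extract its 5 exponent bits."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return (bits >> 10) & 0x1F  # 5-bit biased exponent field

# Illustrative data: values clustered near zero, as LLM tensors tend to be.
values = [0.01 * (i - 500) for i in range(1000)]

# Count how many of the 32 possible exponent codes actually appear.
hist = Counter(fp16_exponent(v) for v in values)
print(f"{len(hist)} distinct exponents out of 32 possible")
```

Because the histogram is so skewed, a lossless coder (e.g. Huffman over the exponent field) can cut the bits sent per value across the inter-chiplet links, which is where the reported speedup would come from.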