Hybrid Mamba-Transformer LLMs achieve up to 4x faster time-to-first-token and 1.4x higher throughput with a new disaggregated accelerator architecture that serves the prefill and decode phases on separate, specialized hardware.
By exploiting the low entropy of BF16 exponents with Huffman coding, LEXI slashes inter-chiplet communication latency in LLMs by up to 45% without sacrificing accuracy.
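The core observation behind LEXI is that BF16 exponent bits are far from uniformly distributed, so an entropy code can shrink them well below their fixed 8 bits. The following sketch is illustrative only (it is not LEXI's implementation and its function names are invented): it extracts the exponent field from BF16-truncated Gaussian "weights," builds a Huffman code over the observed exponents, and reports the average code length versus the fixed 8 bits.

```python
# Illustrative sketch, NOT LEXI's implementation: Huffman-coding the
# 8-bit exponent field of BF16 values drawn from a Gaussian, to show
# why the exponent's low entropy makes it compressible.
import heapq
from collections import Counter
import numpy as np

def bf16_exponents(x: np.ndarray) -> np.ndarray:
    """Extract the 8-bit exponent field from float32 values truncated to BF16."""
    bits = x.astype(np.float32).view(np.uint32)
    bf16 = (bits >> 16).astype(np.uint16)         # BF16 = top 16 bits of float32
    return ((bf16 >> 7) & 0xFF).astype(np.uint8)  # layout: sign(1)|exponent(8)|mantissa(7)

def huffman_lengths(freqs: Counter) -> dict:
    """Return the Huffman code length (in bits) for each symbol in `freqs`."""
    heap = [(f, i, {s: 0}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single symbol still needs 1 bit
        return {s: 1 for s in freqs}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)
        f2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

rng = np.random.default_rng(0)
weights = rng.normal(0, 0.02, size=1 << 16).astype(np.float32)  # toy "model weights"
exps = bf16_exponents(weights)
freqs = Counter(exps.tolist())
lengths = huffman_lengths(freqs)
avg_bits = sum(freqs[s] * lengths[s] for s in freqs) / len(exps)
print(f"distinct exponents: {len(freqs)}, avg code length: {avg_bits:.2f} bits vs 8 fixed")
```

On data like this the exponents cluster around a narrow range of magnitudes, so the average Huffman code length comes out well under 8 bits; shorter exponent codes mean fewer bits crossing the chiplet interconnect, which is the latency lever the summary describes.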