Search papers, labs, and topics across Lattice.
1
0
3
2
Hybrid Mamba-Transformer models can get 4x faster time to first token and 1.4x higher throughput by disaggregating prefill and decode phases onto specialized accelerator packages.