Despite their training efficiency, MoE models can be up to 4.5x slower at inference than quality-matched dense models due to memory fragmentation, especially in long-context scenarios.