Audio-specific KV cache eviction compresses large audio-language models (LALMs) by 40% with almost no accuracy loss, while generic eviction methods degrade sharply.
Unified multimodal models contain largely separate inference pathways for generation and understanding, and FlashU exploits this separation for a 2x speedup without retraining.