Exploit the surprisingly stable yet heterogeneous sparsity patterns across attention heads to cut LLM attention latency by 2.88x without sacrificing output quality.
Run 2x more LLM fine-tuning jobs on the same hardware with MuxTune's spatial-temporal multiplexing, making your datacenter greener and your boss happier.