University of California, Merced
Squeeze up to 3.2x more performance from your long-context LLMs by intelligently splitting attention computation between CPU and GPU.
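Below is a minimal, hypothetical sketch of the idea described above: keep older KV-cache entries on the CPU, keep only the recent window on the GPU, run attention on each partition separately, and merge the partial results with a numerically stable log-sum-exp combine. Function names such as `partial_attention` and `merge_partials`, and the 3.2x speedup figure, are illustrative assumptions, not the project's actual API or a measured result of this snippet.

```python
import math
import torch

def partial_attention(q, k, v):
    """Attention over one KV partition; returns (unnormalized output, max score, exp-sum)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])  # [heads, 1, len]
    m = scores.amax(dim=-1, keepdim=True)                      # per-head running max
    p = torch.exp(scores - m)                                  # shifted exponentials
    l = p.sum(dim=-1, keepdim=True)                            # partial softmax denominator
    o = p @ v                                                  # unnormalized partial output
    return o, m, l

def merge_partials(o1, m1, l1, o2, m2, l2):
    """Combine two partial attention results computed over disjoint key sets."""
    m = torch.maximum(m1, m2)
    a1, a2 = torch.exp(m1 - m), torch.exp(m2 - m)
    return (o1 * a1 + o2 * a2) / (l1 * a1 + l2 * a2)

heads, d, ctx, recent = 8, 64, 4096, 512
gpu = "cuda" if torch.cuda.is_available() else "cpu"           # fall back to CPU-only if needed

q = torch.randn(heads, 1, d)        # single decoding query
k = torch.randn(heads, ctx, d)      # full KV cache (toy data)
v = torch.randn(heads, ctx, d)

# Older tokens stay on the CPU; only the recent window is moved to the GPU.
o_cpu, m_cpu, l_cpu = partial_attention(q, k[:, :-recent], v[:, :-recent])
o_gpu, m_gpu, l_gpu = partial_attention(
    q.to(gpu), k[:, -recent:].to(gpu), v[:, -recent:].to(gpu)
)

# Merge on the CPU; the result equals full softmax attention over the whole cache.
out = merge_partials(o_cpu, m_cpu, l_cpu, o_gpu.cpu(), m_gpu.cpu(), l_gpu.cpu())
print(out.shape)  # torch.Size([8, 1, 64])
```

Because the two partitions cover disjoint keys, the log-sum-exp merge reproduces exact attention over the full context, so the CPU/GPU split changes where the work happens, not the output.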