Current LLM efficiency metrics fail to capture the true cost of tool use as measured by wall-clock latency; a new hardware-aware metric closes the gap.
Forget SVD: CARE aligns low-rank attention approximations with input activations, boosting accuracy up to 1.7x and slashing perplexity by 215x when converting models to multi-head latent attention.