University of California San Diego
PALUTE achieves 1,264 TPS at only 0.16 W, targeting energy-efficient LLM inference on edge devices.
Provable adversarial repair of Transformers is now possible beyond the last layer, thanks to a new framework that formulates repair as a tractable convex optimization problem.
Forget fixed residual connections: Attention Residuals let each layer selectively attend to previous layers, boosting performance and gradient flow in deep LLMs.