On-device LLM inference can be sped up by an order of magnitude with a flexible TrustZone-based system that selectively protects memory and the NPU.
Generative recommendation can beat DLRM in large-scale advertising, driving a 4.2% revenue lift in Kuaishou's production system via innovations in tokenization, decoding, optimization, and serving.