Search papers, labs, and topics across Lattice.
6
0
9
2
LLMs can generate recommendations up to 3.1x faster by explicitly modeling token position within items and speculation depth during speculative decoding.
Radar odometry, typically confined to urban settings, can be pushed off-road with simple adaptations like IMU preintegration, but still faces significant challenges in unstructured environments.
Two-tower recommendation models can get a major online performance boost without latency penalties, thanks to a new capability synergy framework.
Forget disjointed workflows: AutoCut's unified token space for video, audio, and text slashes ad production costs while boosting consistency.
Generative recommendation can beat DLRM in large-scale advertising, driving a 4.2% revenue lift in Kuaishou's production system via innovations in tokenization, decoding, optimization, and serving.
Ditch rigid distribution assumptions: a novel residual quantization approach predicts continuous values by recursively refining quantization codes, outperforming SOTA in recommendation tasks.