CompRank makes LLM reranking for retrieval tasks nearly 10 times faster without sacrificing performance.
FPGAs can beat GPUs at dynamically allocating computation for LLM inference, thanks to a new architecture that fuses operations, uses mixed precision, and caches KV values on-chip.
Untangling the mess of "streaming LLMs," this paper delivers a clear taxonomy that distinguishes between streaming generation, streaming inputs, and interactive architectures.
LVLMs can reason about video streams *much* faster and better by thinking concurrently with the incoming data, not in batches.
Forget simple image search: MCMR reveals how current multimodal models struggle with the complex, interdependent constraints of real-world product search.
LLMs actually *do* improve time series forecasting, especially for cross-domain generalization, overturning prior doubts with a massive 8-billion-observation study.