Eastern Institute of Technology, Ningbo Key Laboratory of Spatial Intelligence and Digital Derivative
FPGAs can beat GPUs at dynamically allocating computation for LLM inference, thanks to a new architecture that fuses operations, uses mixed precision, and keeps the key-value (KV) cache on-chip.
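The paper's actual FPGA datapath isn't reproduced here; as a rough software analogy of the mixed-precision KV-cache idea, cached keys and values can be stored in int8 with a per-row scale and dequantized only when attention needs them. The function names and shapes below are illustrative assumptions, not the paper's design.

```python
import numpy as np

def quantize_int8(x):
    # per-row scale so int8 storage closely approximates the fp32 values
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0 + 1e-8
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # restore fp32 on demand, e.g. right before the attention matmul
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
k = rng.standard_normal((16, 64)).astype(np.float32)  # cached keys, fp32
qk, s = quantize_int8(k)                              # what the cache stores
k_hat = dequantize(qk, s)                             # what attention reads
err = np.abs(k - k_hat).max()
```

Storing the cache at 8 bits quarters its footprint versus fp32, which is what makes keeping it in scarce on-chip memory plausible; the reconstruction error stays small because each row is scaled independently.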
Untangling the mess of "streaming LLMs," this paper delivers a clear taxonomy that distinguishes among streaming generation, streaming inputs, and interactive architectures.
LVLMs can reason about video streams with *much* lower latency and higher accuracy by thinking concurrently with the incoming frames, not over buffered batches.
Forget simple image search: MCMR reveals how current multimodal models struggle with the complex, interdependent constraints of real-world product search.
LLMs actually *do* improve time series forecasting, especially for cross-domain generalization, overturning prior doubts with a massive 8-billion-observation study.