University of California, Los Angeles
FPGAs can outperform GPUs at dynamically allocating compute for LLM inference, thanks to a new architecture that fuses operations, uses mixed precision, and keeps the KV cache on-chip.
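The three ideas named above can be illustrated in software. This is a minimal, hypothetical sketch (not the paper's actual design): the KV cache is held in float16 to stand in for on-chip mixed-precision storage, and the score, softmax, and weighted-sum steps for one decode token are fused into a single function. All names and shapes here are illustrative assumptions.

```python
import numpy as np

def fused_attention_step(q, k_cache, v_cache, new_k, new_v):
    """Hypothetical fused decode step for one query token.

    Mimics the three techniques from the post:
    - on-chip KV cache: k_cache / v_cache are kept resident and appended to,
    - mixed precision: cache stored in float16, math done in float32,
    - operation fusion: scores, softmax, and output in one pass.
    """
    # Append the new key/value pair to the cache in reduced precision.
    k_cache = np.concatenate([k_cache, new_k[None].astype(np.float16)])
    v_cache = np.concatenate([v_cache, new_v[None].astype(np.float16)])

    # Compute attention scores in float32 for numerical stability.
    scores = (k_cache.astype(np.float32) @ q.astype(np.float32)) / np.sqrt(q.size)

    # Numerically stable softmax, fused with the weighted sum over values.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ v_cache.astype(np.float32)
    return out, k_cache, v_cache
```

On an FPGA the analogous fusion happens in hardware: the cache lives in on-chip BRAM and the score/softmax/output stages are pipelined, so intermediate tensors never round-trip through off-chip memory the way they do between separate GPU kernels.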