The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Thanks to ZipServ's hardware-aware design, lossless compression can actually *speed up* LLM inference on GPUs rather than merely shrink model size.