University of Illinois Chicago
Ternary LLMs can run up to 62x faster on CPU and up to 1.9x faster on CUDA GPUs with RSR-core, a new engine that finally brings theoretically fast low-bit matrix multiplication to practical hardware.
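The summary does not describe RSR-core's algorithm. The sketch below is a generic, hypothetical illustration of why fixed low-bit weights admit faster matrix-vector products; it is not RSR-core's implementation. A ternary weight matrix (entries in {-1, 0, +1}) needs no multiplications. Because the weights are fixed, rows that share the same k-column pattern can be grouped once, offline. At inference time, each distinct pattern's partial sum is then computed once and reused, so each column block costs roughly (distinct patterns × k) additions plus one gather per row, instead of (rows × k). For width k there are at most 3^k distinct patterns. All function names here (`signed_sum`, `ternary_matvec_naive`, `preprocess`, `matvec_with_reuse`) are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers for illustration only; not RSR-core's API.
# A block is (start_col, end_col, distinct_patterns, row_to_pattern_id).
Block = Tuple[int, int, List[Tuple[int, ...]], List[int]]


def signed_sum(pattern, seg) -> float:
    """Dot product of a ternary pattern with a segment using only adds/subtracts."""
    acc = 0.0
    for w, xj in zip(pattern, seg):
        if w == 1:
            acc += xj
        elif w == -1:
            acc -= xj
    return acc


def ternary_matvec_naive(W: List[List[int]], x: List[float]) -> List[float]:
    """Baseline y = W @ x for W with entries in {-1, 0, +1}: no multiplications."""
    return [signed_sum(row, x) for row in W]


def preprocess(W: List[List[int]], k: int) -> List[Block]:
    """Offline step (weights are fixed): for each block of k columns,
    give every row the id of its k-length ternary pattern."""
    n = len(W[0])
    blocks: List[Block] = []
    for start in range(0, n, k):
        end = min(start + k, n)
        index: Dict[Tuple[int, ...], int] = {}
        row_to_pattern: List[int] = []
        for row in W:
            p = tuple(row[start:end])
            if p not in index:
                index[p] = len(index)  # first time this pattern appears
            row_to_pattern.append(index[p])
        patterns = list(index)  # dicts keep insertion order, so ids match positions
        blocks.append((start, end, patterns, row_to_pattern))
    return blocks


def matvec_with_reuse(blocks: List[Block], x: List[float], m: int) -> List[float]:
    """Online step: one partial sum per distinct pattern, then a gather per row."""
    y = [0.0] * m
    for start, end, patterns, row_to_pattern in blocks:
        seg = x[start:end]
        partial = [signed_sum(p, seg) for p in patterns]  # at most 3**k entries
        for i, pid in enumerate(row_to_pattern):
            y[i] += partial[pid]
    return y


if __name__ == "__main__":
    W = [[1, 0, -1], [1, 0, -1], [0, 1, 1]]
    x = [2.0, 3.0, 5.0]
    print(ternary_matvec_naive(W, x))                      # [-3.0, -3.0, 8.0]
    print(matvec_with_reuse(preprocess(W, 2), x, len(W)))  # [-3.0, -3.0, 8.0]
```

The savings appear only when many rows share patterns, such as large weight matrices with small k. A production engine would also need cache-friendly layouts and vectorized or GPU kernels, which this pure-Python sketch does not attempt.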