University of Illinois Chicago
Ternary LLMs can run up to 62x faster on CPU and 1.9x faster on CUDA with RSR-core, a new engine that finally brings theoretically fast low-bit matrix multiplication to practical hardware.
Generative AI can drastically improve image retrieval accuracy for complex queries, outperforming contrastive learning methods by up to 93%.