HELMSMAN slashes hardware costs by over 90% and enables billion-scale index rebuilds in mere hours, revolutionizing approximate nearest neighbor search (ANNS) for large-scale applications.
Transforming the KV cache from a monolithic structure into a dynamic, head-aware system could revolutionize LLM serving efficiency and scalability.
Real-world robots forget how to fold towels after learning to pick-and-place, but this work shows experience replay can help, if you do it right.
Xiaohongshu's LASER framework slashes latency and boosts revenue by 2% in real-world recommendation systems via a novel segmented attention mechanism and a hybrid DRAM-SSD indexing strategy.