University of Notre Dame, IN
Cut KV-cache transfer times by up to 32% with SplitZip, a new GPU-friendly lossless compressor that unlocks faster disaggregated LLM serving.
Stop wasting your finetuning data: Specialized Pretraining (SPT) can outperform standard pretraining and finetuning, achieving better domain performance with fewer parameters and less compute.