Training speculative decoding models just got an order of magnitude faster, unlocking real-world deployment with a new open-source framework and a suite of production-ready draft models.
PromptTuner cuts SLO violations by up to 7.9x and costs by up to 4.5x when tuning LLM prompts, outperforming existing resource-management systems.