Search papers, labs, and topics across Lattice.
2
0
5
0
Forget coarse sequence-level hacks: LenVM lets you precisely dial in token generation length, boosting a 7B model's length accuracy from 30.9 to 64.8 and crushing closed-source rivals.
Training speculative decoding models just got an order of magnitude faster, unlocking real-world deployment with a new open-source framework and a suite of production-ready draft models.