Search papers, labs, and topics across Lattice.
Training speculative decoding models just got an order of magnitude faster, unlocking real-world deployment with a new open-source framework and a suite of production-ready draft models.
LLMs can now autonomously generate and deploy GPU kernels into production LLM engines, thanks to a new standardized framework for benchmarking and integrating these AI-generated kernels.