Search papers, labs, and topics across Lattice.
2
0
5
5
Speculative decoding, typically used post-RL, can be integrated directly into RL training loops to accelerate LLM rollout generation by up to 2.5x.
Synthetic data can significantly overestimate the real-world throughput gains from speculative decoding, highlighting the critical need for benchmarks like SPEED-Bench that use diverse, production-realistic workloads.