Shenzhen Institutes of Advanced Technology, CAS; UCAS
Fine-grained management of speculative decoding phases can boost LLM serving throughput by over 50% and cut latency nearly in half.