University of Science and Technology of China
Achieve up to 3.1x faster LLM-based recommendations by making speculative decoding aware of token positions within items and speculation depth.