LLMs can achieve up to 2x inference speedup without retraining by sharing KV cache states during early exit, avoiding the usual performance bottlenecks.