Shanghai Jiao Tong University, University of California
LLMs waste compute on tokens that have already "figured it out." DASH selectively skips these tokens during prefill, which speeds up prefill without retraining or sacrificing accuracy.