Search papers, labs, and topics across Lattice.
Stop wasting compute: with a metacognitive approach that determines when "thinking is enough," LRMs can cut reasoning steps by 30% without sacrificing accuracy.
Achieve 11.8x faster reasoning with 80% KV cache compression by estimating token importance directly from FlashAttention's intermediate results, with no extra compute needed.