Rankings of long-context LLMs reshuffle dramatically when models are evaluated across a range of context lengths and capabilities, showing that a single headline score is misleading.
Achieves full-attention accuracy with a 10x operator speedup and a 4.7x throughput improvement in long-context LLM inference by overlapping KV cache transfers with computation.
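Why overlapping KV cache transfers with computation helps can be illustrated with a toy timing model. This is a hypothetical sketch, not the paper's system. It assumes a per-layer schedule with one copy stream and one compute stream, each handling layers in order, plus a fixed number of staging buffers. The function names `sequential_time` and `overlapped_time` are illustrative. The model only shows how hiding transfers behind compute shortens the critical path. It says nothing about how full-attention accuracy is preserved.

```python
from typing import Sequence


def sequential_time(transfer: Sequence[float], compute: Sequence[float]) -> float:
    """No overlap: each layer's KV block is loaded, then computed, strictly in turn."""
    return sum(transfer) + sum(compute)


def overlapped_time(
    transfer: Sequence[float], compute: Sequence[float], buffers: int = 2
) -> float:
    """Finish time when KV transfers run on a separate copy stream.

    Hypothetical model (an assumption, not the paper's method):
      - one copy stream and one compute stream, each processing layers in order;
      - layer i's compute needs its own transfer and layer i-1's compute finished;
      - `buffers` staging slots; layer i reuses the slot of layer i - buffers,
        so its transfer must wait until that earlier layer's compute is done.
    """
    if len(transfer) != len(compute):
        raise ValueError("transfer and compute need one entry per layer")
    if buffers < 1:
        raise ValueError("need at least one staging buffer")

    transfer_done: list[float] = []
    compute_done: list[float] = []
    for i, (t, c) in enumerate(zip(transfer, compute)):
        # The copy stream is serial: wait for the previous transfer to finish.
        start_t = transfer_done[i - 1] if i > 0 else 0.0
        # Buffer reuse: the slot is free only after layer i - buffers was computed.
        if i >= buffers:
            start_t = max(start_t, compute_done[i - buffers])
        transfer_done.append(start_t + t)

        # Compute waits for this layer's data and for the previous layer's compute.
        prev_compute = compute_done[i - 1] if i > 0 else 0.0
        compute_done.append(max(transfer_done[i], prev_compute) + c)

    return compute_done[-1] if compute_done else 0.0


if __name__ == "__main__":
    # Illustrative per-layer costs in arbitrary time units (made-up numbers).
    transfer = [2.0] * 4
    compute = [3.0] * 4
    print(sequential_time(transfer, compute))             # 20.0
    print(overlapped_time(transfer, compute, buffers=2))  # 14.0
    print(overlapped_time(transfer, compute, buffers=1))  # 20.0
```

With two buffers, every transfer after the first hides behind the previous layer's compute, so the total drops to the first transfer plus all compute (2 + 4 × 3 = 14). With a single buffer, the next transfer cannot start until the current layer finishes computing, and the schedule degrades to the sequential 20.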