Search papers, labs, and topics across Lattice.
DSA, HKUST(GZ)
1
0
2
Dynamic allocation of compute resources based on attention entropy can yield significant speedups in long-context LLM inference without sacrificing quality.