School of Computer Science and Engineering, University of Electronic Science and Technology of China
Attention's quadratic complexity is no longer a bottleneck: DASH-KV achieves linear O(N) inference without sacrificing accuracy by reformulating attention as an approximate nearest-neighbor search.
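The summary does not describe DASH-KV's index, but the core idea of treating attention as a nearest-neighbor search can be sketched: instead of softmax-attending over all N keys, score and attend over only the top-k keys for each query (an ANN index would find these without scoring every key; the exact structure here is an assumption). A minimal NumPy illustration:

```python
import numpy as np

def topk_attention(q, K, V, k=4):
    # Score all keys, then keep only the top-k (a stand-in for an ANN
    # index, which would locate these keys without scanning them all;
    # DASH-KV's actual index is not described in the summary above).
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argpartition(scores, -k)[-k:]
    # Softmax over the retained keys only, then aggregate their values.
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ V[idx]
```

With k fixed, each decoding step touches O(k) keys instead of O(N); when k equals the number of keys, the result matches full softmax attention exactly.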
Ditch the haystack: Tri-RAG structures external knowledge into logical triplets, slashing irrelevant context and boosting RAG's reasoning power.
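The triplet idea can be illustrated with a toy retriever: knowledge is stored as (subject, relation, object) tuples, and retrieval returns only the triplets that mention an entity from the query, rather than whole passages of mostly irrelevant text. Tri-RAG's actual retrieval mechanism is not specified in the summary; the entity-matching rule below is an assumption for illustration.

```python
# A tiny knowledge store of (subject, relation, object) triplets.
triplets = [
    ("Paris", "capital_of", "France"),
    ("Berlin", "capital_of", "Germany"),
    ("France", "located_in", "Europe"),
]

def retrieve(query_entities, triplets):
    # Return only triplets whose subject or object appears in the
    # query, keeping the context passed to the LLM minimal and
    # logically structured.
    return [t for t in triplets
            if t[0] in query_entities or t[2] in query_entities]
```

For a query mentioning "France", only the two France triplets are returned, so the generator never sees the unrelated Berlin fact.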
Video-LLMs hallucinate because they fixate on a single "anchor frame," but a simple decoder-side attention fix can dramatically improve grounding without retraining.
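One generic way to implement such a decoder-side fix is to flatten the decoder's attention over video frames at inference time, so that no single anchor frame absorbs almost all of the mass. The temperature-smoothing below is an assumed illustration, not the paper's specific method:

```python
import numpy as np

def smooth_frame_attention(attn, tau=2.0):
    # Re-normalize decoder attention over frames with a higher softmax
    # temperature, redistributing mass away from a dominant "anchor
    # frame". Applied post hoc at decode time, so no retraining is
    # needed. (Generic sketch; the paper's exact fix isn't given above.)
    logits = np.log(attn + 1e-9) / tau
    w = np.exp(logits - logits.max())
    return w / w.sum()
```

For a peaked distribution like [0.9, 0.05, 0.05], the smoothed weights still favor the anchor frame but give the other frames a meaningfully larger share, which is the intuition behind improved temporal grounding.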