The State Key Laboratory of Blockchain and Data Security, Zhejiang University
Visual token dominance is the hidden culprit behind LVLM inference inefficiency; this paper dissects the problem to show how to navigate the fidelity-efficiency tradeoff.
MLLMs can achieve up to 7.9x KV cache compression and 1.52x faster decoding without sacrificing performance by compressing different attention heads with distinct, head-specific strategies.
Zero-shot RL agents can now learn better representations by focusing on dynamics-relevant image regions, leading to state-of-the-art generalization performance.