The State Key Laboratory of Blockchain and Data Security, Zhejiang University
Video-LLMs can be sped up by nearly 3x without sacrificing performance, simply by loosening the strict matching requirements of speculative decoding and focusing on visual-semantic relevance.
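The summary above mentions loosening the strict token-matching step of speculative decoding. The papers' exact acceptance rule is not given here, so the following is only a minimal illustrative sketch: standard speculative decoding accepts a draft token only under an exact rejection-sampling test against the target model, whereas a relaxed verifier might accept any draft token the target model ranks in its top-k or assigns probability above a threshold. All names (`relaxed_verify`, `top_k`, `tau`) and the specific criterion are assumptions for illustration, not the authors' method.

```python
import numpy as np

def relaxed_verify(draft_tokens, target_probs, top_k=5, tau=0.1):
    """Hypothetical relaxed acceptance for speculative decoding.

    Instead of the exact rejection-sampling match, accept a drafted
    token if the target model either ranks it among its top_k tokens
    or gives it probability >= tau. Returns the length of the accepted
    prefix; verification stops at the first rejected token.
    """
    accepted = 0
    for tok, probs in zip(draft_tokens, target_probs):
        topk_ids = np.argsort(probs)[-top_k:]  # indices of the k largest probs
        if tok in topk_ids or probs[tok] >= tau:
            accepted += 1
        else:
            break  # first mismatch ends the accepted prefix
    return accepted
```

Because more drafted tokens survive verification per step, the target model runs fewer forward passes, which is where a speedup like the one claimed above would come from; the cost is that outputs are no longer distributionally identical to the target model, so acceptance is judged by semantic adequacy instead.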
Visual token dominance is the hidden culprit behind LVLM inference inefficiency, and this paper dissects the problem to reveal how to navigate the fidelity-efficiency tradeoff.