Westlake University
Video-LLMs can achieve up to 2.65x faster time-to-first-token and a 61% reduction in FLOPs by compressing visual tokens *inside* the vision encoder, not just after it.