Search papers, labs, and topics across Lattice.
2
0
5
4
Video-LLMs can achieve up to 2.65x faster time-to-first-token and 61% FLOPs reduction by compressing visual tokens *inside* the vision encoder, not just after.
Open-source document parsing models are shockingly brittle, losing nearly 18% accuracy on real-world photos and 14% on non-Latin scripts compared to their closed-source counterparts.