Skip the costly full training runs: this new metric accurately predicts face recognition dataset quality using only lightweight proxy models.
LLaVA-OV-2's codec-stream tokenization lets it crush existing video-language models, especially in tasks requiring fine-grained temporal understanding of high-frequency motion.
Forget generic retrieval signals – UniDoc-RL uses reinforcement learning to teach LVLMs how to actively perceive and reason about visual information, yielding a 17.7% performance boost.
Turns out, skipping the boring parts of a video (like static backgrounds) makes your vision AI both faster and smarter, beating state-of-the-art models with less data.