Forget brute-force scaling: intelligently selecting just 1% of video frames can actually *improve* video QA accuracy while cutting compute by 93%.
Vision-language models can now explicitly reason about object trajectories in videos, thanks to a simple yet effective method that augments training data with discrete motion tags.
MLLMs can gain surprisingly strong 3D spatial reasoning abilities simply by tapping into the latent knowledge already present in video generation models.