Today's best video models achieve near-zero success rates on interactive video generation, revealing a stark gap in multimodal reasoning and physical grounding.
Forget real-world video datasets: training VLMs on just 7.7K synthetic videos with temporal primitives outperforms training on 165K real-world examples, and the gains transfer surprisingly well to video reasoning.