Text-to-video models can now learn geometrically consistent world dynamics via reinforcement learning, without costly architectural changes.
Counterintuitively, VLMs can achieve higher VQA accuracy when visual inputs are intentionally degraded, suggesting that high-resolution detail can act as noise that hinders reasoning.
Generative video models can now simulate a continuously evolving world, even when objects are out of sight, thanks to a new framework that maintains persistent global state.