Existing video world models struggle with long-term memory retention, and MBench exposes their critical limitations while providing a structured path for future improvements.
Forget brute-force scaling: REVERSE shows that teaching an agent *how* to search and verify evidence lets a smaller model beat giants at image geo-localization.
Aligning diffusion models with human preferences just got a fidelity upgrade: DRM leverages the generative backbone itself for rewards, unlocking step-wise guidance that boosts image quality.
Current video understanding models struggle with long-horizon robustness and non-speech audio, as revealed by the new OmniPro benchmark designed for comprehensive omni-modal proactive evaluation.
Removing objects from video now means removing their shadows and reflections too, thanks to a new method that teaches diffusion models to "understand" object-scene physics.