Search papers, labs, and topics across Lattice.
4
0
4
1
Text-to-video diffusion models can now count (more accurately) without retraining, thanks to a clever attention-based guidance method.
Achieve state-of-the-art 3D scene understanding by dynamically adapting network parameters at test time, proving that input-aware adjustments can significantly boost performance with minimal overhead.
World models can now remember and realistically regenerate dynamic objects that temporarily disappear from view, thanks to a novel hybrid memory architecture.
MLLMs can gain surprisingly strong 3D spatial reasoning abilities simply by tapping into the latent knowledge already present in video generation models.