Forget expensive human feedback loops: a VLM-powered reward function can efficiently align image editing diffusion models with human preferences.
Coordinating embodied multi-agent systems doesn't require end-to-end training; instead, offload planning to a VLM in simulation, then transfer the resulting plans to the real world for execution.
Current image editing models, even closed-source ones, still fall short on complex and creative instruction-based tasks, as revealed by a new interpretable QA-based evaluation framework.
Foundation models can be tamed to reconstruct realistic 4D interactions between hands and articulated objects from a single RGB video, even without pre-scanning or multi-view data.
RL's inherent resilience to catastrophic forgetting can be harnessed to improve continual learning in GUI agents, outperforming SFT alone.
Achieve spatially precise image edits in complex scenes by explicitly reasoning about object positions in text *before* visual grounding.
Achieve more realistic and coherent 4D scene representations by modeling motion within the SE(3) Lie group, outperforming NeRF-based methods.
Imagine AI scientists that not only reason but also autonomously conduct experiments in the real world – that's the promise of Intelligent Science Laboratories.