Search papers, labs, and topics across Lattice.
2
0
3
VLMs can achieve state-of-the-art Vision-Language Navigation performance by explicitly training them to reason about past actions and predict future visual transitions.
Image-goal navigation gets a boost from hierarchical reasoning, using vision-language models for high-level planning and online RL for low-level execution, significantly reducing wandering and improving success in complex environments.