Centre for Artificial Intelligence and Robotics, The State Key Laboratory of Internet of Things for Smart City, University of Macau
VLMs can achieve state-of-the-art Vision-Language Navigation performance when explicitly trained to reason about past actions and to predict future visual transitions.
Image-goal navigation gets a boost from hierarchical reasoning: a vision-language model handles high-level planning while online RL handles low-level execution, significantly reducing wandering and improving success rates in complex environments.
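The hierarchical split described above — a slow high-level planner proposing waypoints and a fast low-level policy executing them — can be sketched as a toy control loop. This is a minimal illustration, not the papers' actual method: `high_level_plan` stands in for the VLM planner and `low_level_execute` for the RL controller, and the 2D grid world, greedy waypoint rule, and all function names are hypothetical.

```python
# Hypothetical sketch of a hierarchical navigation loop on a toy 2D grid:
# a high-level "planner" (standing in for a VLM) proposes the next waypoint,
# and a low-level "controller" (standing in for an RL policy) executes it.

def high_level_plan(pos, goal):
    """Stub planner: propose a waypoint one step along the axis
    with the largest remaining distance to the goal."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    if abs(dx) >= abs(dy):
        return (pos[0] + (1 if dx > 0 else -1), pos[1])
    return (pos[0], pos[1] + (1 if dy > 0 else -1))

def low_level_execute(pos, waypoint):
    """Stub controller: a single primitive action that reaches the waypoint.
    A real RL policy would emit several low-level actions here."""
    return waypoint

def navigate(start, goal, max_steps=50):
    """Alternate planning and execution until the goal is reached
    or the step budget runs out; return the visited positions."""
    pos, trajectory = start, [start]
    for _ in range(max_steps):
        if pos == goal:
            break
        waypoint = high_level_plan(pos, goal)
        pos = low_level_execute(pos, waypoint)
        trajectory.append(pos)
    return trajectory

path = navigate((0, 0), (3, 2))
```

Because the planner only ever moves toward the goal, the loop terminates without the aimless wandering a purely reactive policy can exhibit; in the real systems, the planner's waypoints come from vision-language reasoning over the goal image rather than from ground-truth coordinates.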