Institute of Artificial Intelligence and Robotics, National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, Xi'an Jiaotong University
VLMs can achieve state-of-the-art Vision-Language Navigation (VLN) performance when explicitly trained to reason about past actions and predict future visual transitions.
Image-goal navigation gets a boost from hierarchical reasoning: vision-language models handle high-level planning while online RL handles low-level execution, significantly reducing wandering and improving success rates in complex environments.
Open-source VLN agents can nearly double their navigation success by remembering where they've been, thanks to a new hierarchical memory system.