Xi'an Jiaotong University
Vision-language models (VLMs) often fail at spatial reasoning, either by ignoring visual cues or by reasoning unstably; a process-shaping framework is proposed to address both failure modes.
RL agents can learn more robust vision-and-language navigation policies by exploring diverse trajectories and comparing their performance, even without expert demonstrations or value networks.