University of British Columbia, Vancouver, BC, Canada; Vector Institute, Toronto, ON, Canada
Current vision-language models falter in ultra-resolution reasoning, with errors primarily stemming from evidence grounding and local perception.
VLMs get a 24% performance boost and run 56% faster on robot manipulation tasks by explicitly modeling action advantages and exploring multiple future paths, instead of relying on noisy foresight predictions.