A single spatial token, learned via occupancy prediction on a massive dataset, is surprisingly effective at injecting crucial spatial awareness into vision-language navigation, leading to state-of-the-art performance.
Forget training wheels: DeepScan unlocks significant gains in LVLM visual reasoning *without* any additional training, achieving state-of-the-art results on V*.