Turns out, vision-language-action (VLA) models are mostly just looking at the scene: visual pathways dominate action generation, and language matters only when the visuals are ambiguous.