State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University;
Vision-language-action (VLA) models lose sight of visual cues as they generate actions. Injecting multi-level visual features into deeper layers and pruning irrelevant tokens can significantly improve their performance.
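The two ideas in this summary can be illustrated with a minimal NumPy sketch. This is a generic illustration, not the paper's method. The summary does not say how token relevance is scored, which layers receive the injected features, or how feature levels are fused. The dot-product relevance score, mean pooling, the per-level projection matrices `proj`, and the scale `alpha` are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def prune_visual_tokens(visual_tokens, query, keep_ratio=0.5):
    """Keep the visual tokens most relevant to a query vector (hypothetical helper).

    visual_tokens: (N, D) array; query: (D,) array.
    Relevance is plain dot-product similarity, which is an assumption.
    Returns the kept tokens in their original order and their indices.
    """
    scores = visual_tokens @ query                      # (N,) relevance per token
    k = max(1, int(round(keep_ratio * len(scores))))    # number of tokens to keep
    keep_idx = np.sort(np.argsort(scores)[-k:])         # top-k, original order preserved
    return visual_tokens[keep_idx], keep_idx

def inject_multilevel(hidden, level_feats, proj, alpha=1.0):
    """Add a fused multi-level visual feature to a deep layer's hidden states (hypothetical helper).

    hidden: (T, D) hidden states at some deeper layer.
    level_feats: list of (N_l, D_l) features taken from different vision-encoder levels.
    proj: list of (D_l, D) projection matrices, one per level (assumed, e.g. learned).
    """
    pooled = [f.mean(axis=0) @ W for f, W in zip(level_feats, proj)]  # one (D,) vector per level
    fused = np.mean(pooled, axis=0)                                   # average across levels -> (D,)
    return hidden + alpha * fused                                     # broadcast over all T tokens

rng = np.random.default_rng(0)
vis = rng.standard_normal((16, 8))
q = rng.standard_normal(8)
kept, idx = prune_visual_tokens(vis, q, keep_ratio=0.25)
print(kept.shape)  # (4, 8)

hidden = rng.standard_normal((5, 8))
feats = [rng.standard_normal((16, 4)), rng.standard_normal((9, 6))]
proj = [rng.standard_normal((4, 8)), rng.standard_normal((6, 8))]
print(inject_multilevel(hidden, feats, proj).shape)  # (5, 8)
```

Adding the visual signal back as a residual term keeps the hidden-state shape unchanged, so it could be applied at any deeper layer without altering the rest of the network. Pruning before injection would also shrink the number of visual tokens the model has to attend to.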