State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Visual grounding in VLAs weakens in deeper layers, but injecting multi-level visual features and pruning irrelevant tokens can boost performance by 9% in simulation and 7.5% in the real world.
Forget GPT-4o: the secret to better robot manipulation might be an agentic framework that generates diverse, physically plausible tasks, leading to superior VLA pre-training.