One model to control them all: Qwen-VLA achieves impressive zero-shot generalization across diverse robotic tasks and embodiments by unifying vision-language-action modeling.
Decoupling high-level VLM planning from low-level diffusion-based control lets robots reason like foundation models *and* execute precisely, outperforming end-to-end approaches in complex manipulation tasks.
A 7B model trained with RL can outperform 72B-scale general MLLMs at process supervision for robotic manipulation by explicitly reasoning about progress toward the final task goal.
Achieve real-time embodied manipulation with large 3D vision models using a novel asynchronous architecture that boosts success rates by up to 51.4% while simultaneously reducing inference time.
Forget short-term context windows: VPWEM's Transformer-based memory compressor lets robots ace long-horizon manipulation tasks by distilling past observations into fixed-size episodic memories.
Bimanual robots can now achieve robust dexterous grasping in the real world, thanks to a massive 20M-frame synthetic dataset and a simple attention-based policy that transfers surprisingly well.