D Diffusion Policy [29]. These techniques improve sample efficiency and multimodal trajectory modeling, but they still face challenges in distributional shift, real-time inference, and training stability.

2.2 Vision–Language–Action Models

Building on these foundations, Vision–Language–Action (VLA) models merge perception, instruction understanding, and control into unified networks. Representative instances include RT-2 [32], OpenVLA [13], Robotics Diffusion Transformer (RDT) [18], π0 [1], CogACT [14], SpatialVLA [21], π0.5 [11], SmolVLA [23], UniVLA [3], WALL-OSS [30], GR
Current robot learning benchmarks may be too simplistic: GM-100 offers 100 new, challenging tasks designed to expose the long-tail behaviors that existing benchmarks miss.