The authors introduce GigaBrain-0.5M*, a vision-language-action (VLA) model that leverages world model-based reinforcement learning to overcome limitations in scene understanding and future anticipation. They employ RAMP (Reinforcement leArning via world Model-conditioned Policy) to fine-tune GigaBrain-0.5, a model pre-trained on 10,000 hours of robotic manipulation data. The resulting GigaBrain-0.5M* demonstrates a 30% performance improvement over the RECAP baseline on complex manipulation tasks and exhibits reliable long-horizon execution in real-world deployments.
Forget end-to-end VLAs: GigaBrain-0.5M* leverages world models and reinforcement learning to achieve a 30% performance boost on complex robotic manipulation tasks, showcasing reliable long-horizon execution.
Vision-language-action (VLA) models that directly predict multi-step action chunks from current observations face inherent limitations due to constrained scene understanding and weak future anticipation capabilities. In contrast, video world models pre-trained on web-scale video corpora exhibit robust spatiotemporal reasoning and accurate future prediction, making them a natural foundation for enhancing VLA learning. We therefore propose \textit{GigaBrain-0.5M$^*$}, a VLA model trained via world model-based reinforcement learning. It is built upon \textit{GigaBrain-0.5}, which is pre-trained on over 10,000 hours of robotic manipulation data and whose intermediate version currently ranks first on the international RoboChallenge benchmark. \textit{GigaBrain-0.5M$^*$} further integrates world model-based reinforcement learning via \textit{RAMP} (Reinforcement leArning via world Model-conditioned Policy) to enable robust cross-task adaptation. Empirical results demonstrate that \textit{RAMP} achieves substantial performance gains over the RECAP baseline, yielding improvements of approximately 30\% on challenging tasks including \texttt{Laundry Folding}, \texttt{Box Packing}, and \texttt{Espresso Preparation}. Critically, \textit{GigaBrain-0.5M$^*$} exhibits reliable long-horizon execution, consistently accomplishing complex manipulation tasks without failure, as validated by real-world deployment videos on our \href{https://gigabrain05m.github.io}{project page}.
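The abstract does not spell out how RAMP conditions the policy on the world model, so the following is only a toy sketch of the general idea: a frozen "world model" rolls the current observation forward, the policy acts on the observation concatenated with those predicted future latents, and a simple score-based (REINFORCE/ES-style) update fine-tunes the policy head against a task reward. All dynamics, dimensions, and the reward here are invented for illustration and are not GigaBrain's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HORIZON = 3, 4

def world_model(obs):
    # Hypothetical frozen video world model: rolls the observation
    # forward HORIZON steps (a toy linear dynamics stand-in for a
    # pretrained video predictor).
    latents, z = [], obs.copy()
    for _ in range(HORIZON):
        z = 0.9 * z
        latents.append(z)
    return np.concatenate(latents)

def policy_action(obs, theta):
    # World-model-conditioned policy: the action is computed from the
    # current observation concatenated with predicted future latents.
    ctx = np.concatenate([obs, world_model(obs)])
    return ctx @ theta  # linear head for illustration

def reward(obs, action):
    # Toy task reward: act toward a state-dependent target; higher is better.
    return -(action - obs.sum()) ** 2

ctx_dim = OBS_DIM * (HORIZON + 1)
theta = np.zeros(ctx_dim)          # trainable policy head
sigma, lr = 0.1, 0.05              # perturbation scale, step size
obs_batch = rng.normal(size=(32, OBS_DIM))

def avg_reward(theta):
    return np.mean([reward(o, policy_action(o, theta)) for o in obs_batch])

# Antithetic score-based updates: estimate the reward gradient from
# paired perturbations and ascend it (a stand-in for RL fine-tuning).
before = avg_reward(theta)
for _ in range(200):
    eps = rng.normal(size=ctx_dim)
    r_plus = avg_reward(theta + sigma * eps)
    r_minus = avg_reward(theta - sigma * eps)
    theta += lr * (r_plus - r_minus) / (2 * sigma) * eps
after = avg_reward(theta)
print(f"avg reward: {before:.3f} -> {after:.3f}")
```

The design point this sketch tries to capture is that the policy never has to learn dynamics itself: future anticipation comes from the (frozen) world model, and RL only shapes how the policy uses those predictions.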