Lequn Fu, Yijun Zhong, Xiao Li, and Yibin Liu are with the Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: fulq@hust.edu.cn; zhongyijun@hust.edu.cn; li_xiao@hust.edu.cn; liu_yibin@hust.edu.cn). Zhiyuan Xu and Jian Tang are with the Beijing Humanoid Robot Innovation Center, Beijing 100086, China (e-mail: eric.xu@x-humanoid.com; jian.tang@x-humanoid.com). Shiqi Li is with the Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: sqli@hust.edu.cn).
A 7B model trained with RL can outperform 72B-scale general MLLMs in robotic manipulation process supervision by explicitly reasoning about progress toward the final task goal.