ShanghaiTech | Apr 16, 2026 | arXiv:2604.15281

R3D: Revisiting 3D Policy Learning

Zhengdong Hong, Shenrui Wu, Haozhe Cui, Boyi Zhao, Bo Zhao, Ran Ji, Yiyang He, Hangxing Zhang, Zundong Ke, Guofeng Zhang, Jiayuan Gu

AI Summary

This paper addresses two obstacles to training 3D perception models for robotic policy learning: training instability and overfitting. The authors identify the omission of 3D data augmentation and the destabilizing effects of Batch Normalization as the primary causes. To overcome these issues, they introduce a transformer-based 3D encoder coupled with a diffusion decoder, which significantly outperforms state-of-the-art 3D baselines on manipulation benchmarks.

Key Contribution

A new architecture that overcomes the training instabilities and overfitting that had kept 3D policy learning from adopting powerful 3D perception models.

Abstract

3D policy learning promises superior generalization and cross-embodiment transfer, but progress has been hindered by training instabilities and severe overfitting, precluding the adoption of powerful 3D perception models. In this work, we systematically diagnose these failures, identifying the omission of 3D data augmentation and the adverse effects of Batch Normalization as primary causes. We propose a new architecture coupling a scalable transformer-based 3D encoder with a diffusion decoder, engineered specifically for stability at scale and designed to leverage large-scale pre-training. Our approach significantly outperforms state-of-the-art 3D baselines on challenging manipulation benchmarks, establishing a new and robust foundation for scalable 3D imitation learning. Project Page: https://r3d-policy.github.io/
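To make the notion of 3D data augmentation concrete, the following is a minimal sketch of one common form of point-cloud augmentation: a random rotation about the vertical axis, a uniform rescale, and per-point Gaussian jitter. It is an illustration under stated assumptions, not the augmentation recipe used in the paper. The function name `augment_point_cloud`, its parameters, and all default magnitudes are hypothetical.

```python
import math
import random


def augment_point_cloud(points, max_angle_deg=15.0, scale_range=(0.9, 1.1),
                        jitter_std=0.005, rng=None):
    """Apply one random rotation-plus-scale transform and per-point jitter.

    points: list of (x, y, z) tuples.
    All default magnitudes are illustrative assumptions, not values
    taken from the paper.
    """
    rng = rng or random.Random()

    # Sample one transform shared by every point in the cloud.
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    scale = rng.uniform(*scale_range)

    augmented = []
    for x, y, z in points:
        # Rotate about the vertical (z) axis, then scale uniformly.
        xr = scale * (cos_a * x - sin_a * y)
        yr = scale * (sin_a * x + cos_a * y)
        zr = scale * z
        # Add small independent Gaussian noise to each coordinate.
        augmented.append((xr + rng.gauss(0.0, jitter_std),
                          yr + rng.gauss(0.0, jitter_std),
                          zr + rng.gauss(0.0, jitter_std)))
    return augmented


if __name__ == "__main__":
    cloud = [(0.10, 0.00, 0.05), (0.00, 0.20, 0.10), (0.15, 0.15, 0.00)]
    print(augment_point_cloud(cloud, rng=random.Random(0)))
```

In an imitation-learning setting, a rotation or scale applied to the observed point cloud would presumably also have to be applied to any action labels expressed in the same coordinate frame, so that observations and targets stay consistent. The sketch above transforms only the points.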

Architecture Design (Transformers, SSMs, MoE) | Data Curation & Synthetic Data | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 60

Year: 2026

Venue: N/A
