This paper introduces a surround-view robotic vision system that combines six cameras with LiDAR to give robots full 360° visual coverage, together with RobotPan, a feed-forward framework that predicts metric-scaled 3D Gaussians from calibrated sparse-view inputs for real-time rendering, reconstruction, and streaming. RobotPan decodes Gaussians from hierarchical spherical voxel priors that allocate fine resolution near the robot and coarser resolution at larger radii, and an online fusion method updates dynamic content while preventing unbounded growth in static regions by selectively updating appearance.
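The abstract does not give the exact voxelization schedule, so the following is a minimal sketch, assuming geometric radial spacing, a limited elevation range, and per-level halving of angular bins, of how a hierarchical spherical prior can keep the anchor count small while staying dense near the robot. All function and parameter names here are illustrative, not RobotPan's API:

```python
import numpy as np

def spherical_voxel_anchors(r_min=0.5, r_max=20.0, n_shells=4,
                            base_az_bins=64, base_el_bins=16):
    """Sketch of a hierarchical spherical voxel prior: fine resolution
    near the robot, coarser at larger radii.

    Shell radii grow geometrically and angular bin counts are halved at
    every shell, so each successive shell is coarser. The exact schedule
    used by RobotPan is an assumption here.
    """
    # Geometrically spaced shell boundaries: dense near the origin.
    radii = np.geomspace(r_min, r_max, n_shells + 1)
    anchors = []
    for level in range(n_shells):
        r = 0.5 * (radii[level] + radii[level + 1])   # shell midpoint radius
        n_az = max(base_az_bins >> level, 8)          # halve azimuth bins per level
        n_el = max(base_el_bins >> level, 4)          # halve elevation bins per level
        az = np.linspace(-np.pi, np.pi, n_az, endpoint=False)
        # Limited elevation range (assumed), as most content sits near the ground plane.
        el = np.linspace(-0.4 * np.pi, 0.4 * np.pi, n_el)
        AZ, EL = np.meshgrid(az, el, indexing="ij")
        # Spherical -> Cartesian voxel centers in the robot frame.
        xyz = np.stack([r * np.cos(EL) * np.cos(AZ),
                        r * np.cos(EL) * np.sin(AZ),
                        r * np.sin(EL)], axis=-1).reshape(-1, 3)
        anchors.append(xyz)
    return np.concatenate(anchors, axis=0)

centers = spherical_voxel_anchors()
print(centers.shape)  # far fewer anchors than a uniform grid with equal near-field detail
```

Under this kind of schedule the anchor budget is dominated by the innermost shells, which is what lets the decoder spend its Gaussians where the robot actually operates.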
Achieve real-time 360° robotic vision with RobotPan, which drastically reduces the number of Gaussians needed for reconstruction and view synthesis compared to previous feed-forward methods.
Surround-view perception is increasingly important for robotic navigation and loco-manipulation, especially in human-in-the-loop settings such as teleoperation, data collection, and emergency takeover. However, current robotic visual interfaces are often limited to narrow forward-facing views, or, when multiple on-board cameras are available, require cumbersome manual switching that interrupts the operator's workflow. Both configurations suffer from motion-induced jitter that causes simulator sickness in head-mounted displays. We introduce a surround-view robotic vision system that combines six cameras with LiDAR to provide full 360$^\circ$ visual coverage, while meeting the geometric and real-time constraints of embodied deployment. We further present \textsc{RobotPan}, a feed-forward framework that predicts \emph{metric-scaled} and \emph{compact} 3D Gaussians from calibrated sparse-view inputs for real-time rendering, reconstruction, and streaming. \textsc{RobotPan} lifts multi-view features into a unified spherical coordinate representation and decodes Gaussians using hierarchical spherical voxel priors, allocating fine resolution near the robot and coarser resolution at larger radii to reduce computational redundancy without sacrificing fidelity. To support long sequences, our online fusion updates dynamic content while preventing unbounded growth in static regions by selectively updating appearance. Finally, we release a multi-sensor dataset tailored to 360$^\circ$ novel view synthesis and metric 3D reconstruction for robotics, covering navigation, manipulation, and locomotion on real platforms. Experiments show that \textsc{RobotPan} achieves competitive quality against prior feed-forward reconstruction and view-synthesis methods while producing substantially fewer Gaussians, enabling practical real-time embodied deployment. Project website: https://robotpan.github.io/
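The abstract states the online-fusion policy only at a high level (replace dynamic content, refresh only the appearance of static regions). The sketch below illustrates one way that policy could keep the global Gaussian set bounded; the `dynamic_mask` input, the nearest-neighbor matching, and the blending factor are all assumptions, not RobotPan's actual method:

```python
import numpy as np
from scipy.spatial import cKDTree

def fuse_frame(global_means, global_colors, new_means, new_colors,
               dynamic_mask, match_radius=0.1, appearance_lr=0.2):
    """Sketch of online fusion for long sequences: dynamic content is
    replaced each frame, while static regions only receive an appearance
    update, so the global Gaussian set does not grow without bound.

    `dynamic_mask` flags new Gaussians that fall in dynamic regions
    (e.g., from a motion cue); how RobotPan obtains it is not specified
    in the abstract, so treat it as a hypothetical input.
    """
    static_new = ~dynamic_mask
    tree = cKDTree(global_means)
    # For each new *static* Gaussian, find an existing neighbor to refresh.
    dist, idx = tree.query(new_means[static_new],
                           distance_upper_bound=match_radius)
    matched = np.isfinite(dist)  # unmatched queries return dist = inf
    tgt = idx[matched]
    # Static regions: blend appearance only -> no growth of the global set.
    # (Unmatched static Gaussians are simply dropped in this sketch.)
    global_colors[tgt] = ((1 - appearance_lr) * global_colors[tgt]
                          + appearance_lr * new_colors[static_new][matched])
    # Dynamic regions: append the fresh prediction. A full system would
    # also prune the previous frame's dynamic Gaussians before appending;
    # omitted here for brevity.
    means = np.concatenate([global_means, new_means[dynamic_mask]], axis=0)
    colors = np.concatenate([global_colors, new_colors[dynamic_mask]], axis=0)
    return means, colors
```

The design point this illustrates is that only the dynamic subset ever adds Gaussians, so memory stays proportional to scene extent plus transient content rather than to sequence length.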