Mar 5, 2026 · arXiv:2603.05355

Omni-Manip: Beyond-FOV Large-Workspace Humanoid Manipulation with Omnidirectional 3D Perception

Pei Qu, Zheng Li, Yufei Jia, Ziyun Liu, Liang Zhu, Liang-Jia Zhu, Haoang Li, Jinni Zhou, Jun Ma

AI Summary

Omni-Manip, a novel LiDAR-driven visuomotor policy, is introduced to enable humanoid robots to perform manipulation tasks in large workspaces by overcoming perceptual limitations. The method uses a Time-Aware Attention Pooling mechanism to process panoramic point clouds, efficiently encoding sparse 3D data and capturing temporal dependencies for robust 360° perception. Experiments demonstrate that Omni-Manip outperforms egocentric depth camera baselines in both simulated and real-world cluttered environments, showcasing its ability to handle large workspaces without frequent repositioning.

Key Contribution

A LiDAR-driven 360° perception policy lets humanoid robots manipulate objects across large workspaces without frequent repositioning.

Abstract

The deployment of humanoid robots for dexterous manipulation in unstructured environments remains challenging due to perceptual limitations that constrain the effective workspace. In scenarios where physical constraints prevent the robot from repositioning itself, maintaining omnidirectional awareness becomes far more critical than color or semantic information. While recent advances in visuomotor policy learning have improved manipulation capabilities, conventional RGB-D solutions suffer from narrow fields of view (FOV) and self-occlusion, requiring frequent base movements that introduce motion uncertainty and safety risks. Existing approaches to expanding perception, including active vision systems and third-view cameras, introduce mechanical complexity, calibration dependencies, and latency that hinder reliable real-time performance. In this work, we propose Omni-Manip, an end-to-end LiDAR-driven 3D visuomotor policy that enables robust manipulation in large workspaces. Our method processes panoramic point clouds through a Time-Aware Attention Pooling mechanism, efficiently encoding sparse 3D data while capturing temporal dependencies. This 360° perception allows the robot to interact with objects across wide areas without frequent repositioning. To support policy learning, we develop a whole-body teleoperation system for efficient data collection on full-body coordination. Extensive experiments in simulation and real-world environments show that Omni-Manip achieves robust performance in large-workspace and cluttered scenarios, outperforming baselines that rely on egocentric depth cameras.
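
The abstract names a Time-Aware Attention Pooling mechanism but does not give its formulation. As a rough illustration of the general idea only, the NumPy sketch below pools a short history of sparse point-cloud frames into a single feature vector: each point gets a feature that depends on its frame's age, and a softmax over all points decides how much each contributes. This is a generic attention-pooling sketch under stated assumptions, not the authors' implementation; the function name, the parameters `w_feat`, `w_time`, and `query`, and the linear age embedding are all hypothetical.

```python
import numpy as np

def time_aware_attention_pool(frames, w_feat, w_time, query):
    """Pool a history of point-cloud frames into one feature vector.

    Hypothetical illustration, not the paper's architecture.
    frames: list of (N_t, 3) xyz arrays, oldest first (N_t may differ).
    w_feat: (3, D) assumed per-point projection matrix.
    w_time: (D,) assumed time embedding, scaled by each frame's age.
    query:  (D,) assumed learned query vector used to score points.
    Returns (pooled, weights): a (D,) vector and per-point attention weights.
    """
    last = len(frames) - 1
    feats = []
    for t, pts in enumerate(frames):
        age = float(last - t)  # 0 for the newest frame
        # Per-point feature that also encodes how old the frame is.
        feats.append(np.tanh(pts @ w_feat + age * w_time))
    feats = np.concatenate(feats, axis=0)  # (sum of N_t, D)

    # Scaled dot-product scores against the query, then a stable softmax.
    scores = feats @ query / np.sqrt(feats.shape[1])
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()

    pooled = weights @ feats  # attention-weighted sum over all points
    return pooled, weights

# Example with random stand-in data: three frames of varying size.
rng = np.random.default_rng(0)
D = 8
frames = [rng.normal(size=(n, 3)) for n in (50, 40, 60)]
w_feat = rng.normal(size=(3, D))
w_time = rng.normal(size=D)
query = rng.normal(size=D)
pooled, weights = time_aware_attention_pool(frames, w_feat, w_time, query)
```

Because the pooling is a weighted sum, the output has a fixed size regardless of how many points each frame contains, and it does not depend on the order of points within a frame. Both properties suit sparse, unordered LiDAR returns.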

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 39

Year: 2026

Venue: N/A
