XJTU | May 25, 2026 | arXiv:2605.25829

OASIS: Observation-Action Space Alignment via SE(3) Trajectory Prediction for Robotic Manipulation

Xinzhe Chen, Sihua Ren, Liqi Huang, Haowen Sun, Mingyang Li, Xingyu Chen, Zeyang Liu, Xuguang Lan

AI Summary

The paper introduces OASIS, a visuomotor policy that aligns intermediate representations with the action space by predicting SE(3) end-effector trajectories. The motivation is that existing vision-language-action (VLA) models and world action models (WAMs) keep their intermediate representations largely in the observation space, which forces the action decoder to recover rigid-body geometry implicitly. OASIS couples a 3D-aware feature encoder with an SE(3) trajectory predictor so that the action decoder generates actions consistent with rigid-body motion, and it reports higher success rates and better out-of-distribution generalization than VLA and WAM baselines in simulation and real-world experiments.

Key Contribution

By explicitly aligning visuomotor representations with the rigid-body geometry of the action space, OASIS outperforms VLA and WAM baselines, which leave the action decoder to recover that geometry implicitly, in both success rate and out-of-distribution generalization.

Abstract

Recent vision-language-action (VLA) models and world action models (WAMs) advance robotic manipulation by enriching intermediate representations with auxiliary spatial features or future visual-state prediction. However, these representations largely remain within the observation space and do not share the rigid-body geometry of the action space, forcing the action decoder to implicitly recover this geometry. We propose OASIS, a visuomotor policy that aligns the intermediate representation with the action space via $SE(3)$ end-effector trajectory prediction. OASIS couples a 3D-aware feature encoder that fuses vision-language and metric-depth features with an $SE(3)$ trajectory predictor that produces a camera-frame end-effector trajectory. Conditioned on the predictor's pose-supervised hidden states, the action decoder generates action chunks consistent with rigid-body motion. Across simulation and real-world experiments, OASIS outperforms VLA and WAM baselines in success rate and out-of-distribution generalization. Our project page is available at https://npuhandsome.github.io/OASIS_web.
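To make the phrase "$SE(3)$ end-effector trajectory" concrete, here is a minimal NumPy sketch of how such a trajectory can be represented and manipulated. It is an illustration under stated assumptions, not the OASIS implementation: the helper names (`make_pose`, `rot_z`, `invert_pose`, `relative_motions`, `is_valid_se3`) and the toy trajectory values are hypothetical, and the abstract does not specify how poses are parameterized. Each pose is a 4x4 homogeneous matrix, and the per-step relative transforms stay on $SE(3)$ by construction.

```python
import numpy as np


def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 SE(3) matrix."""
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose


def rot_z(angle):
    """Rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def invert_pose(pose):
    """Closed-form SE(3) inverse: (R, t) -> (R^T, -R^T t)."""
    rotation, translation = pose[:3, :3], pose[:3, 3]
    inverse = np.eye(4)
    inverse[:3, :3] = rotation.T
    inverse[:3, 3] = -rotation.T @ translation
    return inverse


def relative_motions(trajectory):
    """Transform from each pose to the next, expressed in the earlier pose's frame."""
    return [invert_pose(a) @ b for a, b in zip(trajectory[:-1], trajectory[1:])]


def is_valid_se3(pose, tol=1e-6):
    """Check orthonormal rotation, determinant +1, and bottom row [0, 0, 0, 1]."""
    rotation = pose[:3, :3]
    orthonormal = np.allclose(rotation.T @ rotation, np.eye(3), atol=tol)
    proper = abs(np.linalg.det(rotation) - 1.0) < tol
    bottom_row = np.allclose(pose[3], [0.0, 0.0, 0.0, 1.0], atol=tol)
    return bool(orthonormal and proper and bottom_row)


# Hypothetical camera-frame trajectory: the gripper slides along x while yawing.
trajectory = [
    make_pose(rot_z(0.1 * k), np.array([0.05 * k, 0.0, 0.5])) for k in range(4)
]
deltas = relative_motions(trajectory)

# Composing the relative motions from the first pose recovers the last pose.
recomposed = trajectory[0]
for delta in deltas:
    recomposed = recomposed @ delta
```

Working with full rigid-body transforms, rather than unconstrained vectors, is one way to keep rotation and translation coupled; whether OASIS uses matrices, quaternions, or another parameterization is not stated in the abstract.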

Multimodal Models, Robotics & Embodied AI, World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
