Fudan, USC | Apr 20, 2026 | arXiv:2604.17876

OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

Kuanning Wang, Ke Fan, Chenhao Qiu, Zeyu Shangguan, Yuqian Fu, Yu Fu, Yanwei Fu, Daniel Seita, Xiangyang Xue

AI Summary

The paper introduces OFlow, a framework that unifies temporal foresight and object-aware reasoning in a shared semantic latent space for robotic manipulation. OFlow forecasts future latents using temporal flow matching and factorizes them into object-aware representations. Integrating OFlow into Vision-Language-Action (VLA) pipelines improves control reliability under distribution shifts, as demonstrated across multiple benchmarks and real-world tasks.

Key Contribution

Robots get a crucial boost in robustness by learning to "see" and predict how objects will move, not just react to the current frame.

Abstract

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing Vision-Language-Action (VLA) models face two limitations: they typically act only on the current frame, and future prediction and object-aware reasoning are often learned in separate latent spaces. We propose OFlow (injecting Object-Aware Temporal Flow Matching into VLAs), a framework that addresses both limitations by unifying temporal foresight and object-aware reasoning in a shared semantic latent space. Our method forecasts future latents with temporal flow matching, factorizes them into object-aware representations that emphasize physically relevant cues while filtering task-irrelevant variation, and conditions continuous action generation on these predictions. By integrating OFlow into VLA pipelines, our method enables more reliable control under distribution shifts. Extensive experiments on the LIBERO, LIBERO-Plus, MetaWorld, and SimplerEnv benchmarks and on real-world tasks demonstrate that object-aware foresight consistently improves robustness and task success.
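The abstract does not give implementation details, so the following is only a generic sketch of what a flow-matching objective over latents can look like, assuming the common straight-line (linear interpolation) path between a noise sample and a target latent. It is not the authors' code: the function names, the `cond` argument (standing in for whatever the current observation provides), and the NumPy setting are all hypothetical choices made for illustration.

```python
import numpy as np


def interpolate(x0, x1, t):
    """Point at time t on the straight-line path from noise x0 to target latent x1."""
    t = t[:, None]  # broadcast per-sample time over the latent dimension
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Velocity of the straight-line path; constant in t under this assumption."""
    return x1 - x0


def flow_matching_loss(velocity_fn, x0, x1, t, cond):
    """Mean squared error between the predicted and the target velocity.

    velocity_fn is a hypothetical model with signature (x_t, t, cond) -> velocity.
    """
    x_t = interpolate(x0, x1, t)
    pred = velocity_fn(x_t, t, cond)
    return float(np.mean((pred - target_velocity(x0, x1)) ** 2))


def euler_sample(velocity_fn, x0, cond, steps=10):
    """Forecast a latent by integrating the velocity field from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full(x.shape[0], k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

In a setup like this, `x1` would be the latent of a future frame and `cond` the encoding of the current one; how OFlow factorizes the forecast into object-aware representations and conditions the action head on it is described in the paper itself and is not reproduced here.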

Computer Vision, Multimodal Models, Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
