Mar 5, 2026 · arXiv:2603.05410

PhysiFlow: Physics-Aware Humanoid Whole-Body VLA via Multi-Brain Latent Flow Matching and Robust Tracking

Weikai Qin, S. Wu, Sichen Wu, Ci Chen, Mengfan Liu, Linxi Feng, Xinru Cui, Haoqi Han, Hesheng Wang

AI Summary

The paper introduces PhysiFlow, a Vision-Language-Action (VLA) framework that integrates semantic guidance with physics-aware whole-body control for humanoid robots. PhysiFlow uses a multi-brain latent flow matching approach to improve VLA inference efficiency and enable stable, dynamic limb-coordinated movements. Experiments demonstrate that PhysiFlow allows for reliable vision-language-guided full-body coordination.

Key Contribution

PhysiFlow is a framework that fuses Vision-Language-Action (VLA) with physics-aware whole-body control to achieve stable, semantically guided control of humanoid robots.

Abstract

In the domain of humanoid robot control, the fusion of Vision-Language-Action (VLA) with whole-body control is essential for semantically guided execution of real-world tasks. However, existing methods encounter challenges in terms of low VLA inference efficiency or an absence of effective semantic guidance for whole-body control, resulting in instability in dynamic limb-coordinated tasks. To bridge this gap, we present a semantic-motion intent guided, physics-aware multi-brain VLA framework for humanoid whole-body control. A series of experiments was conducted to evaluate the performance of the proposed framework. The experimental results demonstrated that the framework enabled reliable vision-language-guided full-body coordination for humanoid robots.

Multimodal Models · Robotics & Embodied AI · World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 29

Year: 2026

Venue: N/A
