The paper introduces PhysiFlow, a Vision-Language-Action (VLA) framework that integrates semantic guidance with physics-aware whole-body control for humanoid robots. PhysiFlow uses a multi-brain latent flow matching approach to improve VLA inference efficiency and enable stable, dynamic limb-coordinated movements. Experiments demonstrate that PhysiFlow allows for reliable vision-language-guided full-body coordination.
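The summary names latent flow matching but does not describe how PhysiFlow implements it. As a rough illustration of the general technique only, the sketch below draws a sample by integrating a velocity field from noise (t = 0) to a target (t = 1) with fixed Euler steps. Everything here is an assumption for illustration: the names `velocity`, `sample`, and `latent_action` are hypothetical, the closed-form velocity stands in for a learned network, and nothing in it reflects PhysiFlow's actual architecture or its multi-brain design.

```python
import random


def velocity(x, t, target):
    """Toy velocity field for the straight-line path x_t = (1 - t) * x0 + t * x1.

    Along that path the velocity is (x1 - x_t) / (1 - t). A trained model
    would replace this closed form with a learned network (assumption).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample(target, steps=4, seed=0):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division in velocity() is safe
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical 3-dimensional latent; the values are arbitrary.
latent_action = sample(target=[0.5, -1.0, 2.0], steps=4)
```

In this sketch the number of Euler steps is the main inference-cost knob: each step costs one evaluation of the velocity field, so fewer steps mean fewer model calls per generated action.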
Achieve stable, semantically-guided humanoid robot control with PhysiFlow, a framework that fuses vision-language-action with physics-aware whole-body control.
In humanoid robot control, fusing Vision-Language-Action (VLA) models with whole-body control is essential for semantically guided execution of real-world tasks. However, existing methods suffer from either low VLA inference efficiency or a lack of effective semantic guidance for whole-body control, which leads to instability in dynamic, limb-coordinated tasks. To bridge this gap, we present a physics-aware, multi-brain VLA framework for humanoid whole-body control that is guided by semantic-motion intent. Experiments show that the framework enables reliable vision-language-guided full-body coordination for humanoid robots.