This paper introduces ProbeFlow, a training-free adaptive inference framework for Flow Matching-based Vision-Language-Action (VLA) models that reduces action decoding latency in robotics. ProbeFlow schedules integration steps dynamically according to trajectory complexity, measured by the cosine similarity between the initial and lookahead velocity vectors, and so prunes redundant network evaluations. Experiments on the MetaWorld and LIBERO benchmarks show significant speedups (14.8x faster action decoding and 2.8x lower end-to-end latency) without compromising manipulation success, and the results are validated in real-world deployments.
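To make the probing idea concrete, here is a minimal sketch of a cosine-similarity probe driving a fixed-step Euler solver. It assumes a NumPy velocity function `velocity_fn(x, t)` with time running from 0 to 1, a single Euler lookahead to time `lookahead`, and a hypothetical linear mapping from similarity to step count capped at the paper's baseline of N = 50. The names `probe_num_steps` and `adaptive_euler`, the lookahead distance, and the mapping are all illustrative assumptions, not ProbeFlow's published schedule.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two arrays, treated as flat vectors."""
    return float(np.dot(a.ravel(), b.ravel()) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def probe_num_steps(velocity_fn, x0, lookahead=0.5, min_steps=1, max_steps=50):
    """Hypothetical probe: compare the initial velocity with a lookahead velocity.

    Returns the chosen step count and the initial velocity so it can be reused.
    """
    v0 = velocity_fn(x0, 0.0)
    x_look = x0 + lookahead * v0          # one Euler step to the lookahead time
    v_look = velocity_fn(x_look, lookahead)
    sim = cosine_similarity(v0, v_look)
    # Assumed mapping: parallel velocities (sim near 1) mean a nearly straight
    # trajectory and few steps; diverging velocities mean a curved one.
    complexity = (1.0 - sim) / 2.0        # 0 for parallel, 1 for antiparallel
    n = min_steps + int(round(complexity * (max_steps - min_steps)))
    return n, v0


def adaptive_euler(velocity_fn, x0, **probe_kwargs):
    """Integrate from t=0 to t=1 with the step count chosen by the probe."""
    n, v0 = probe_num_steps(velocity_fn, x0, **probe_kwargs)
    dt = 1.0 / n
    x = x0.copy()
    for i in range(n):
        t = i * dt
        # Reuse the probe's first evaluation instead of recomputing it.
        v = v0 if i == 0 else velocity_fn(x, t)
        x = x + dt * v
    return x, n


if __name__ == "__main__":
    # Constant field: the path is straight, so one step is exact.
    x, n = adaptive_euler(lambda x, t: np.ones_like(x), np.zeros(3))
    print(n, x)
    # Rotational field: velocities diverge, so the probe asks for more steps.
    x, n = adaptive_euler(lambda x, t: np.array([-x[1], x[0]]), np.array([1.0, 0.0]))
    print(n, x)
```

The design choice here is that the probe costs only one extra network evaluation (the lookahead), so it pays off whenever it lets the solver skip many of the baseline steps on near-straight trajectories.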
Robot control gets a whole lot faster: ProbeFlow slashes action decoding latency by 14.8x in Vision-Language-Action models, all without retraining.
Recent Vision-Language-Action (VLA) models equipped with Flow Matching (FM) action heads achieve state-of-the-art performance in complex robot manipulation. However, the multi-step iterative ODE solving required by FM introduces inference latency that precludes responsive physical control. While current acceleration efforts optimize the Vision-Language Model (VLM) backbone, the action head bottleneck remains overlooked. To address this, we propose ProbeFlow, a training-free adaptive inference framework tailored for continuous robotic control. By evaluating geometric trajectory complexity via the cosine similarity between initial and lookahead velocity vectors, ProbeFlow dynamically schedules integration steps to prune redundant network evaluations. On the MetaWorld benchmark, it accelerates action decoding by 14.8x (reducing average steps from N = 50 to 2.6) and cuts end-to-end system latency by 2.8x without compromising the manipulation success rate. On the long-horizon LIBERO benchmark, the probe automatically allocates a denser schedule to navigate semantic bottlenecks, effectively resolving the flow solver delay. Real-world physical deployments confirm that ProbeFlow successfully mitigates action decoding latency while ensuring execution stability, offering a highly practical solution for low-latency continuous generative policies.
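The MetaWorld numbers can be related with some back-of-envelope arithmetic. This sketch assumes a simple Amdahl's-law model in which only action decoding is accelerated and everything else in the pipeline is unchanged. Under that assumption, a 14.8x decoding speedup and a 2.8x end-to-end speedup imply that decoding took roughly 69% of baseline latency. The helper name `implied_decode_fraction` is hypothetical, and the abstract does not state this breakdown itself.

```python
def implied_decode_fraction(decode_speedup, end_to_end_speedup):
    """Solve Amdahl's law 1/S = (1 - p) + p/s for p.

    p is the share of baseline latency spent in action decoding, s is the
    decoding speedup, and S is the resulting end-to-end speedup. This assumes
    only the decoding stage gets faster.
    """
    return (1.0 - 1.0 / end_to_end_speedup) / (1.0 - 1.0 / decode_speedup)


if __name__ == "__main__":
    p = implied_decode_fraction(14.8, 2.8)
    print(f"implied decoding share of baseline latency: {p:.3f}")
    # Ratio of solver steps alone (50 -> 2.6 on average). It exceeds the
    # reported 14.8x, which may reflect probe evaluations or other fixed
    # per-call costs; the abstract does not break this down.
    print(f"step-count reduction: {50 / 2.6:.1f}x")
```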