Tsinghua AI | Feb 13, 2026 | arXiv:2602.12978

Learning Native Continuation for Action Chunking Flow Policies

Yufeng Liu, Hang Yu, Juntu Zhao, Bocheng Li, Di Zhang, Mingzhu Li, Wenxuan Wu, Yingdong Hu, Junyuan Xie, Junliang Guo, Dequan Wang

AI Summary

This paper introduces Legato, a training-time continuation method for action-chunked flow-based Vision Language Action (VLA) policies that addresses discontinuities at chunk boundaries. Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information and reshaping the learned flow dynamics for consistency between training and inference. Experiments demonstrate that Legato produces smoother trajectories, reduces spurious multimodal switching, and improves task completion time compared to Real-Time Chunking (RTC) across five manipulation tasks.

Key Contribution

Legato makes action-chunked VLA policies smoother and faster at completing tasks by learning a native continuation at training time, which reduces the discontinuities that chunking introduces at chunk boundaries.

Abstract

Action chunking enables Vision Language Action (VLA) models to run in real time, but naive chunked execution often exhibits discontinuities at chunk boundaries. Real-Time Chunking (RTC) alleviates this issue but is external to the policy, leading to spurious multimodal switching and trajectories that are not intrinsically smooth. We propose Legato, a training-time continuation method for action-chunked flow-based VLA policies. Specifically, Legato initializes denoising from a schedule-shaped mixture of known actions and noise, exposing the model to partial action information. Moreover, Legato reshapes the learned flow dynamics to ensure that the denoising process remains consistent between training and inference under per-step guidance. Legato further uses a randomized schedule condition during training to support varying inference delays and achieve controllable smoothness. Empirically, Legato produces smoother trajectories and reduces spurious multimodal switching during execution, leading to less hesitation and shorter task completion time. Extensive real-world experiments show that Legato consistently outperforms RTC across five manipulation tasks, achieving approximately 10% improvements in both trajectory smoothness and task completion time.
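
To make the phrase "schedule-shaped mixture of known actions and noise" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the schedule's shape, so the linear ramp, the function names (continuation_schedule, mixed_initial_state), and the parameters (horizon, delay, blend) are all assumptions made for illustration.

```python
import numpy as np

def continuation_schedule(horizon, delay, blend):
    """Hypothetical per-step weight on known actions.

    Assumed shape (not from the paper): weight 1 for the first `delay`
    steps, a linear ramp down over the next `blend` steps, then 0.
    """
    steps = np.arange(horizon)
    w = 1.0 - (steps - delay + 1) / (blend + 1)
    return np.clip(w, 0.0, 1.0)

def mixed_initial_state(known_actions, schedule, rng):
    """Blend known actions with Gaussian noise, step by step.

    known_actions: array of shape (horizon, action_dim); rows where the
    schedule is 0 are ignored, so they may hold placeholder values.
    """
    noise = rng.standard_normal(known_actions.shape)
    w = schedule[:, None]  # broadcast the weight over action dimensions
    return w * known_actions + (1.0 - w) * noise

rng = np.random.default_rng(0)
schedule = continuation_schedule(horizon=8, delay=2, blend=3)
known = np.ones((8, 2))  # stand-in for the tail of the previous chunk
x0 = mixed_initial_state(known, schedule, rng)
print(schedule)
```

In this sketch, steps with weight 1 start exactly at the known actions, steps with weight 0 start from pure noise, and the ramp in between starts from a partial blend. One plausible reading of the randomized schedule condition is that `delay` (and possibly `blend`) would be sampled per training example so the policy sees a range of inference delays; that reading is also an assumption.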

Multimodal Models; Robotics & Embodied AI; Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 46

Year: 2026

Venue: N/A
