May 25, 2026 | arXiv:2605.25537

Action-Prior Denoising for Smooth Real-Time Chunking

Dongyang Liu, Zhaowen Zheng, Longxu Zhang, Yixuan Liu, Hao Wan

AI Summary

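This paper introduces Soft RTC, a training-time generalization of real-time chunking (RTC) based on action-prior denoising. It targets a limitation of the binary prefix mask used in training-time RTC, which under-models asynchronous execution by treating all non-prefix tokens as fully unconstrained. Soft RTC instead constructs corrupted overlap tokens from partially denoised states rather than pure noise, so later overlap actions remain editable but stay close to the previous plan. On the 12 released large Kinetix levels, a short soft window nearly matches hard training-time RTC in overall solve rate (0.809 vs. 0.815), and a medium window reduces high-delay action delta and jerk by 9.1% and 9.6% relative to hard RTC. A small preliminary real-robot sorting study adds evidence that training-time RTC can improve completion and that Soft RTC yields the lowest commanded-action finite-difference metrics among the tested policies.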
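The smoothness metrics mentioned here can be illustrated with finite differences over a commanded action sequence. The sketch below is only an assumption about how "action delta" and "jerk" might be computed: the function names `action_delta` and `action_jerk`, the mean L2 norm aggregation, and the step size `dt` are hypothetical and are not taken from the paper.

```python
import numpy as np

def action_delta(actions):
    """Mean L2 norm of the first difference between consecutive actions.

    actions: array of shape (T, action_dim). Assumed definition, not the paper's.
    """
    d1 = np.diff(actions, n=1, axis=0)
    return float(np.linalg.norm(d1, axis=1).mean())

def action_jerk(actions, dt=1.0):
    """Mean L2 norm of the third finite difference, scaled by dt**3.

    Treats the action sequence as a sampled trajectory; assumed definition.
    """
    d3 = np.diff(actions, n=3, axis=0) / dt**3
    return float(np.linalg.norm(d3, axis=1).mean())

# Example: a constant-velocity ramp in one action dimension.
ramp = np.arange(6, dtype=float)[:, None] * np.array([1.0, 0.0])
delta_ramp = action_delta(ramp)   # unit step between consecutive actions
jerk_ramp = action_jerk(ramp)     # a linear ramp has zero third difference
```

Under these assumed definitions, lower values indicate smaller step-to-step changes and fewer abrupt changes in acceleration in the commanded actions.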

Key Contribution

Training with an action prior built from partially denoised overlap tokens models asynchronous execution more faithfully than a binary prefix mask, giving smoother chunked actions while keeping near-naive runtime.

Abstract

Real-time chunking (RTC) lets chunked action policies operate under inference delay by conditioning a newly generated action chunk on actions already committed by the previous chunk. Training-time RTC simulates this delay during learning and avoids expensive guidance at deployment, but its binary prefix mask treats all non-prefix tokens as fully unconstrained. This under-models asynchronous execution: early overlap actions are fixed, while later overlap actions remain editable but should still stay close to the previous plan. We propose Soft RTC, a training-time RTC generalization based on action-prior denoising. Soft RTC constructs corrupted overlap tokens from partially denoised states instead of pure noise and injects the aligned previous chunk as the same prior during inference through a lightweight token-wise blending rule. On the 12 released large Kinetix levels, a short soft window nearly matches hard training-time RTC in overall solve rate (0.809 vs. 0.815), while a medium window reduces high-delay action delta and jerk by 9.1% and 9.6% relative to hard RTC. Both variants keep near-naive runtime, unlike inference-time RTC baselines. A small preliminary real-robot sorting study provides additional evidence that training-time RTC can improve completion and that Soft RTC gives the lowest commanded-action finite-difference metrics among the tested policies.
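The contrast between a binary prefix mask and a soft overlap window can be sketched as a per-token prior level. The abstract does not specify the schedule or the blending rule, so everything below is an illustrative assumption: the names `prior_levels` and `corrupt_with_prior`, the linear ramp, the `max_level` cap, and the linear interpolation between the previous chunk and Gaussian noise are hypothetical choices, not the authors' method.

```python
import numpy as np

def prior_levels(horizon, delay, soft_window, max_level=0.8):
    """Per-token prior strength in [0, 1] (hypothetical schedule).

    1.0 = committed prefix token, 0.0 = pure noise.
    soft_window=0 recovers a binary (hard) prefix mask.
    """
    levels = np.zeros(horizon)
    levels[:delay] = 1.0                      # actions already committed
    end = min(delay + soft_window, horizon)
    n = end - delay
    if n > 0:
        # Linearly decaying ramp across the soft window (assumed shape).
        steps = np.arange(n) + 1
        levels[delay:end] = max_level * (1.0 - steps / (soft_window + 1))
    return levels

def corrupt_with_prior(prev_chunk, levels, rng):
    """Blend the aligned previous chunk with noise, token by token.

    prev_chunk: (horizon, action_dim), already shifted to the new chunk's
    time index. The linear interpolation is an assumption for illustration.
    """
    noise = rng.standard_normal(prev_chunk.shape)
    lam = levels[:, None]
    return lam * prev_chunk + (1.0 - lam) * noise

rng = np.random.default_rng(0)
prev_chunk = np.ones((8, 2))                  # toy aligned previous chunk
levels = prior_levels(horizon=8, delay=2, soft_window=3)
tokens = corrupt_with_prior(prev_chunk, levels, rng)
```

In this toy setup the first two tokens equal the previous chunk exactly, the next three are partially corrupted with decreasing prior strength, and the rest are pure noise. The same blend could in principle be applied at inference by passing the real aligned previous chunk, which is the role the abstract assigns to its token-wise blending rule.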

Robotics & Embodied AI | Tool Use & Agents | World Models & Planning

