This paper introduces C$^3$ache, a training-free method that accelerates World Action Models (WAMs) by caching residuals and reusing them across inference chunks at the same denoising step. Because these residuals are strongly correlated from one chunk to the next when a robot executes a smooth behavior, C$^3$ache reduces inference cost with negligible degradation in performance. Experiments show up to a 2.5x speedup in total wall-clock inference time while maintaining a comparable task success rate.
C$^3$ache shows that reusing residuals across inference chunks can substantially accelerate World Action Models, reaching up to a 2.5x speedup in total wall-clock inference time with negligible impact on task success rate.
World Action Models (WAMs) generalize better than standard Vision-Language-Action (VLA) policies to novel motions and environments, because a video-modeling objective lets them learn from abundant unlabeled video rather than scarce labeled robot demonstrations. This generalization is computationally expensive. To complete a task, a WAM runs over multiple inference chunks, and each chunk requires a costly denoising process. Existing acceleration methods reduce this cost by caching and reusing computation within a single chunk's denoising trajectory. Our empirical analysis reveals a substantial source of redundancy they overlook: redundancy across chunks. When a robot executes a smooth behavior, the residuals computed at a given denoising step are strongly correlated from one chunk to the next. We introduce C$^3$ache, a training-free method that caches and reuses these residuals across inference chunks at the same denoising step. Experiments on benchmarks with a Fast-WAM backbone show that C$^3$ache achieves up to a $2.5\times$ speedup in total wall-clock inference time, with negligible degradation in task success rate.
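The cross-chunk reuse described in the abstract can be illustrated with a minimal sketch. The abstract does not say when cached residuals are refreshed or which layers they come from, so the fixed `refresh_every` schedule, the names `CrossChunkResidualCache` and `denoise_chunk`, and the toy block below are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Dict, List

Vector = List[float]
Block = Callable[[Vector, int], Vector]


class CrossChunkResidualCache:
    """Keeps one residual per denoising step and shares it across chunks.

    Assumption: residuals are recomputed every `refresh_every` chunks.
    The real refresh policy is not given in the abstract.
    """

    def __init__(self, refresh_every: int = 2) -> None:
        if refresh_every < 1:
            raise ValueError("refresh_every must be >= 1")
        self.refresh_every = refresh_every
        self._residuals: Dict[int, Vector] = {}
        self.full_calls = 0    # times the expensive block was run
        self.reused_calls = 0  # times a cached residual was reused

    def residual(self, chunk_idx: int, step: int, x: Vector, block: Block) -> Vector:
        refresh = chunk_idx % self.refresh_every == 0 or step not in self._residuals
        if refresh:
            out = block(x, step)
            # Residual = block output minus block input at this denoising step.
            self._residuals[step] = [o - xi for o, xi in zip(out, x)]
            self.full_calls += 1
        else:
            self.reused_calls += 1
        return self._residuals[step]


def denoise_chunk(x: Vector, chunk_idx: int, num_steps: int,
                  block: Block, cache: CrossChunkResidualCache) -> Vector:
    """Runs one chunk's denoising loop, adding a fresh or cached residual per step."""
    for step in range(num_steps):
        r = cache.residual(chunk_idx, step, x, block)
        x = [xi + ri for xi, ri in zip(x, r)]
    return x


def toy_block(x: Vector, step: int) -> Vector:
    """Hypothetical stand-in for the expensive denoising network."""
    return [0.5 * xi + 0.1 * (step + 1) for xi in x]


if __name__ == "__main__":
    cache = CrossChunkResidualCache(refresh_every=2)
    for chunk_idx in range(4):
        denoise_chunk([1.0, -2.0], chunk_idx, num_steps=3, block=toy_block, cache=cache)
    print(cache.full_calls, cache.reused_calls)
```

In this sketch the cache is keyed by denoising step only, so chunk `k+1` at step `t` reuses the residual that chunk `k` computed at the same step `t`. With `refresh_every=2`, every other chunk skips the expensive block entirely. How well a reused residual approximates the true one depends on how similar consecutive chunks are, which is the correlation the abstract reports for smooth behaviors.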