BMFTR Research Hub 6G-Life, DKFZ, DLR, Ludwig-Maximilians-Universität München, NCT/UCC Dresden, TU Dresden, UKDD Dresden, University of Tromsø
May 21, 2026
arXiv:2605.22493

Understanding Multimodal Failure in Action-Chunking Behavioral Cloning

Lorenzo Mazza, Massimiliano Datres, Ariel Rodriguez, Sebastian Bodenstedt, Gitta Kutyniok, Stefanie Speidel

AI Summary

This paper analyzes failure modes in multimodal behavioral cloning (BC) with action chunking, focusing on latent-variable and action-space generative policies. For latent-variable policies, it identifies a trade-off in posterior-prior regularization: too much regularization removes the information needed to distinguish modes, while too little makes deployment-time sampling depend on whether the prior covers the relevant latent regions. Action-space generative policies are limited by the smoothness of the mapping from a base space to the action space, so representing multiple modes requires either sharp transitions in base space or off-support bridge regions in action space.
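
The regularization trade-off can be made concrete with a common form of the latent-variable objective. This page does not show the paper's exact loss, so the expression below is only a sketch under assumed notation: observation $o$, action chunk $a$, latent $z$, posterior $q_\phi$, decoder $p_\theta$, prior $p(z \mid o)$, and weight $\beta$ are all illustrative symbols, not taken from the paper.

```latex
\mathcal{L}(\theta, \phi)
  = \mathbb{E}_{q_\phi(z \mid o, a)}\bigl[ -\log p_\theta(a \mid o, z) \bigr]
  + \beta \, \mathrm{KL}\bigl( q_\phi(z \mid o, a) \,\|\, p(z \mid o) \bigr)
```

Under this assumed form, a large $\beta$ pulls the posterior toward the prior, so $z$ carries little action-conditioned information. A small $\beta$ lets the posterior encode the mode, but latents drawn from the prior at deployment may then fall outside the regions the decoder was trained on.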

Key Contribution

Multimodal behavioral cloning fails differently depending on the parameterization: latent-variable policies face a trade-off in posterior-prior regularization, while action-space generative policies are limited by the smoothness of their base-to-action mapping.

Abstract

Behavioral cloning becomes difficult when the same observation admits several valid actions. We study this problem for action-chunking policies and show that different multimodal parameterizations fail in different ways. For latent-variable policies, posterior-prior regularization makes deployment-time sampling more reliable, but excessive regularization removes the action-conditioned information needed to distinguish demonstrated modes. Reducing this regularization can preserve mode information, but then success depends on whether the prior covers the relevant latent regions. For action-space generative policies, multimodality is constrained by the smoothness of the base-to-action transport: a map with small Lipschitz constant cannot assign substantial probability to many well-separated modes. Covering many modes therefore requires either sharp transitions in base space or off-support bridge regions in action space. Experiments on synthetic multimodal tasks and robotic simulation benchmarks support these mechanisms.
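
The Lipschitz constraint on action-space generative policies can be illustrated with a one-dimensional toy example. This is a minimal sketch, not the paper's experiment: the standard normal base distribution, the clipped linear map, and every function and parameter name (`transport`, `bridge_mass_exact`, `bridge_mass_mc`, `separation`, `radius`) are assumptions made for illustration. Two modes sit at `-separation / 2` and `+separation / 2`, and any action farther than `radius` from both counts as off-support bridge mass.

```python
import math
import random


def transport(z, lipschitz, separation):
    """Clipped linear map from a base sample z to a 1D action.

    The map has Lipschitz constant `lipschitz` and saturates at the two
    mode centers -separation/2 and +separation/2.
    """
    half = separation / 2.0
    return max(-half, min(half, lipschitz * z))


def bridge_mass_exact(lipschitz, separation, radius):
    """Probability that the action lands between the two mode balls, z ~ N(0, 1)."""
    gap_half_width = separation / 2.0 - radius  # half-width of the off-support gap
    t = gap_half_width / lipschitz              # base-space half-width mapped into the gap
    return math.erf(t / math.sqrt(2.0))         # P(|z| < t) for a standard normal


def bridge_mass_mc(lipschitz, separation, radius, n=100_000, seed=0):
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    gap_half_width = separation / 2.0 - radius
    hits = sum(
        abs(transport(rng.gauss(0.0, 1.0), lipschitz, separation)) < gap_half_width
        for _ in range(n)
    )
    return hits / n


if __name__ == "__main__":
    # Smoother maps (smaller Lipschitz constant) leave more mass in the bridge.
    for lipschitz in (1.0, 4.0, 16.0):
        exact = bridge_mass_exact(lipschitz, separation=4.0, radius=0.5)
        estimate = bridge_mass_mc(lipschitz, separation=4.0, radius=0.5)
        print(f"L={lipschitz:5.1f}  exact={exact:.4f}  mc={estimate:.4f}")
```

In this toy setting the bridge mass shrinks only as the Lipschitz constant grows, which mirrors the abstract's statement that covering well-separated modes requires either sharp transitions in base space or probability placed on off-support bridge regions.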

Multimodal Models, Robotics & Embodied AI, Training Efficiency & Optimization

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
