MIT CSAIL | UW-Madison | Jun 10, 2026 | arXiv:2606.12475

Learning to Assist: Collaborative VLAs for Implicit Human-Robot Collaboration

AI Summary

This study examines whether end-to-end trained vision-language-action (VLA) models can support human-robot collaboration (HRC), as an alternative to hand-engineered pipelines that scale poorly to new tasks. The researchers evaluate two state-of-the-art models and identify a failure mode of action-chunking policies: demonstration action leakage, in which action chunks cross latent task transitions, can cause premature assistive behavior, and the problem grows with longer execution horizons. They propose an inference-time steering method that mitigates these erroneous actions. In a 16-participant user study on a long-horizon collaborative assembly task, steering enabled a longer execution horizon, yielding faster collaboration and fewer failures than a shorter-horizon policy.

Key Contribution

Action-chunking policies can trigger premature robot assistance in implicit HRC. An inference-time steering method mitigates this failure mode while preserving policy performance, which allows longer execution horizons and more efficient collaboration.

Abstract

Human-robot collaboration (HRC) combines the complementary strengths of humans and robots to improve task efficiency. However, many existing collaborative systems rely on hand-engineered pipelines, limiting their scalability and flexibility for new tasks. In this work, we show that models trained end-to-end with imitation learning, specifically vision-language-action (VLA) models, can support collaborative manipulation, and characterize the key factors affecting their real-world performance. We evaluate two state-of-the-art models and identify a failure mode of action-chunking policies in implicit HRC, where demonstration action leakage (i.e., action chunks crossing latent task transitions) can cause premature assistive behavior. We find that this issue increases with longer execution horizons and occurs in real-world collaborative VLA systems, such as when a robot attempts to hand over a tool before the person is ready. We propose an inference-time steering method to mitigate these erroneous assistive actions while preserving policy performance. Finally, through a 16-participant user study on a long-horizon collaborative assembly task, we show that steering enables a longer execution horizon while mitigating premature assistance, leading to faster collaboration and fewer failures compared to a shorter-horizon policy.
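The abstract defines demonstration action leakage as action chunks crossing latent task transitions. The toy sketch below illustrates only that definition; it is not the authors' implementation or their steering method. The phase labels (`"wait"`, `"handover"`), the demonstration length, and the function name `leaking_chunk_fraction` are all hypothetical, chosen for illustration. It counts what fraction of fixed-length chunks in a single labeled demonstration span more than one phase.

```python
from typing import List


def leaking_chunk_fraction(phases: List[str], horizon: int) -> float:
    """Fraction of length-`horizon` chunks that span more than one phase label.

    `phases` holds one (hypothetical) latent-phase label per demonstration step.
    """
    if horizon < 1 or horizon > len(phases):
        raise ValueError("horizon must be between 1 and len(phases)")
    # Every valid chunk start index in the demonstration.
    starts = range(len(phases) - horizon + 1)
    # A chunk "leaks" if it contains actions from more than one phase.
    crossing = sum(1 for s in starts if len(set(phases[s:s + horizon])) > 1)
    return crossing / len(starts)


# Toy demonstration (assumed): 20 steps waiting for the person, then 10 steps of handover.
demo = ["wait"] * 20 + ["handover"] * 10

for h in (1, 5, 10):
    print(h, round(leaking_chunk_fraction(demo, h), 3))
```

In this toy setup, the fraction of chunks that cross the transition grows as the chunk length increases, which is one simple way to picture why longer horizons could expose a policy to more leaked actions.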

Multimodal Models | Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
