DeepMind | Feb 16, 2026 | arXiv:2602.15010

BPP: Long-Context Robot Imitation Learning by Focusing on Key History Frames

Max Sobol Mark, Jacky Liang, Maria Attarian, Chuyuan Fu, Debidatta Dwibedi

AI Summary

The paper addresses long-context robot imitation learning. Policies that condition on past observations often fail because of spurious correlations: training covers only a small fraction of the possible histories, so policies latch onto incidental features that do not generalize. To mitigate this, the authors introduce Big Picture Policies (BPP), an approach that conditions on a minimal set of task-relevant keyframes detected by a vision-language model. By projecting diverse rollouts onto a compact set of task-relevant events, BPP substantially reduces distribution shift between training and deployment and achieves 70% higher success rates than the best comparison method on real-world manipulation tasks.
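To make the "projection" idea concrete, here is a minimal sketch of how a keyframe-conditioned policy input might be built. It is not the authors' implementation: `build_bpp_context`, `toy_detector`, and `TASK_EVENTS` are hypothetical names, observations are plain strings instead of images, and a simple membership test stands in for the vision-language model. The point it illustrates is that two rollouts with different lengths, pauses, and timing can collapse to the same policy input once only task-relevant keyframes are kept.

```python
from typing import Callable, Sequence, Tuple


def build_bpp_context(
    history: Sequence[str],
    current: str,
    is_keyframe: Callable[[str], bool],
) -> Tuple[Tuple[str, ...], str]:
    """Project a raw observation history onto (keyframes, current observation).

    `is_keyframe` is a hypothetical stand-in for the VLM keyframe detector;
    observations are strings here only to keep the example readable.
    """
    keyframes = tuple(obs for obs in history if is_keyframe(obs))
    return keyframes, current


# Hypothetical task-relevant events for an object-search task.
TASK_EVENTS = {"searched drawer A", "searched drawer B", "searched shelf"}


def toy_detector(obs: str) -> bool:
    # Toy rule: a frame is a keyframe iff it shows a task-relevant event.
    return obs in TASK_EVENTS


# Two rollouts with different lengths, pauses, and timing ...
rollout_1 = ["idle", "move", "searched drawer A", "move", "move", "searched shelf", "move"]
rollout_2 = ["move", "searched drawer A", "idle", "idle", "searched shelf"]

ctx_1 = build_bpp_context(rollout_1, "at drawer B", toy_detector)
ctx_2 = build_bpp_context(rollout_2, "at drawer B", toy_detector)

# ... collapse to the same policy input.
print(ctx_1)           # (('searched drawer A', 'searched shelf'), 'at drawer B')
print(ctx_1 == ctx_2)  # True
```

A policy trained on such contexts never sees the incidental differences between the two raw rollouts, which is the intuition behind the reduced train/deploy distribution shift.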

Key Contribution

By distilling long observation histories into a few task-relevant keyframes, BPP lets robot policies make use of history on long-horizon tasks, achieving 70% higher success rates than the best comparison method on real-world manipulation tasks.

Abstract

Many robot tasks require attending to the history of past observations. For example, finding an item in a room requires remembering which places have already been searched. However, the best-performing robot policies typically condition only on the current observation, limiting their applicability to such tasks. Naively conditioning on past observations often fails due to spurious correlations: policies latch onto incidental features of training histories that do not generalize to out-of-distribution trajectories upon deployment. We analyze why policies latch onto these spurious correlations and find that this problem stems from limited coverage over the space of possible histories during training, which grows exponentially with horizon. Existing regularization techniques provide inconsistent benefits across tasks, as they do not fundamentally address this coverage problem. Motivated by these findings, we propose Big Picture Policies (BPP), an approach that conditions on a minimal set of meaningful keyframes detected by a vision-language model. By projecting diverse rollouts onto a compact set of task-relevant events, BPP substantially reduces distribution shift between training and deployment, without sacrificing expressivity. We evaluate BPP on four challenging real-world manipulation tasks and three simulation tasks, all requiring history conditioning. BPP achieves 70% higher success rates than the best comparison on real-world evaluations.
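The coverage argument can be illustrated with a toy count. The sketch below is an assumption-laden model, not the paper's analysis: it uses a hypothetical four-symbol observation alphabet with two task-relevant events ("A" and "B", e.g. "searched location A"), and it assumes a projection that keeps only the first occurrence of each event, in order. The number of raw histories grows as 4^T with horizon T, while the number of distinct projected histories stays fixed.

```python
from itertools import product
from typing import Iterable, List, Tuple

OBS = ("move", "idle", "A", "B")  # hypothetical toy observation alphabet
EVENTS = {"A", "B"}               # hypothetical task-relevant events


def project(history: Iterable[str]) -> Tuple[str, ...]:
    """Keep the first occurrence of each task-relevant event, in order."""
    seen: List[str] = []
    for obs in history:
        if obs in EVENTS and obs not in seen:
            seen.append(obs)
    return tuple(seen)


def coverage(T: int) -> Tuple[int, int]:
    """Return (#raw histories, #distinct projected histories) for horizon T."""
    raw = 0
    projected = set()
    for history in product(OBS, repeat=T):
        raw += 1
        projected.add(project(history))
    return raw, len(projected)


for T in (2, 4, 8):
    print(T, coverage(T))
# 2 (16, 5)
# 4 (256, 5)
# 8 (65536, 5)
```

In this toy setting a fixed demonstration budget covers a vanishing fraction of raw histories as T grows, but can cover all five projected histories, which mirrors the intuition the abstract gives for why conditioning on keyframes helps.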

Robotics & Embodied AI | Training Efficiency & Optimization | World Models & Planning

Citation Metrics

Citations: 0
Influential citations: 0
References: 0
Year: 2026
Venue: N/A
