Stanford HAI · UMich · Mar 4, 2026 · arXiv:2603.04639

RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies

Yinpei Dai, H. Fu, Hongze Fu, Jayjun Lee, Yuejiang Liu, Haoran Zhang, Jianing Yang, Chelsea Finn, Nima Fazeli, Joyce Chai

AI Summary

RoboMME is introduced as a large-scale benchmark to evaluate vision-language-action (VLA) models on long-horizon, history-dependent robotic manipulation tasks requiring memory. The benchmark includes 16 tasks designed to assess temporal, spatial, object, and procedural memory. Experiments using 14 memory-augmented VLA variants built on the π0.5 backbone reveal that the effectiveness of different memory representations varies significantly across tasks.

Key Contribution

The best memory design for robotic manipulation depends heavily on the task: no single memory architecture dominates across all tasks.
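The "no single architecture dominates" claim can be checked mechanically from a table of per-task success rates. The sketch below, assuming such a table is available, finds the best variant for each task and tests whether any one variant is at least as good as every other on every task. The variant names, task names, and numbers are hypothetical placeholders, not RoboMME results; only the four memory categories (temporal, spatial, object, procedural) come from the benchmark's taxonomy.

```python
from typing import Dict, Optional

# Hypothetical success rates (fraction of episodes solved) for three made-up
# memory variants on one made-up task per taxonomy category.
# Illustrative only: these are NOT results reported for RoboMME.
results: Dict[str, Dict[str, float]] = {
    "variant_A": {"temporal_task": 0.80, "spatial_task": 0.40, "object_task": 0.55, "procedural_task": 0.30},
    "variant_B": {"temporal_task": 0.50, "spatial_task": 0.70, "object_task": 0.60, "procedural_task": 0.35},
    "variant_C": {"temporal_task": 0.45, "spatial_task": 0.50, "object_task": 0.40, "procedural_task": 0.65},
}


def best_per_task(table: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """For each task, return the name of the variant with the highest success rate."""
    tasks = next(iter(table.values())).keys()
    return {task: max(table, key=lambda v: table[v][task]) for task in tasks}


def dominant_variant(table: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Return a variant that matches or beats every other variant on every task, else None."""
    for name, scores in table.items():
        if all(scores[t] >= other[t] for other in table.values() for t in scores):
            return name
    return None


if __name__ == "__main__":
    print(best_per_task(results))
    print(dominant_variant(results))
```

With these made-up numbers each variant wins a different subset of categories, so `dominant_variant` returns `None`, which is the task-dependent pattern the summary describes; real per-task scores would be substituted in practice.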

Abstract

Memory is critical for long-horizon and history-dependent robotic manipulation. Such tasks often involve counting repeated actions or manipulating objects that become temporarily occluded. Recent vision-language-action (VLA) models have begun to incorporate memory mechanisms; however, their evaluations remain confined to narrow, non-standardized settings. This limits their systematic understanding, comparison, and progress measurement. To address these challenges, we introduce RoboMME: a large-scale standardized benchmark for evaluating and advancing VLA models in long-horizon, history-dependent scenarios. Our benchmark comprises 16 manipulation tasks constructed under a carefully designed taxonomy that evaluates temporal, spatial, object, and procedural memory. We further develop a suite of 14 memory-augmented VLA variants built on the π0.5 backbone to systematically explore different memory representations across multiple integration strategies. Experimental results show that the effectiveness of memory representations is highly task-dependent, with each design offering distinct advantages and limitations across different tasks. Videos and code can be found at our website https://robomme.github.io.

Eval Frameworks & Benchmarks · Multimodal Models · Robotics & Embodied AI

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
