Jun 17, 2026

arXiv:2606.18847

WorldLines: Benchmarking and Modeling Long-Horizon Stateful Embodied Agents

Yehang Zhang, Jianchong Su, Haojian Huang, Yifan Chang, Tianhao Zhou, Xinli Xu, Yingjie Xu, Yinchuan Li, Zexi Li, Ying-Cong Chen

AI Summary

This paper introduces WorldLines, a benchmark for evaluating long-horizon memory in embodied agents operating in dynamic household environments. It constructs temporally extended traces that combine dialogues, actions, and state changes, and uses them to assess how agents apply memory in question answering and task planning. The paper also proposes ObsMem, a framework that maintains visibility-aware memories. Experiments show that translating long-term memory into effective embodied actions remains difficult, and ObsMem serves as a stronger reference architecture for future research.

Key Contribution

Long-horizon embodied agents struggle to translate long-term memory into actionable plans, a gap that existing benchmarks, which focus on language-centric retrieval or short-horizon task execution, do not test.

Abstract

To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces containing dialogues, actions, execution feedback, and object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
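To make the two failure modes named in the abstract concrete (partial observability and overwritten world states), here is a minimal Python sketch of a per-observer memory that stores only the state changes the observer could see. It is an illustration under stated assumptions, not the ObsMem implementation: the `StateChange` schema, the `ObserverMemory` class, and the toy trace are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StateChange:
    """One object/device state change in a household trace (hypothetical schema)."""
    step: int             # position in the trace
    obj: str              # e.g. "kitchen_light"
    state: str            # e.g. "on"
    observers: frozenset  # who could see the change happen


@dataclass
class ObserverMemory:
    """Keeps, per object, only the changes this observer actually saw."""
    observer: str
    trails: dict = field(default_factory=dict)  # obj -> list of StateChange

    def record(self, change: StateChange) -> None:
        # Changes the observer could not see never enter its memory.
        if self.observer in change.observers:
            self.trails.setdefault(change.obj, []).append(change)

    def last_known(self, obj: str):
        # Latest observed change for the object, or None if never observed.
        trail = self.trails.get(obj)
        return trail[-1] if trail else None


# Toy trace (assumed data, for illustration only).
trace = [
    StateChange(1, "kitchen_light", "on", frozenset({"agent", "user"})),
    StateChange(2, "mug", "on_table", frozenset({"agent"})),
    StateChange(3, "kitchen_light", "off", frozenset({"user"})),  # agent absent
    StateChange(4, "mug", "in_sink", frozenset({"agent"})),       # overwrites step 2
]

memory = ObserverMemory("agent")
for change in trace:
    memory.record(change)

# Ground truth is the last change per object; belief is the last change the agent saw.
true_state = {c.obj: c.state for c in trace}
belief = {obj: memory.last_known(obj).state for obj in memory.trails}
stale = sorted(obj for obj in belief if belief[obj] != true_state[obj])

print(belief)  # {'kitchen_light': 'on', 'mug': 'in_sink'}
print(stale)   # ['kitchen_light']
```

In this toy setup the agent's belief about the mug is current because it saw the overwrite at step 4, while its belief about the kitchen light is stale because the change at step 3 happened out of its view. Keeping the full trail per object, rather than only the latest value, also leaves the earlier states available as evidence for history questions.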

Eval Frameworks & Benchmarks; Robotics & Embodied AI; World Models & Planning

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
