This paper introduces WorldLines, a benchmark for evaluating long-horizon memory in embodied agents operating in dynamic household environments. It constructs temporally extended traces that integrate dialogues, actions, and state changes, and uses them to assess how agents apply memory in task planning and execution. The proposed ObsMem framework maintains visibility-aware memories and offers a stronger reference architecture for this setting. Experiments reveal significant challenges in translating long-term memory into effective embodied actions.
Long-horizon embodied agents struggle to translate long-term memory into actionable plans, exposing critical gaps in current benchmarks and methodologies.
To assist humans over extended periods in real homes, embodied agents must remember user routines, world states, and past interactions. Existing long-term memory benchmarks mainly evaluate language-centric retrieval and question answering, while embodied benchmarks often focus on short-horizon task execution without testing long-term memory use in dynamic environments. We introduce WorldLines, a project-driven benchmark for long-horizon embodied household assistance. It constructs temporally extended household traces containing dialogues, actions, execution feedback, and object and device state changes, and converts them into evidence-linked samples for Memory QA and Embodied Task Planning. We further propose ObsMem, an observer-grounded memory framework that maintains visibility-aware memories and action-native state trails for state-aware decisions. Experiments reveal persistent challenges in partial observability, overwritten world states, and translating long-term memory into embodied plans, while ObsMem offers a stronger reference architecture for this setting.
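To make the ideas of visibility-aware memory and overwritten world states concrete, the following is a minimal illustrative sketch, not the ObsMem implementation. The class names `StateEvent` and `StateTrailMemory`, their fields, and the thermostat scenario are all hypothetical and chosen only for illustration. The sketch assumes events are recorded in chronological order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StateEvent:
    """One entry in an object's state trail (hypothetical schema)."""
    step: int      # time index within the household trace
    obj: str       # object or device name
    state: str     # state of the object after this event
    cause: str     # action or dialogue that produced the change
    visible: bool  # whether the agent observed the change


@dataclass
class StateTrailMemory:
    """Keeps a per-object trail of state changes, flagged by visibility."""
    trails: Dict[str, List[StateEvent]] = field(default_factory=dict)

    def record(self, event: StateEvent) -> None:
        # Assumes events arrive in chronological order.
        self.trails.setdefault(event.obj, []).append(event)

    def believed_state(self, obj: str) -> Optional[str]:
        """Latest state the agent actually observed (may be out of date)."""
        for event in reversed(self.trails.get(obj, [])):
            if event.visible:
                return event.state
        return None

    def true_state(self, obj: str) -> Optional[str]:
        """Latest state in the trace, regardless of visibility."""
        trail = self.trails.get(obj, [])
        return trail[-1].state if trail else None

    def is_stale(self, obj: str) -> bool:
        """True when an unobserved change has overwritten the agent's belief."""
        return self.believed_state(obj) != self.true_state(obj)


memory = StateTrailMemory()
memory.record(StateEvent(1, "thermostat", "20C", "agent sets temperature", True))
memory.record(StateEvent(5, "thermostat", "23C", "user adjusts it while agent is away", False))

print(memory.believed_state("thermostat"))
print(memory.true_state("thermostat"))
print(memory.is_stale("thermostat"))
```

In this toy scenario the agent's belief about the thermostat differs from the state recorded in the trace, which is the kind of partial-observability gap the abstract describes. A planner built on such a memory would need to re-observe the device before acting on the remembered value.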