Tsinghua AI · RayNeo.AI · Shenzhen University · SJTU · SUSTech
Apr 13, 2026 · arXiv:2604.11182

Evaluating Memory Capability in Continuous Lifelog Scenario

Jianjie Zheng, Zhichen Liu, Zhanyu Shen, Jingxiang Qu, Guanhua Chen, Yile Wang, Yang Xu, Sijie Cheng

AI Summary

The authors introduce LifeDialBench, a benchmark for evaluating memory systems in continuous lifelogging scenarios. It comprises two subsets: EgoMem, built on real egocentric videos, and LifeMem, built on a simulated virtual community. To prevent temporal leakage, they propose an online evaluation protocol that respects temporal causality by evaluating systems in a streaming fashion. Experiments show that sophisticated memory systems fail to outperform a simple RAG baseline, suggesting that high-fidelity context preservation is crucial for lifelogging applications.

Key Contribution

Despite their complexity, current memory systems fail to outperform a naive RAG baseline in continuous lifelogging scenarios, which points to a need for higher-fidelity context preservation.

Abstract

Nowadays, wearable devices can continuously lifelog ambient conversations, creating substantial opportunities for memory systems. However, existing benchmarks primarily focus on online one-on-one chatting or human-AI interactions, thus neglecting the unique demands of real-world scenarios. Given the scarcity of public lifelogging audio datasets, we propose a hierarchical synthesis framework to curate LifeDialBench, a novel benchmark comprising two complementary subsets: EgoMem, built on real-world egocentric videos, and LifeMem, constructed using a simulated virtual community. Crucially, to address the issue of temporal leakage in traditional offline settings, we propose an Online Evaluation protocol that strictly adheres to temporal causality, ensuring systems are evaluated in a realistic streaming fashion. Our experimental results reveal a counterintuitive finding: current sophisticated memory systems fail to outperform a simple RAG-based baseline. This highlights the detrimental impact of over-designed structures and lossy compression in current approaches, emphasizing the necessity of high-fidelity context preservation for lifelog scenarios. We release our code and data at https://github.com/qys77714/LifeDialBench.
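
The idea of an online protocol that respects temporal causality can be illustrated with a minimal sketch. This is not the LifeDialBench implementation: the names `Event`, `Question`, `KeywordMemory`, and `online_eval` are hypothetical, the word-overlap retriever is only a toy stand-in for a RAG-style memory that stores raw turns, and substring matching stands in for whatever answer scoring the benchmark actually uses. The one property the sketch aims to show is that, when a question is asked at time t, the memory has ingested only dialogue turns with timestamps up to t.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: int   # timestamp of a dialogue turn (hypothetical integer unit)
    text: str


@dataclass
class Question:
    time: int   # moment at which the question is asked
    text: str
    answer: str


class KeywordMemory:
    """Toy memory: keeps raw turns and retrieves by word overlap with the query."""

    def __init__(self):
        self.turns = []

    def ingest(self, event):
        self.turns.append(event.text)

    def retrieve(self, query, k=1):
        query_words = set(query.lower().split())
        ranked = sorted(
            self.turns,
            key=lambda turn: len(query_words & set(turn.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def online_eval(events, questions, memory):
    """Stream events in time order; answer each question using only the past."""
    events = sorted(events, key=lambda e: e.time)
    questions = sorted(questions, key=lambda q: q.time)
    next_event = 0
    hits = 0
    for question in questions:
        # Ingest only turns that happened at or before the question time,
        # so later turns cannot leak into the answer.
        while next_event < len(events) and events[next_event].time <= question.time:
            memory.ingest(events[next_event])
            next_event += 1
        context = " ".join(memory.retrieve(question.text))
        # Assumed scoring rule: the gold answer appears in the retrieved context.
        hits += question.answer.lower() in context.lower()
    return hits / len(questions) if questions else 0.0
```

In an offline setting the whole log would be ingested before any question is asked, so a question posed early in the timeline could be answered from turns that had not yet occurred. The streaming loop above rules that out by construction.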

Data Curation & Synthetic Data · Eval Frameworks & Benchmarks · Speech & Audio

Citation Metrics

Citations: 0

Influential citations: 0

References: 0

Year: 2026

Venue: N/A
