The paper introduces EviMem, a framework for long-term conversational memory that retrieves evidence iteratively by explicitly diagnosing and addressing gaps in the accumulated retrieval set. EviMem combines IRIS, a closed-loop framework for detecting and diagnosing missing evidence, with LaceMem, a coarse-to-fine memory architecture. Experiments on the LoCoMo dataset show that EviMem improves Judge Accuracy over MIRIX on temporal and multi-hop questions while running at 4.5x lower latency.
Explicitly diagnosing what is missing from a retrieval set improves long-term conversational memory: on LoCoMo, Judge Accuracy over MIRIX rises by 8.3 percentage points on temporal questions (73.3% to 81.6%) and by 19.3 points on multi-hop questions (65.9% to 85.2%), at 4.5x lower latency.
Long-term conversational memory requires retrieving evidence scattered across multiple sessions, yet single-pass retrieval fails on temporal and multi-hop questions. Existing iterative methods refine queries via generated content or document-level signals, but none explicitly diagnoses the evidence gap, namely what is missing from the accumulated retrieval set, leaving query refinement untargeted. We present EviMem, combining IRIS (Iterative Retrieval via Insufficiency Signals), a closed-loop framework that detects evidence gaps through sufficiency evaluation, diagnoses what is missing, and drives targeted query refinement, with LaceMem (Layered Architecture for Conversational Evidence Memory), a coarse-to-fine memory hierarchy supporting fine-grained gap diagnosis. On LoCoMo, EviMem improves Judge Accuracy over MIRIX on temporal (73.3% to 81.6%) and multi-hop (65.9% to 85.2%) questions at 4.5x lower latency. Code: https://github.com/AIGeeksGroup/EviMem.
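The closed loop described in the abstract (retrieve, evaluate sufficiency, diagnose the gap, refine the query) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `Verdict`, `retrieve`, `toy_judge`, and `gap_driven_retrieval` are hypothetical, the retriever is a toy word-overlap ranker, and the judge is a hard-coded rule standing in for the sufficiency evaluation, which in a real system would presumably be model-based. The LaceMem hierarchy is not modeled; memory here is a flat list of strings.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Result of a sufficiency check (hypothetical structure)."""
    sufficient: bool
    missing: set[str]  # terms describing the diagnosed evidence gap


def retrieve(query: set[str], memory: list[str], seen: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank unseen entries by word overlap with the query."""
    scored = [(len(query & set(m.split())), m) for m in memory if m not in seen]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]


def toy_judge(question: str, evidence: list[str]) -> Verdict:
    """Hard-coded stand-in for a sufficiency evaluator (assumption).

    It only understands the example question below: the answer needs the
    sister's identity first, then a fact about where she works.
    """
    words = set(" ".join(evidence).split())
    if "sister" not in words:
        return Verdict(False, {"sister"})
    if "work" not in words:
        # Diagnosed gap: we know who the sister is, but not her workplace.
        return Verdict(False, {"beth", "work"})
    return Verdict(True, set())


def gap_driven_retrieval(
    question: str,
    memory: list[str],
    judge: Callable[[str, list[str]], Verdict],
    max_rounds: int = 3,
) -> tuple[list[str], int]:
    """Accumulate evidence until the judge reports it sufficient."""
    evidence: list[str] = []
    query = set(question.split())
    for round_no in range(1, max_rounds + 1):
        evidence += retrieve(query, memory, evidence)
        verdict = judge(question, evidence)
        if verdict.sufficient:
            return evidence, round_no
        # Targeted refinement: the next query is built from the diagnosed gap.
        query = verdict.missing
    return evidence, max_rounds


MEMORY = [
    "s1 alice said her sister is beth",
    "s2 alice adopted a dog named rex",
    "s5 beth started work at the library",
]

evidence, rounds = gap_driven_retrieval("where does alice sister work", MEMORY, toy_judge)
print(rounds, evidence)
```

In this toy run a single pass retrieves only the session-1 entry, which names the sister but says nothing about her job. The diagnosed gap then steers the second query to the session-5 entry, so the loop stops after two rounds.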