The paper introduces RF-Mem, a novel memory retrieval mechanism for personalized LLMs that mimics human cognitive processes by employing a dual-path approach: Familiarity-based retrieval for high-confidence matches and Recollection-based retrieval for uncertain cases. The Recollection path iteratively expands evidence in the embedding space by clustering candidate memories and applying alpha-mixing with the query, simulating contextual reconstruction. Experiments on three benchmarks demonstrate that RF-Mem outperforms one-shot retrieval and full-context reasoning, achieving better performance under budget and latency constraints.
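The Recollection loop described above can be sketched in a few lines. This is a hypothetical illustration inferred from the abstract, not the paper's implementation: the function name, the alpha value, the step count, and the simplification of the clustering step to a single centroid of the top candidates are all assumptions.

```python
import numpy as np

def recollection_retrieve(query_emb, memory_embs, k=5, alpha=0.5, n_steps=2):
    """Sketch of a Recollection-style retrieval loop: retrieve candidates,
    summarize them into a centroid, alpha-mix the centroid with the query,
    and retrieve again, expanding the evidence set in embedding space."""
    def unit(x):
        # Normalize rows so that a dot product equals cosine similarity.
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    mem = unit(np.asarray(memory_embs, dtype=float))
    q = unit(np.asarray(query_emb, dtype=float))
    evidence = []
    for _ in range(n_steps):
        sims = mem @ q
        top = np.argsort(-sims)[:k]
        evidence.extend(int(i) for i in top if i not in evidence)
        # One-cluster simplification of the paper's clustering step: take the
        # centroid of the current candidates as the "context" vector, then
        # alpha-mix it with the query to reconstruct context for the next hop.
        centroid = unit(mem[top].mean(axis=0))
        q = unit(alpha * q + (1 - alpha) * centroid)
    return evidence
```

Each iteration drifts the query toward the retrieved neighborhood, so memories that are only indirectly related to the original query can enter the evidence set on later hops.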
LLMs can now retrieve memories like humans, using either a fast familiarity check or a deliberate recollection process, leading to better personalization without overwhelming the model with irrelevant context.
Personalized large language models (LLMs) rely on memory retrieval to incorporate user-specific histories, preferences, and contexts. Existing approaches either overload the LLM by feeding the user's entire memory history into the prompt, which is costly and unscalable, or reduce retrieval to a one-shot similarity search, which captures only surface matches. Cognitive science, however, shows that human memory operates through a dual process: Familiarity, which offers fast but coarse recognition, and Recollection, which enables deliberate, chain-like reconstruction that deeply recovers episodic content. Current systems lack both the ability to perform recollection-style retrieval and a mechanism to adaptively switch between the two retrieval paths, leading to either insufficient recall or the inclusion of noise. To address this, we propose RF-Mem (Recollection-Familiarity Memory Retrieval), a dual-path memory retriever guided by familiarity uncertainty. RF-Mem measures the familiarity signal through the mean similarity score and its entropy. High familiarity triggers the direct top-K Familiarity retrieval path, while low familiarity activates the Recollection path, in which the system clusters candidate memories and alpha-mixes them with the query to iteratively expand evidence in embedding space, simulating deliberate contextual reconstruction. This design embeds human-like dual-process recognition into the retriever, avoiding full-context overhead and enabling scalable, adaptive personalization. Experiments across three benchmarks and multiple corpus scales demonstrate that RF-Mem consistently outperforms both one-shot retrieval and full-context reasoning under fixed budget and latency constraints. Our code can be found in the Reproducibility Statement.
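The uncertainty-guided routing between the two paths can be sketched roughly as follows. This is an illustrative assumption, not the paper's released code: the function name, the thresholds, and the exact choice of top-k mean similarity plus softmax entropy as the familiarity signal are placeholders inferred from the abstract.

```python
import numpy as np

def route_retrieval(query_emb, memory_embs, k=5, mean_thresh=0.6, ent_thresh=1.5):
    """Sketch of familiarity-uncertainty routing: score all memories against
    the query, measure familiarity via the mean of the top-k similarities and
    the entropy of the softmax over all similarities, then choose a path."""
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    mem = unit(np.asarray(memory_embs, dtype=float))
    q = unit(np.asarray(query_emb, dtype=float))
    sims = mem @ q
    top = np.argsort(-sims)[:k]
    mean_score = sims[top].mean()
    # Softmax over all similarity scores; low entropy means the score mass
    # concentrates on a few memories, i.e. the match is confident.
    exp = np.exp(sims - sims.max())
    probs = exp / exp.sum()
    entropy = -(probs * np.log(probs)).sum()
    if mean_score >= mean_thresh and entropy <= ent_thresh:
        return "familiarity", top.tolist()  # direct top-K path
    return "recollection", top.tolist()  # seed candidates for deliberate expansion
```

A sharp, confident match (high mean score, low entropy) returns the top-K directly; a diffuse score distribution hands the candidates to the Recollection path for iterative expansion.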