The paper introduces Rec2PM, a generative recommendation framework that compresses long user interaction histories into compact Preference Memory tokens, addressing the computational cost and noise accumulation of full-attention models. Rec2PM uses a self-referential teacher-forcing strategy: reference memories generated from a global view of the history supervise parallelized recurrent updates, so training is fully parallel while inference can still update the memory iteratively. Experiments on large-scale benchmarks show that Rec2PM achieves higher accuracy than full-sequence models with lower inference latency and a smaller memory footprint, and analysis indicates that the Preference Memory acts as a denoising Information Bottleneck.
Forget full attention: Rec2PM distills long user histories into compact, interpretable "Preference Memory" tokens, slashing latency and memory while boosting accuracy in generative recommendation.
Generative recommendation (GenRec) models typically capture user behavior via full attention, but scaling them to lifelong sequences is hindered by prohibitive computational costs and by noise accumulated from stochastic interactions. To address these challenges, we introduce Rec2PM, a framework that compresses long user interaction histories into compact Preference Memory tokens. Unlike traditional recurrent methods, which must be trained serially, Rec2PM employs a novel self-referential teacher-forcing strategy: it leverages a global view of the history to generate reference memories, which serve as supervision targets for parallelized recurrent updates. This allows fully parallel training while preserving the ability to update the memory iteratively during inference. Additionally, because the memory is represented as token embeddings rather than extensive KV caches, Rec2PM is highly storage-efficient. Experiments on large-scale benchmarks show that Rec2PM significantly reduces inference latency and memory footprint while achieving superior accuracy compared to full-sequence models. Analysis reveals that the Preference Memory functions as a denoising Information Bottleneck, effectively filtering interaction noise to capture robust long-term interests.
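The data flow behind the teacher-forcing strategy can be illustrated with a toy sketch. This is not the paper's implementation: a running mean stands in for the learned compressor, the memory is a single vector rather than a set of tokens, and the names `mean_pool`, `update`, `teacher_forced_steps`, and `iterative_inference` are hypothetical. It only shows that, during training, each recurrent step can start from a reference memory built from the global view of the history (so no step waits on another), while inference folds segments in one at a time.

```python
from typing import List, Tuple

Vector = List[float]
Segment = List[Vector]


def mean_pool(items: List[Vector]) -> Vector:
    """Global-view stand-in: compress a whole prefix into one memory vector."""
    dim = len(items[0])
    return [sum(v[i] for v in items) / len(items) for i in range(dim)]


def update(memory: Vector, n_seen: int, segment: Segment) -> Vector:
    """Recurrent stand-in: fold a new segment into an existing memory."""
    total = n_seen + len(segment)
    return [
        (memory[i] * n_seen + sum(v[i] for v in segment)) / total
        for i in range(len(memory))
    ]


def teacher_forced_steps(segments: List[Segment]) -> Tuple[List[Vector], List[Vector]]:
    """Training-style pass.

    Step t takes a reference memory computed from the prefix (global view),
    not the output of step t-1, so the iterations are independent and could
    be batched. The target is the reference memory for the longer prefix.
    """
    predictions, targets = [], []
    for t in range(1, len(segments)):
        prefix = [v for seg in segments[:t] for v in seg]
        reference_in = mean_pool(prefix)  # supervision source
        predictions.append(update(reference_in, len(prefix), segments[t]))
        targets.append(mean_pool(prefix + segments[t]))
    return predictions, targets


def iterative_inference(segments: List[Segment]) -> Vector:
    """Inference-style pass: update the memory segment by segment."""
    memory = mean_pool(segments[0])
    n_seen = len(segments[0])
    for seg in segments[1:]:
        memory = update(memory, n_seen, seg)
        n_seen += len(seg)
    return memory


if __name__ == "__main__":
    history = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0]],
        [[0.0, 0.0], [2.0, 2.0]],
    ]
    preds, refs = teacher_forced_steps(history)
    # Squared error between each parallel update and its reference memory.
    loss = sum((p - r) ** 2 for pv, rv in zip(preds, refs) for p, r in zip(pv, rv))
    print("teacher-forcing loss:", loss)
    print("memory after iterative updates:", iterative_inference(history))
```

In this toy the running mean reproduces the reference exactly, so the loss is zero up to floating-point error; with a learned update, the same comparison would supply the training signal. Only the fixed-size memory has to be kept between updates, which is the property the abstract contrasts with storing KV caches.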