The paper addresses two limitations of flat-sequence generative recommenders such as HSTU: they fail to capture the hierarchical temporal structure of user behavior, and their dense attention is computationally inefficient. The authors propose HPGR, a two-stage framework that first applies session-based Masked Item Modeling (MIM) for structure-aware pre-training and then uses Preference-Guided Sparse Attention for efficient fine-tuning. Experiments on a large-scale industrial dataset and an online A/B test show that HPGR outperforms strong baselines, including HSTU and MTGR.
HPGR combines hierarchical pre-training with sparse attention so that a generative recommender models the structure of user behavior explicitly, improving both performance and efficiency.
Generative Recommenders (GRs), exemplified by Hierarchical Sequential Transduction Units (HSTU), have emerged as a powerful paradigm for modeling long user interaction sequences. However, we observe that their "flat-sequence" assumption overlooks the rich, intrinsic structure of user behavior. This leads to two key limitations: a failure to capture the temporal hierarchy of session-based engagement, and computational inefficiency from dense attention, which also introduces significant noise that obscures true preference signals within semantically sparse histories and degrades the quality of the learned representations. To address these limitations, we propose HPGR (Hierarchical and Preference-aware Generative Recommender), a framework built on a two-stage paradigm that injects these structural priors into the model. First, a structure-aware pre-training stage employs a session-based Masked Item Modeling (MIM) objective to learn a hierarchically informed and semantically rich item representation space. Second, a preference-aware fine-tuning stage uses these representations to implement a Preference-Guided Sparse Attention mechanism, which dynamically constrains computation to only the most relevant historical items, improving both efficiency and signal-to-noise ratio. Experiments on a large-scale proprietary industrial dataset from APPGallery and an online A/B test show that HPGR achieves state-of-the-art performance over multiple strong baselines, including HSTU and MTGR.
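The two stages described above can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how sessions are delimited, how items are masked, or how relevant items are selected. The time-gap session rule, the one-mask-per-session policy, the dot-product top-k selection, and every name below (`split_sessions`, `mask_sessions`, `topk_sparse_attention`, `MASK`) are assumptions chosen only to make the ideas concrete.

```python
import math
import random

MASK = -1  # hypothetical item id reserved for the mask token


def split_sessions(items, timestamps, gap):
    """Group a flat history into sessions. A new session starts when the
    time since the previous interaction exceeds `gap` (an assumed rule)."""
    sessions = []
    for i, (item, ts) in enumerate(zip(items, timestamps)):
        if i == 0 or ts - timestamps[i - 1] > gap:
            sessions.append([])
        sessions[-1].append(item)
    return sessions


def mask_sessions(sessions, rng):
    """Hide one randomly chosen item per session (assumed masking policy).
    Returns (masked_sessions, targets); targets[s] is the hidden item that
    a MIM-style objective would ask the model to recover."""
    masked, targets = [], []
    for session in sessions:
        pos = rng.randrange(len(session))
        targets.append(session[pos])
        masked.append(session[:pos] + [MASK] + session[pos + 1:])
    return masked, targets


def topk_sparse_attention(query, keys, values, k):
    """Attend only to the k history items whose keys score highest against
    the query (dot product), instead of to the full dense history."""
    scores = [sum(q * x for q, x in zip(query, key)) for key in keys]
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    top = max(scores[i] for i in keep)  # subtract max for a stable softmax
    weights = {i: math.exp(scores[i] - top) for i in keep}
    total = sum(weights.values())
    dim = len(values[0])
    output = [
        sum(weights[i] / total * values[i][d] for i in keep)
        for d in range(dim)
    ]
    return output, sorted(keep)


# Stage 1 (toy): two bursts of activity become two sessions, each with one mask.
sessions = split_sessions([10, 11, 12, 20, 21], [0, 5, 9, 500, 510], gap=60)
masked, targets = mask_sessions(sessions, random.Random(0))

# Stage 2 (toy): only the 2 best-matching history items receive attention.
output, kept = topk_sparse_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [0.9, 0.0], [-1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]],
    k=2,
)
```

In this toy setting the softmax runs over `k` items rather than the whole history, which is where a sparse scheme of this kind would save computation. The low-scoring items are dropped entirely instead of contributing small, noisy weights.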