This paper introduces GloRank, a generative reranking framework for recommender systems that moves from selecting local indices to generating global identifiers, addressing the semantically inconsistent action space of existing methods. By representing items as sequences of discrete tokens, GloRank reformulates reranking as a token generation task, so that items are evaluated against a consistent global standard. Experiments on two public benchmarks and a large-scale industrial dataset, together with online A/B tests, show that GloRank consistently outperforms state-of-the-art baselines and is more robust in cold-start scenarios.
Shifting reranking from selecting local indices to generating global identifiers gives the model a consistent action space and improves robustness, notably in cold-start scenarios.
In modern recommender systems, list-wise reranking serves as a critical phase within the multi-stage pipeline, finalizing the exposed item sequence and directly impacting user satisfaction by modeling complex intra-list item dependencies. Existing methods typically formulate this task as selecting indices from the local input list. However, this approach suffers from a semantically inconsistent action space: the same output neuron (logit) represents different items across different samples, preventing the model from establishing a stable, intrinsic understanding of the items. To address this, we propose GloRank (Global Action Space Ranker), a generative framework that shifts reranking from selecting local indices to generating global identifiers. Specifically, we represent items as sequences of discrete tokens and reformulate reranking as a token generation task. This design effectively decouples the scoring mechanism from the variable input order, ensuring that items are evaluated against a consistent global standard. We further enhance this with a two-stage optimization pipeline: a supervised pre-training phase to initialize the model with high-quality demonstrations, followed by a reinforcement learning-based post-training phase to directly maximize list-wise utility. Extensive experiments on two public benchmarks and a large-scale industrial dataset, coupled with online A/B tests, demonstrate that GloRank consistently outperforms state-of-the-art baselines and achieves superior robustness in cold-start scenarios.
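The contrast between a local and a global action space can be illustrated with a small sketch. This is not the GloRank implementation: the catalog `ITEM_TOKENS`, the token values, the helper names, and the stand-in scorer `toy_score` are all hypothetical, and the greedy prefix-constrained decoding is an assumption made for illustration, since the abstract does not specify a decoding procedure. A real model would also condition each token on user context and on the items already generated.

```python
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[int, ...]

# Hypothetical catalog: every item has a fixed sequence of discrete tokens.
# Token values are made up; all sequences share one length, so none is a
# prefix of another.
ITEM_TOKENS: Dict[str, Tokens] = {
    "item_a": (3, 1),
    "item_b": (3, 7),
    "item_c": (5, 2),
}


def local_actions(candidates: List[str]) -> Dict[int, str]:
    """Local action space: action k means 'pick the k-th input candidate'."""
    return dict(enumerate(candidates))


def global_actions(candidates: List[str]) -> Dict[Tokens, str]:
    """Global action space: an action is the item's own token sequence."""
    return {ITEM_TOKENS[item]: item for item in candidates}


def toy_score(prefix: Tokens, token: int) -> float:
    """Stand-in for model logits (assumption): prefers smaller token ids."""
    return -float(token)


def decode_list(
    candidates: List[str],
    token_score: Callable[[Tokens, int], float],
) -> List[str]:
    """Greedy generation of item identifiers, one token at a time.

    At each step only tokens that extend a prefix of a remaining candidate
    are allowed, so every generated sequence names a valid, unused item.
    """
    remaining = global_actions(candidates)
    output: List[str] = []
    while remaining:
        prefix: Tokens = ()
        while prefix not in remaining:
            allowed = {
                seq[len(prefix)]
                for seq in remaining
                if seq[: len(prefix)] == prefix
            }
            # Ties go to the smallest token id because max keeps the first maximum.
            best = max(sorted(allowed), key=lambda t: token_score(prefix, t))
            prefix += (best,)
        output.append(remaining.pop(prefix))
    return output


# The same local index names different items when the input order changes,
# while the global identifiers are unaffected.
first = ["item_a", "item_b", "item_c"]
shuffled = ["item_c", "item_a", "item_b"]
reranked_first = decode_list(first, toy_score)
reranked_shuffled = decode_list(shuffled, toy_score)
```

In this sketch `local_actions(first)[0]` and `local_actions(shuffled)[0]` refer to different items, whereas `global_actions` returns the same mapping for both lists, and `decode_list` produces the same ranking regardless of input order because the scorer only ever sees token identifiers.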