The paper introduces Dual-Rerank, a generative reranking framework designed to optimize user experience in Kuaishou's large-scale short video search. It bridges the gap between autoregressive (AR) and non-autoregressive (NAR) models using Sequential Knowledge Distillation and tackles optimization challenges with List-wise Decoupled Reranking Optimization (LDRO) for stable online reinforcement learning. A/B testing shows Dual-Rerank significantly improves user satisfaction and watch time while reducing inference latency compared to AR baselines.
Kuaishou's new Dual-Rerank system cuts inference latency and improves user satisfaction and watch time by combining the strengths of autoregressive and non-autoregressive generative reranking in billion-scale search.
Kuaishou serves over 400 million daily active users and processes hundreds of millions of search queries each day against a repository of tens of billions of short videos. As the final decision layer, the reranking stage determines user experience by optimizing whole-page utility. While traditional score-and-sort methods fail to capture combinatorial dependencies among items, generative reranking offers a superior paradigm by directly modeling the permutation probability. However, deploying generative reranking in such a high-stakes environment faces a fundamental dual dilemma: 1) a structural trade-off, in which autoregressive (AR) models offer superior sequential modeling but suffer from prohibitive latency, while non-autoregressive (NAR) models are efficient but fail to capture dependencies; and 2) an optimization gap, in which supervised learning struggles to optimize whole-page utility directly, while reinforcement learning (RL) is unstable on high-throughput data streams. To resolve this, we propose Dual-Rerank, a unified framework for industrial reranking that bridges the structural gap via Sequential Knowledge Distillation and addresses the optimization gap using List-wise Decoupled Reranking Optimization (LDRO) for stable online RL. Extensive A/B testing on production traffic demonstrates that Dual-Rerank achieves state-of-the-art performance, significantly improving user satisfaction and watch time while drastically reducing inference latency compared to AR baselines.
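The structural trade-off between AR and NAR models can be made concrete with a toy calculation. The sketch below is illustrative only and is not the paper's architecture. It uses a Plackett-Luce model as a stand-in for an AR factorization of the permutation probability, and independent per-slot softmaxes as a stand-in for an NAR factorization. The function names `ar_permutation_log_prob` and `nar_permutation_log_prob`, the scores, and the logits are all assumptions made for this example.

```python
import itertools
import math


def ar_permutation_log_prob(scores, permutation):
    """Log-probability of a permutation under a Plackett-Luce model.

    Each slot is a softmax over the items that have not been placed yet,
    so every choice is conditioned on the prefix (an AR factorization).
    A real AR reranker would presumably also recompute the scores from
    the prefix; here they are static to keep the sketch small.
    """
    remaining = list(range(len(scores)))
    log_prob = 0.0
    for item in permutation:
        log_norm = math.log(sum(math.exp(scores[j]) for j in remaining))
        log_prob += scores[item] - log_norm
        remaining.remove(item)
    return log_prob


def nar_permutation_log_prob(position_logits, permutation):
    """Log-probability under independent per-slot softmaxes.

    position_logits[t][i] is the (hypothetical) logit of item i at slot t.
    Slots are scored in parallel and do not see each other's choices.
    """
    log_prob = 0.0
    for slot, item in enumerate(permutation):
        row = position_logits[slot]
        log_norm = math.log(sum(math.exp(x) for x in row))
        log_prob += row[item] - log_norm
    return log_prob


def total_mass_on_permutations(log_prob_fn, params, n_items):
    """Sum the model's probability over all valid permutations."""
    return sum(
        math.exp(log_prob_fn(params, perm))
        for perm in itertools.permutations(range(n_items))
    )


if __name__ == "__main__":
    scores = [0.5, 1.0, -0.3]                 # made-up item scores
    uniform_logits = [[0.0] * 3 for _ in range(3)]
    print(total_mass_on_permutations(ar_permutation_log_prob, scores, 3))
    print(total_mass_on_permutations(nar_permutation_log_prob, uniform_logits, 3))
```

The AR factorization places all of its probability mass on valid permutations, at the cost of one sequential step per slot. The independent-slot version can be evaluated in parallel, but part of its mass falls on sequences that repeat an item, which is one simple way to see the missing dependency modeling that the abstract attributes to NAR models.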