This paper introduces a framework for generating LLM-based user interest personas in real time for a large-scale video recommendation platform, addressing the limitations of existing approaches that rely on structured IDs or offline processing. By combining a summary of existing user interests with novel topics during serving, the approach addresses the exploitation-exploration trade-off. The proposed architecture uses knowledge distillation, asynchronous inference, and semantically clustered video representations to make online LLM inference cost-efficient. Offline evaluations, user studies, and live A/B tests show significant improvements in viewer value.
Real-time LLM-generated user personas can significantly improve viewer value by balancing summaries of existing interests with novel topics at serving time.
Large Language Models (LLMs) offer unprecedented potential for enhancing recommendation systems through their world knowledge and reasoning capabilities. However, existing approaches often rely on structured IDs or offline processing, limiting semantic richness, real-time adaptability, and user-facing interpretability. In this paper, we introduce a novel framework that enables real-time generation of LLM-based user interest personas for a large-scale commercial video recommendation platform. Our method generates natural-language user interest personas that address the exploitation-exploration trade-off by combining the summarization of existing interests with novel topics, directly during serving. To overcome the computational challenges of online LLM inference at a billion-user scale, we design a cost-efficient architecture leveraging knowledge distillation, asynchronous inference, and input optimization via semantically clustered video representations. Extensive offline evaluations, user studies, and live A/B tests demonstrate significant improvements in viewer value. This work bridges the gap between high-level semantic understanding and industrial-scale recommendation, paving the way for more dynamic, explainable, and satisfying personalized experiences.
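To make the asynchronous-inference idea concrete, here is a minimal sketch of one way a serving path could return a cached persona immediately while a refresh runs off the request path. It is an illustration under stated assumptions, not the paper's implementation: `PersonaCache`, `generate_persona`, `refresh`, `serve`, and `demo` are hypothetical names, the "LLM call" is a placeholder that only formats a string, and the cluster labels stand in for the semantically clustered video representations mentioned in the abstract.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class PersonaCache:
    """In-memory stand-in for a persona store keyed by user ID (hypothetical)."""
    personas: dict = field(default_factory=dict)
    pending: set = field(default_factory=set)


async def generate_persona(watch_clusters):
    """Placeholder for a distilled-LLM call; here it only formats text."""
    await asyncio.sleep(0)  # yield to the event loop, as a real model call would
    return "Interested in: " + ", ".join(sorted(watch_clusters))


async def refresh(cache, user_id, watch_clusters):
    """Regenerate one user's persona and store it for later requests."""
    try:
        cache.personas[user_id] = await generate_persona(watch_clusters)
    finally:
        cache.pending.discard(user_id)


def serve(cache, user_id, watch_clusters, tasks):
    """Return the cached persona (or None) without waiting for generation.

    Must be called from inside a running event loop.
    """
    if user_id not in cache.pending:  # avoid duplicate in-flight refreshes
        cache.pending.add(user_id)
        tasks.append(asyncio.create_task(refresh(cache, user_id, watch_clusters)))
    return cache.personas.get(user_id)


async def demo():
    cache, tasks = PersonaCache(), []
    first = serve(cache, "u1", {"cooking", "travel"}, tasks)   # cache miss
    await asyncio.gather(*tasks)                               # let the refresh finish
    second = serve(cache, "u1", {"cooking", "travel"}, tasks)  # cache hit
    await asyncio.gather(*tasks)
    return first, second
```

In this sketch the first request for a user sees no persona and would fall back to the existing recommender, while later requests read the stored text. That is the usual trade-off when generation is moved off the request path: lower serving latency in exchange for a persona that may lag the latest activity.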