This paper investigates zero-shot cross-domain knowledge distillation (KD) to improve ranking models in a low-traffic music recommendation system by transferring knowledge from a high-traffic video recommendation system. The study evaluates various KD techniques in offline and live experiments, demonstrating the effectiveness of this approach for multi-task ranking models. Results show that zero-shot cross-domain KD can significantly improve the performance of ranking models in data-scarce environments without requiring dedicated teacher model training.
You can boost ranking model performance in low-traffic recommendation systems by directly distilling knowledge from a large-scale but different domain, such as video recommendations.
Knowledge Distillation (KD) has been widely used to improve the quality of latency-sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but it introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can differ significantly. We present a case study of zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a large-scale video recommendation platform (YouTube), with roughly 100x the traffic, to a music recommendation application with significantly lower traffic. We share offline and live experiment results and present findings from evaluating different KD techniques in this setting across two ranking models on the music app. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improving the performance of ranking models on low-traffic surfaces.
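To make the distillation setup concrete, here is a minimal sketch of a per-task distillation loss for a multi-task ranking model: the student is trained against the target domain's hard labels plus a soft term matching the cross-domain teacher's predictions. The function names, the simple convex weighting `alpha`, and the use of binary cross-entropy for both terms are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def sigmoid(x):
    """Logits -> probabilities for a binary ranking task (e.g. click)."""
    return 1.0 / (1.0 + np.exp(-x))

def bce(p, target, eps=1e-7):
    """Binary cross-entropy of predicted probability p against a target
    in [0, 1]; the target may be a hard label or a teacher probability."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))

def cross_domain_kd_loss(student_logits, labels, teacher_probs, alpha=0.5):
    """Illustrative distillation objective (an assumption, not the paper's):
    (1 - alpha) * BCE(student, hard label) + alpha * BCE(student, teacher),
    averaged over examples. The teacher's probabilities come from the
    data-rich source domain; no dedicated teacher is trained."""
    p = sigmoid(np.asarray(student_logits, dtype=float))
    hard = bce(p, np.asarray(labels, dtype=float))
    soft = bce(p, np.asarray(teacher_probs, dtype=float))
    return float(np.mean((1.0 - alpha) * hard + alpha * soft))
```

With `alpha = 0` this reduces to ordinary supervised training on the low-traffic surface; raising `alpha` shifts weight toward the teacher's soft targets. In a zero-shot setting the teacher is applied as-is, so only the features shared (or mapped) between the two domains can feed it.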