Microsoft Research | Bing Ads | Kuaishou | Washington State | Feb 2, 2026 | arXiv:2602.01563

AdNanny: One Reasoning LLM for All Offline Ads Recommendation Tasks

Nan Hu, Han Li, Jimeng Sun, Lu Wang, Fangkai Yang, Bo Qiao, Pu Zhao, David Dai, Mengyu Liu, Yuefeng Zhan, Jianjin Zhang, Weihao Han, Allen Sun, Qing-Yu Lin, S. Rajmohan, Dongmei Zhang, Denvy Deng, Feng Sun, Qi Zhang

AI Summary

The paper introduces AdNanny, a unified reasoning-centric LLM fine-tuned from a 671B-parameter DeepSeek-R1 checkpoint for a range of offline advertising tasks. The authors construct reasoning-augmented corpora that pair structured supervision with natural language explanations, then apply multi-task supervised fine-tuning with adaptive reweighting, followed by reinforcement learning to align the model with online advertising objectives. Deployed in Bing Ads, AdNanny reduces manual labeling effort and improves accuracy, offering a scalable and cost-effective alternative to maintaining many task-specific models.

Key Contribution

Ditch the army of task-specific models: AdNanny shows a single, reasoning-centric LLM can handle diverse offline advertising tasks with improved accuracy and reduced manual effort.

Abstract

Large Language Models (LLMs) have shown strong capabilities in Natural Language Understanding and Generation, but deploying them directly in online advertising systems is often impractical due to strict millisecond-level latency constraints. This has motivated the use of LLMs offline to improve retrieval, ranking, and recommendation models. Existing solutions typically fine-tune separate LLMs for individual tasks such as query-ad relevance labeling, keyword-based query generation, and user profiling. This results in redundant models, high maintenance cost, and limited performance gains despite substantial overlap in domain knowledge and reasoning patterns. We introduce AdNanny, a unified reasoning-centric LLM that serves as a shared backbone for offline advertising tasks. AdNanny is obtained by fine-tuning a public 671B-parameter DeepSeek-R1 checkpoint using a scalable training system that supports hybrid dense-MoE parallelism. We construct reasoning-augmented corpora that pair structured supervision with step-by-step natural language explanations. A multi-task supervised fine-tuning stage with adaptive reweighting enables AdNanny to handle diverse labeling and generation tasks in a consistent reasoning format. This is followed by reinforcement learning using downstream advertising metrics to align model behavior with online retrieval and ranking objectives. AdNanny is deployed in production within Bing Ads, where it significantly reduces manual labeling effort and improves accuracy across multiple offline tasks. By consolidating many task-specific models into a single reasoning-centric foundation model, AdNanny provides a scalable and cost-effective solution for large-scale advertising systems.
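The abstract does not specify how the adaptive reweighting in the multi-task fine-tuning stage is computed. As a rough illustration only, the sketch below shows one common generic scheme: task weights given by a softmax over current per-task losses, so that tasks the model currently handles worst receive more weight. The function names, the temperature parameter, and the loss values are all hypothetical and are not taken from the paper; only the three task names come from the abstract.

```python
import math


def adaptive_task_weights(task_losses, temperature=1.0):
    """Softmax over per-task losses: higher loss -> larger weight.

    Assumption: this is a generic reweighting heuristic, not AdNanny's
    actual method, which the abstract does not describe.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    names = list(task_losses)
    # Subtract the maximum loss before exponentiating for numerical stability.
    max_loss = max(task_losses.values())
    exps = [math.exp((task_losses[n] - max_loss) / temperature) for n in names]
    total = sum(exps)
    return {n: e / total for n, e in zip(names, exps)}


def weighted_multitask_loss(task_losses, weights):
    """Combine per-task losses into one scalar using the given weights."""
    return sum(weights[n] * task_losses[n] for n in task_losses)


# Illustrative (made-up) per-task validation losses.
losses = {
    "relevance_labeling": 0.4,
    "query_generation": 1.2,
    "user_profiling": 0.8,
}
weights = adaptive_task_weights(losses, temperature=1.0)
combined = weighted_multitask_loss(losses, weights)
```

A lower temperature concentrates weight on the hardest task, while a very high temperature approaches uniform weighting; in practice the losses would be recomputed periodically during training so the weights track each task's progress.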

Natural Language Processing | Reasoning & Chain-of-Thought | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 36

Year: 2026

Venue: N/A
