Kuaishou | NTU | Mar 5, 2026 | arXiv:2603.05134

LBM: Hierarchical Large Auto-Bidding Model via Reasoning and Acting

Yewen Li, Zhiyi Lyu, Peng Jiang, Qingpeng Cai, Fei Pan, Bo An

AI Summary

The paper introduces a hierarchical Large auto-Bidding Model (LBM) composed of a high-level LBM-Think for reasoning and a low-level LBM-Act for action generation to improve auto-bidding strategies. A dual embedding mechanism fuses language and numerical inputs for training LBM-Act, while a novel offline reinforcement fine-tuning technique called GQPO mitigates hallucinations in LBM-Think. Experiments demonstrate that the LBM, particularly with a generative backbone, achieves superior performance and generalization in auto-bidding tasks compared to existing methods.

Key Contribution

LLMs can master auto-bidding in dynamic ad environments, but only if you give them a hierarchical architecture and offline RL fine-tuning to avoid hallucinating suboptimal decisions.

Abstract

The growing scale of ad auctions on online advertising platforms has intensified competition, making manual bidding impractical and necessitating auto-bidding to help advertisers achieve their economic goals. Current auto-bidding methods have evolved to use offline reinforcement learning or generative methods to optimize bidding strategies, but they can sometimes behave counterintuitively because of black-box training and the limited mode coverage of datasets, which makes it hard for them to understand task status and to generalize in dynamic ad environments. Large language models (LLMs) offer a promising solution by leveraging prior human knowledge and reasoning abilities to improve auto-bidding performance. However, directly applying LLMs to auto-bidding is difficult: competitive auctions require precise actions, and LLMs lack specialized auto-bidding knowledge, which can lead to hallucinations and suboptimal decisions. To address these challenges, we propose a hierarchical Large Auto-Bidding Model (LBM) that leverages the reasoning capabilities of LLMs to develop a superior auto-bidding strategy. It comprises a high-level LBM-Think model for reasoning and a low-level LBM-Act model for action generation. Specifically, we propose a dual embedding mechanism that efficiently fuses two modalities, language and numerical inputs, for language-guided training of LBM-Act; we then propose an offline reinforcement fine-tuning technique, termed GQPO, that mitigates LBM-Think's hallucinations and enhances decision-making performance without the simulation or real-world rollouts that previous multi-turn LLM-based methods require. Experiments demonstrate the superiority of a generative backbone built on our LBM, especially in training efficiency and generalization ability.

Reasoning & Chain-of-Thought | Recommendation & Information Retrieval | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 54

Year: 2026

Venue: N/A
