The paper introduces a hierarchical Large auto-Bidding Model (LBM) composed of a high-level LBM-Think for reasoning and a low-level LBM-Act for action generation to improve auto-bidding strategies. A dual embedding mechanism fuses language and numerical inputs for training LBM-Act, while a novel offline reinforcement fine-tuning technique called GQPO mitigates hallucinations in LBM-Think. Experiments demonstrate that the LBM, particularly with a generative backbone, achieves superior performance and generalization in auto-bidding tasks compared to existing methods.
LLMs can master auto-bidding in dynamic ad environments, but only if you give them a hierarchical architecture and offline RL fine-tuning to avoid hallucinating suboptimal decisions.
The growing scale of ad auctions on online advertising platforms has intensified competition, making manual bidding impractical and necessitating auto-bidding to help advertisers achieve their economic goals. Current auto-bidding methods have evolved to use offline reinforcement learning or generative methods to optimize bidding strategies, but they can behave counterintuitively because of black-box training and the limited mode coverage of datasets, which makes it hard for them to understand task status and to generalize in dynamic ad environments. Large language models (LLMs) offer a promising solution by leveraging prior human knowledge and reasoning abilities to improve auto-bidding performance. However, directly applying LLMs to auto-bidding is difficult: competitive auctions demand precise actions, and LLMs lack specialized auto-bidding knowledge, which can lead to hallucinations and suboptimal decisions. To address these challenges, we propose a hierarchical Large auto-Bidding Model (LBM) that leverages the reasoning capabilities of LLMs to develop a superior auto-bidding strategy. It comprises a high-level LBM-Think model for reasoning and a low-level LBM-Act model for action generation. Specifically, we propose a dual embedding mechanism that efficiently fuses two modalities, language and numerical inputs, for language-guided training of LBM-Act. We then propose an offline reinforcement fine-tuning technique, termed GQPO, that mitigates LBM-Think's hallucinations and enhances decision-making performance without the simulation or real-world rollouts that previous multi-turn LLM-based methods require. Experiments demonstrate the superiority of a generative backbone based on our LBM, especially in training efficiency and generalization ability.
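The abstract does not spell out how the dual embedding mechanism is built. As a rough illustration of what fusing language and numerical inputs can look like, the sketch below maps text tokens through a lookup table and numeric features through a separate projection, then concatenates both into one sequence of vectors with a shared width. Everything here is an assumption made for illustration, not the authors' design: the class `DualEmbedding`, the constant `EMBED_DIM`, the toy vocabulary, and the two example features (remaining budget ratio and fraction of time left) are all hypothetical, and the weights are random rather than trained.

```python
import random

EMBED_DIM = 4  # hypothetical shared embedding width


def make_matrix(rows, cols, rng):
    """Small random matrix standing in for trained weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class DualEmbedding:
    """Illustrative only: embeds text tokens and numeric features into one sequence."""

    def __init__(self, vocab, num_features, dim=EMBED_DIM, seed=0):
        rng = random.Random(seed)
        self.token_ids = {tok: i for i, tok in enumerate(vocab)}
        # Lookup table: one vector per language token.
        self.token_table = make_matrix(len(vocab), dim, rng)
        # Numeric projection: one direction per numeric feature.
        self.num_proj = make_matrix(num_features, dim, rng)

    def embed_text(self, tokens):
        return [list(self.token_table[self.token_ids[t]]) for t in tokens]

    def embed_numbers(self, values):
        # Each scalar scales its own direction, so its magnitude is kept
        # instead of being split into digit tokens.
        return [[v * w for w in row] for v, row in zip(values, self.num_proj)]

    def fuse(self, tokens, values):
        # Concatenate along the sequence axis: text vectors, then numeric vectors.
        return self.embed_text(tokens) + self.embed_numbers(values)


vocab = ["raise", "bid", "lower", "hold"]
emb = DualEmbedding(vocab, num_features=2)
# Hypothetical inputs: a language instruction plus two numeric state features.
seq = emb.fuse(["raise", "bid"], [0.35, 0.8])
print(len(seq), len(seq[0]))  # sequence length and vector width
```

A downstream action model could consume `seq` like any other embedded sequence. The reason for a separate numeric path in this sketch is that bid-relevant quantities keep their scale, which matters when the output must be a precise action.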