Kuaishou | Jun 23, 2026 | arXiv:2606.24605

ScaleToT: Generalizing Structured LLM Reasoning for Billion-Scale Low-Activity User Modeling

Tianbao Ma, Chang Xi, Yichuan Zou, Chengen Li, Linxun Chen, Zilong Lu, Yanan Niu, Zhaojie Liu, Han Li, Kun Gai

AI Summary

ScaleToT addresses user modeling for billions of low-activity users by learning structured reasoning from a small LLM-processed subset of users. It uses a bounded, entropy-guided Tree-of-Thought refinement procedure to improve reasoning reliability, then trains a lightweight profile encoder that extends the resulting reasoning signals to the remaining users without LLM inference. Evaluated on lifetime value (LTV) prediction, the method increased LT30 by 6.738% in a randomized online A/B test, while offline reasoning covered only 7.32% of the potential population.

Key Contribution

An approach that improves LTV prediction for billions of low-activity users by distilling structured LLM reasoning over sparse static profiles into a lightweight profile encoder, so that most users require no LLM inference.

Abstract

Accurate user modeling often depends on rich interaction histories, which are unavailable for billions of low-activity users. Large Language Models (LLMs) can infer latent user states from static profiles, but this reasoning becomes unreliable when profiles are sparse, and applying an LLM to billions of users is prohibitively expensive. We present ScaleToT, which learns structured reasoning from a small LLM-processed subset and extends it to the broader low-activity user population. To improve reasoning reliability, ScaleToT constructs typed user-state chains with a bounded entropy-guided Tree-of-Thought (ToT) refinement procedure. To make this structured reasoning usable from sparse profiles, the teacher-curated chains are used to train a student model on static profiles through supervised fine-tuning (SFT) and Outcome-Driven Segment-Aware Implicit Reward Policy Optimization (OSIPO). ScaleToT then transfers the student's reasoning representations to a lightweight profile encoder, providing shared reasoning signals for the remaining users without LLM inference. We evaluate ScaleToT on lifetime value (LTV) prediction in a billion-scale advertising deployment. A randomized online A/B test increased LT30 by 6.738%, while offline reasoning covered only 7.32% of the potential population, greatly reducing compute cost compared with full-population reasoning.
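
The abstract does not say how the entropy signal is computed or how the refinement is bounded. As a purely illustrative sketch (not the authors' implementation), one way to read "bounded entropy-guided" refinement is: score each step of a user-state chain by the Shannon entropy of its candidate distribution, and re-expand only the most uncertain steps, up to a fixed budget. All names, thresholds, and probabilities below are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_nodes_to_expand(node_probs, threshold, budget):
    """Return at most `budget` node names whose entropy exceeds `threshold`.

    Nodes are ordered from most to least uncertain; ties break by name.
    `node_probs` maps a (hypothetical) chain-step name to the probabilities
    assigned to its candidate values.
    """
    scored = [(shannon_entropy(p), name) for name, p in node_probs.items()]
    uncertain = [(h, name) for h, name in scored if h > threshold]
    uncertain.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in uncertain[:budget]]


# Hypothetical typed user-state chain: each step has a candidate distribution.
chain = {
    "intent": [0.97, 0.02, 0.01],      # confident step, left untouched
    "life_stage": [0.40, 0.35, 0.25],  # uncertain
    "spending": [0.50, 0.50],          # uncertain, but less so
    "interest": [0.34, 0.33, 0.33],    # most uncertain
}

# Assumed values: the threshold and budget are illustrative, not from the paper.
to_expand = select_nodes_to_expand(chain, threshold=0.5, budget=2)
print(to_expand)
```

The budget caps how many steps are re-expanded per user, which is one plausible way to keep teacher-side LLM cost bounded; the actual gating rule in ScaleToT may differ.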

Reasoning & Chain-of-Thought | Recommendation & Information Retrieval

Year: 2026

Venue: N/A
