Tencent AI | Apr 15, 2026 | arXiv:2604.13737

TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds

Yifeng Zhou, Yuehong Hu, Zhixiang Feng, Junwei Pan, Kaihui Wu, Kai-Chiang Wu, Hanyong Li, Shangyu Zhang, Shudong Huang, Zhangbin Zhu, Chengguo Yin, Haijie Gu, Jie Jiang

AI Summary

The paper identifies a "Sequential Collapse Propagation" (SCP) problem when naively unifying multi-field feature interaction and sequential recommendation models, where non-sequence fields cause dimensional collapse of sequence features. To address this, they propose TokenFormer, a unified architecture incorporating a Bottom-Full-Top-Sliding (BFTS) attention scheme and Non-Linear Interaction Representation (NLIR). Experiments show TokenFormer achieves state-of-the-art performance and improved robustness against SCP on public benchmarks and a real-world advertising platform.
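To make "dimensional collapse" concrete, here is a minimal sketch of one common proxy for effective dimensionality, the participation ratio of the embedding covariance spectrum. This is an illustrative assumption, not the metric the paper reports: the function name `participation_ratio` and the toy inputs are hypothetical. A value near the embedding width means the variance is spread across dimensions; a value near 1 means the embeddings have collapsed onto a single direction.

```python
def participation_ratio(embeddings):
    """Effective dimensionality of a list of equal-length embedding vectors.

    Computes (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the
    covariance matrix C. Because C is symmetric, these two sums equal
    tr(C) and the sum of squared entries of C, so no eigen-solver is needed.
    """
    n = len(embeddings)
    d = len(embeddings[0])
    # Center each dimension around its mean.
    means = [sum(row[j] for row in embeddings) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in embeddings]
    # Covariance matrix C (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centered) / n for j in range(d)]
        for i in range(d)
    ]
    trace = sum(cov[i][i] for i in range(d))
    trace_of_square = sum(cov[i][j] ** 2 for i in range(d) for j in range(d))
    if trace_of_square == 0.0:
        return 0.0  # all embeddings identical: no variance at all
    return trace * trace / trace_of_square


# Variance spread evenly over both axes.
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
# Every vector lies on the line x == y, so one direction carries all variance.
collapsed = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0], [-2.0, -2.0]]

print(participation_ratio(spread))
print(participation_ratio(collapsed))
```

Under this proxy, the failure mode described above would show up as the score of sequence-token representations dropping after they interact with low-rank non-sequence fields.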

Key Contribution

Naively unifying feature interaction and sequential recommendation models can lead to "Sequential Collapse Propagation"; TokenFormer's BFTS attention scheme and NLIR interaction mechanism are designed to mitigate it.

Abstract

Recommender systems have historically developed along two largely independent paradigms: feature interaction models for modeling correlations among multi-field categorical features, and sequential models for capturing user behavior dynamics from historical interaction sequences. Although recent trends attempt to bridge these paradigms within shared backbones, we empirically reveal that naively unifying these two branches may lead to a failure mode we call Sequential Collapse Propagation (SCP). That is, interaction with dimensionally ill non-sequence fields leads to dimensional collapse of the sequence features. To overcome this challenge, we propose TokenFormer, a unified recommendation architecture with the following innovations. First, we introduce a Bottom-Full-Top-Sliding (BFTS) attention scheme, which applies full self-attention in the lower layers and shrinking-window sliding attention in the upper layers. Second, we introduce a Non-Linear Interaction Representation (NLIR) that applies one-sided non-linear multiplicative transformations to the hidden states. Extensive experiments on public benchmarks and Tencent's advertising platform demonstrate state-of-the-art performance, while detailed analyses confirm that TokenFormer significantly improves dimensional robustness and representation discriminability under unified modeling.
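The BFTS layout described in the abstract can be pictured as a stack of per-layer attention masks. The sketch below is a hedged illustration, not the authors' implementation: the function name `bfts_masks`, the symmetric (non-causal) window, and the linear shrink schedule controlled by `first_radius` and `shrink_per_layer` are all assumptions, since the abstract specifies only "full self-attention in the lower layers and shrinking-window sliding attention in the upper layers".

```python
def bfts_masks(seq_len, num_layers, num_full_layers, first_radius, shrink_per_layer):
    """Build one boolean attention mask per layer.

    masks[layer][i][j] is True when token i may attend to token j.
    Lower layers (index < num_full_layers) use full attention.
    Upper layers use a sliding window whose radius shrinks with depth
    (assumed schedule: linear, floored at 0 so a token still sees itself).
    """
    masks = []
    for layer in range(num_layers):
        if layer < num_full_layers:
            radius = seq_len  # large enough to cover every token pair
        else:
            depth_in_top = layer - num_full_layers
            radius = max(0, first_radius - shrink_per_layer * depth_in_top)
        masks.append(
            [[abs(i - j) <= radius for j in range(seq_len)] for i in range(seq_len)]
        )
    return masks


# Hypothetical configuration: 4 layers, the bottom 2 with full attention.
masks = bfts_masks(seq_len=6, num_layers=4, num_full_layers=2,
                   first_radius=2, shrink_per_layer=1)
for layer, mask in enumerate(masks):
    visible = sum(sum(row) for row in mask)
    print(f"layer {layer}: {visible} visible token pairs")
```

In a real model these masks would be passed to the attention operator of each layer; a causal variant would additionally require `j <= i`.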

Architecture Design (Transformers, SSMs, MoE) | Recommendation & Information Retrieval

Citation Metrics

Citations: 0

Influential citations: 0

References: 51

Year: 2026

Venue: N/A
