This paper introduces Capability Synergy (CS3), a framework that improves the effectiveness of two-tower retrieval models in recommender systems without sacrificing online efficiency. CS3 strengthens the towers through cycle-adaptive feature denoising, cross-tower synchronization for better embedding alignment, and cascade-model sharing to reuse knowledge from downstream models. Experiments show consistent gains over strong baselines on three public datasets and in a large-scale advertising system, where CS3 achieves up to 8.36% revenue improvement while maintaining millisecond-level latency.
Two-tower recommendation models can get a major online performance boost without latency penalties, thanks to a new capability synergy framework.
To balance effectiveness and efficiency in recommender systems, multi-stage pipelines commonly use lightweight two-tower models for large-scale candidate retrieval. However, the isolated two-tower architecture restricts representation capacity, embedding-space alignment, and cross-feature interactions. Existing solutions such as late interaction and knowledge distillation can mitigate these issues, but they often increase latency or are difficult to deploy in online learning settings. We propose Capability Synergy (CS3), an efficient online framework that strengthens two-tower retrievers while preserving real-time constraints. CS3 introduces three mechanisms: (1) Cycle-Adaptive Structure, which enables self-revision via adaptive feature denoising within each tower; (2) Cross-Tower Synchronization, which improves alignment through lightweight mutual awareness between towers; and (3) Cascade-Model Sharing, which enhances cross-stage consistency by reusing knowledge from downstream models. CS3 is plug-and-play with diverse two-tower backbones and compatible with online learning. Experiments on three public datasets show consistent gains over strong baselines, and deployment in a large-scale advertising system yields up to 8.36% revenue improvement across three scenarios while maintaining millisecond-level latency.
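The efficiency argument above rests on the two towers interacting only through a final similarity score. The following is a minimal sketch of a plain two-tower retriever, the kind of baseline CS3 builds on, not CS3 itself. The single linear layer per tower, the function names `linear_tower`, `build_item_index`, and `retrieve`, and the toy weights are all hypothetical choices made for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear_tower(features: Sequence[float], weights: Matrix) -> Vector:
    """Project features into the shared embedding space and L2-normalize.

    Hypothetical: one linear layer stands in for a deeper tower network.
    This is a generic two-tower baseline, not the CS3 architecture.
    """
    emb = [sum(w * x for w, x in zip(row, features)) for row in weights]
    norm = math.sqrt(sum(v * v for v in emb)) or 1.0  # avoid division by zero
    return [v / norm for v in emb]


def build_item_index(items: Dict[str, Sequence[float]],
                     item_weights: Matrix) -> Dict[str, Vector]:
    # Item embeddings depend only on item features, so they can be
    # precomputed offline and cached for serving.
    return {item_id: linear_tower(feats, item_weights)
            for item_id, feats in items.items()}


def retrieve(user_features: Sequence[float], user_weights: Matrix,
             index: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    # Online cost: one user-tower pass plus dot products against cached
    # item vectors. The towers never see each other's raw features.
    u = linear_tower(user_features, user_weights)
    scored = ((item_id, sum(a * b for a, b in zip(u, v)))
              for item_id, v in index.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

For example, with identity weights and items `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}`, the user `[1.0, 0.0]` retrieves `a` first and `c` second. Because user and item features meet only at the dot product, this structure is cheap to serve but cannot model cross-feature interactions, which is the limitation that CS3's three mechanisms are designed to address.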