This paper tackles the high computational cost and unstable performance of Generative Recommendation (GR) systems built on Semantic IDs (SIDs). The authors identify a "Semantic Dilution Effect" caused by redundant tokens and propose STAMP, a framework combining Semantic Adaptive Pruning (SAP), which filters redundant input tokens, with Multi-step Auxiliary Prediction (MAP), which densifies output supervision. Experiments on public Amazon and industrial datasets show that STAMP achieves significant speedup and VRAM reduction while maintaining or improving recommendation accuracy.
Semantic Trimming and Auxiliary Multi-step Prediction (STAMP) cuts the computational cost of Generative Recommendation, delivering up to a 1.38$\times$ training speedup and over 50\% VRAM reduction, while simultaneously maintaining or improving performance.
Generative Recommendation (GR) has recently transitioned from atomic item-indexing to Semantic ID (SID)-based frameworks to capture intrinsic item relationships and enhance generalization. However, the adoption of high-granularity SIDs leads to two critical challenges: prohibitive training overhead due to sequence expansion and unstable performance reliability characterized by non-monotonic accuracy fluctuations. We identify that these disparate issues are fundamentally rooted in the Semantic Dilution Effect, where redundant tokens waste massive computation and dilute the already sparse learning signals in recommendation. To counteract this, we propose STAMP (Semantic Trimming and Auxiliary Multi-step Prediction), a framework utilizing a dual-end optimization strategy. We argue that effective SID learning requires simultaneously addressing low input information density and sparse output supervision. On the input side, Semantic Adaptive Pruning (SAP) dynamically filters redundancy during the forward pass, converting noise-laden sequences into compact, information-rich representations. On the output side, Multi-step Auxiliary Prediction (MAP) employs a multi-token objective to densify feedback, strengthening long-range dependency capture and ensuring robust learning signals despite compressed inputs. Unifying input purification and signal amplification, STAMP enhances both training efficiency and representation capability. Experiments on public Amazon and large-scale industrial datasets show STAMP achieves 1.23--1.38$\times$ speedup and 17.2\%--54.7\% VRAM reduction while maintaining or improving performance across multiple architectures.
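The dual-end strategy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`prune_redundant`, `multi_step_targets`), the score-based top-k pruning rule, and the `keep_ratio`/`horizon` parameters are all simplifying assumptions standing in for SAP's adaptive filtering and MAP's multi-token objective.

```python
# Illustrative sketch of STAMP's dual-end idea (hypothetical, simplified).
import numpy as np

def prune_redundant(token_scores, keep_ratio=0.7):
    """Input side (SAP-style, simplified): keep only the highest-scoring
    tokens, dropping redundant ones before the forward pass."""
    k = max(1, int(len(token_scores) * keep_ratio))
    keep = np.argsort(token_scores)[-k:]   # indices of the top-k tokens
    return np.sort(keep)                   # preserve original sequence order

def multi_step_targets(sequence, horizon=3):
    """Output side (MAP-style, simplified): supervise each position with
    the next `horizon` tokens instead of only the next one, densifying
    the otherwise sparse learning signal."""
    targets = [sequence[t + 1 : t + 1 + horizon]
               for t in range(len(sequence) - horizon)]
    return np.array(targets)

scores = np.array([0.9, 0.1, 0.8, 0.05, 0.7])  # per-token informativeness
kept = prune_redundant(scores, keep_ratio=0.6)
print(kept)            # -> [0 2 4]: the two low-score tokens are dropped

seq = np.arange(8)     # a toy SID token sequence
tgts = multi_step_targets(seq, horizon=3)
print(tgts.shape)      # -> (5, 3): three supervision targets per position
```

In a real GR model the informativeness scores would come from the network itself (e.g. attention statistics) and the multi-step targets would feed auxiliary prediction heads; the sketch only shows how input compression and output densification compose.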