Mar 12, 2026

arXiv:2603.11901

FlexRec: Adapting LLM-based Recommenders for Flexible Needs via Reinforcement Learning

Yijun Pan, Weikang Qiu, Qiyao Ma, Mingxuan Ju, Tong Zhao, Neil Shah, Rex Ying

AI Summary

FlexRec is introduced as a post-training reinforcement learning framework to adapt LLM-based recommenders to dynamic, need-specific objectives in closed-set autoregressive ranking. It addresses the challenges of coarse credit assignment and sparse feedback by employing a causally grounded item-level reward based on counterfactual swaps and critic-guided, uncertainty-aware scaling of rewards. Experiments demonstrate that FlexRec significantly outperforms traditional and LLM-based baselines, achieving up to 59% improvement in NDCG@5 and 109.4% in Recall@5 in need-specific ranking.
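For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute Recall@k and binary-relevance NDCG@k, plus the relative-improvement arithmetic behind figures such as "59%". This page does not say which metric variants the paper uses, so the definitions here (binary relevance, Recall normalized by the number of relevant items) and the helper names are assumptions for illustration only.

```python
import math


def recall_at_k(ranking, relevant, k=5):
    """Fraction of the relevant items that appear in the top-k positions."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranking[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranking, relevant, k=5):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranking[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def relative_improvement(new, base):
    """Percentage gain of `new` over `base` (how 'up to X%' figures are usually read)."""
    return 100.0 * (new - base) / base


# Toy data (made up, not from the paper): six candidates, two relevant items.
ranking = ["a", "b", "c", "d", "e", "f"]
relevant = {"b", "f"}
recall5 = recall_at_k(ranking, relevant, k=5)
ndcg5 = ndcg_at_k(ranking, relevant, k=5)
```

In this toy case only one of the two relevant items lands in the top five, so Recall@5 is 0.5, while NDCG@5 is lower than it would be had "b" been ranked first.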

Key Contribution

LLM-based recommenders can be substantially improved (up to a 109.4% gain in Recall@5) by using counterfactual rewards and uncertainty-aware reward scaling within a reinforcement learning framework, enabling flexible adaptation to diverse recommendation scenarios.

Abstract

Modern recommender systems must adapt to dynamic, need-specific objectives for diverse recommendation scenarios, yet most traditional recommenders are optimized for a single static target and struggle to reconfigure behavior on demand. Recent advances in reinforcement-learning-based post-training have unlocked strong instruction-following and reasoning capabilities in LLMs, suggesting a principled route for aligning them to complex recommendation goals. Motivated by this, we study closed-set autoregressive ranking, where an LLM generates a permutation over a fixed candidate set conditioned on user context and an explicit need instruction. However, applying RL to this setting faces two key obstacles: (i) sequence-level rewards yield coarse credit assignment that fails to provide fine-grained training signals, and (ii) interaction feedback is sparse and noisy, which together lead to inefficient and unstable updates. We propose FlexRec, a post-training RL framework that addresses both issues with (1) a causally grounded item-level reward based on counterfactual swaps within the remaining candidate pool, and (2) critic-guided, uncertainty-aware scaling that explicitly models reward uncertainty and down-weights low-confidence rewards to stabilize learning under sparse supervision. Across diverse recommendation scenarios and objectives, FlexRec achieves substantial gains: it improves NDCG@5 by up to 59% and Recall@5 by up to 109.4% in need-specific ranking, and further achieves up to 24.1% Recall@5 improvement under generalization settings, outperforming strong traditional recommenders and LLM-based baselines.
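The two mechanisms named in the abstract can be illustrated with a small sketch. The abstract does not give formulas, so everything below is one plausible reading rather than the authors' method: the item-level reward at position t is taken to be the ranking's DCG@k minus the average DCG@k obtained when the item at t is swapped with each item still remaining after it, and the uncertainty-aware scaling is taken to be a simple 1 / (1 + variance) down-weighting. The function names, the choice of DCG as the sequence-level metric, and the scaling form are all assumptions.

```python
import math


def dcg_at_k(ranking, relevance, k):
    """DCG@k with graded relevance looked up per item (missing items count as 0)."""
    return sum(
        relevance.get(item, 0.0) / math.log2(pos + 2)
        for pos, item in enumerate(ranking[:k])
    )


def counterfactual_swap_rewards(ranking, relevance, k=5):
    """Hypothetical item-level reward: for each position t, compare the actual
    ranking against counterfactual rankings in which the item at t is swapped
    with each item that was still available (i.e., placed later)."""
    base = dcg_at_k(ranking, relevance, k)
    rewards = []
    for t in range(len(ranking)):
        alternatives = []
        for j in range(t + 1, len(ranking)):
            swapped = list(ranking)
            swapped[t], swapped[j] = swapped[j], swapped[t]
            alternatives.append(dcg_at_k(swapped, relevance, k))
        if alternatives:
            rewards.append(base - sum(alternatives) / len(alternatives))
        else:
            # Last position: the remaining pool is empty, so there is no counterfactual.
            rewards.append(0.0)
    return rewards


def uncertainty_scaled(rewards, variances):
    """Hypothetical scaling: shrink each reward by 1 / (1 + predicted variance),
    so rewards the critic is unsure about contribute less to the update."""
    return [r / (1.0 + v) for r, v in zip(rewards, variances)]


# Toy example (made-up data): one relevant item, ranked first, evaluated at k=2.
ranking = ["a", "b", "c"]
relevance = {"a": 1.0}
rewards = counterfactual_swap_rewards(ranking, relevance, k=2)
scaled = uncertainty_scaled(rewards, variances=[0.0, 3.0, 3.0])
```

In this toy case, only the first position receives a positive reward, because swapping "a" out of the top slot is the only swap that lowers DCG@2. That is the fine-grained credit assignment a single sequence-level reward cannot provide. In a real system the variances would come from a learned critic instead of being supplied by hand.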

Recommendation & Information Retrieval; RLHF & Preference Learning

Citation Metrics

Citations: 0

Influential citations: 0

References: 40

Year: 2026

Venue: N/A
