Google Research, UIUC | Apr 6, 2026 | arXiv:2604.04457

Retrieval Augmented Conversational Recommendation with Reinforcement Learning

Zhenrui Yue, Honglei Zhuang, Zhen Qin, Zhankui He, Huimin Zeng, Julian McAuley, Dong Wang

AI Summary

This paper introduces RAR, a two-stage retrieval-augmented conversational recommendation framework that aligns retrieval and generation to improve performance and factuality. The authors construct a large-scale movie corpus with rich metadata and use reinforcement learning with LLM feedback to iteratively update the retriever, mitigating misalignment between retrieval and generation. Experiments on multiple benchmarks show that RAR consistently outperforms state-of-the-art baselines, producing more context-aware recommendations with fewer hallucinations.

Key Contribution

RAR enables LLMs to generate more relevant and factual movie recommendations by using reinforcement learning with LLM feedback to align the retrieval and generation stages.

Abstract

Large language models (LLMs) exhibit enhanced capabilities in language understanding and generation. By utilizing their embedded knowledge, LLMs are increasingly used as conversational recommender systems (CRS), achieving improved performance across diverse scenarios. However, existing LLM-based methods rely on pretrained knowledge without external retrieval mechanisms for novel items. Additionally, the lack of a unified corpus poses challenges for integrating retrieval augmentation into CRS. Motivated by these challenges, we present RAR, a novel two-stage retrieval augmented conversational recommendation framework that aligns retrieval and generation to enhance both performance and factuality. To support this framework and provide a unified corpus, we construct a large-scale movie corpus, comprising over 300k movies with rich metadata, such as titles, casts and plot summaries. Leveraging this data, our primary contribution is RAR, the first framework to depart from standard two-stage CRS by dynamically bridging retrieval and generation. First, a retriever model generates candidate items based on user history; in the subsequent stage, an LLM refines the recommendations by incorporating conversational context with retrieved results. In addition, we introduce a novel reinforcement learning (RL) method that leverages LLM feedback to iteratively update the retriever. By creating a collaborative feedback loop that reinforces sampled candidate sets with higher ranking metrics, RAR effectively mitigates the misalignment between the retrieval and generation stages. Furthermore, grounding the LLM in factual metadata allows our RL-driven approach to capture subtle user intentions and generate context-aware recommendations with reduced hallucinations. We validate our approach through extensive experiments on multiple benchmarks, where RAR consistently outperforms state-of-the-art baseline methods.
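The feedback loop described in the abstract (sample a candidate set from the retriever, let the second stage rank it, then reinforce sets that score higher on a ranking metric) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the retriever is reduced to one learnable logit per item, candidate sets are sampled without replacement (Plackett-Luce), the LLM stage is replaced by a hypothetical `stub_llm_rerank` function, the reward is the reciprocal rank of a known target item, and the update is plain REINFORCE with a moving-average baseline.

```python
import math
import random


def sample_candidates(logits, k, rng):
    """Sample k distinct items (Plackett-Luce) and return them together
    with the gradient of the log-probability w.r.t. the logits."""
    remaining = list(range(len(logits)))
    grad = [0.0] * len(logits)
    chosen = []
    for _ in range(k):
        top = max(logits[j] for j in remaining)
        weights = [math.exp(logits[j] - top) for j in remaining]
        total = sum(weights)
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        # d/ds_j of [s_pick - log sum_j exp(s_j)] over the remaining items
        for j, w in zip(remaining, weights):
            grad[j] -= w / total
        grad[pick] += 1.0
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, grad


def stub_llm_rerank(candidates, context_scores):
    """Hypothetical stand-in for the LLM stage: order the retrieved
    candidates by a precomputed conversational relevance score."""
    return sorted(candidates, key=lambda i: context_scores[i], reverse=True)


def reciprocal_rank(ranked, target):
    """Ranking metric used as reward: 1/rank of the target, 0 if absent."""
    return 1.0 / (ranked.index(target) + 1) if target in ranked else 0.0


def train_retriever(num_items, target, context_scores,
                    k=3, steps=300, lr=0.5, seed=0):
    """REINFORCE on retriever logits, rewarded by the second-stage ranking."""
    rng = random.Random(seed)
    logits = [0.0] * num_items
    baseline = 0.0  # moving average of rewards, reduces variance
    for _ in range(steps):
        candidates, grad = sample_candidates(logits, k, rng)
        ranked = stub_llm_rerank(candidates, context_scores)
        reward = reciprocal_rank(ranked, target)
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        logits = [s + lr * advantage * g for s, g in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    scores = [0.1 * i for i in range(10)]  # toy relevance, item 9 is best
    learned = train_retriever(num_items=10, target=9, context_scores=scores)
    print([round(s, 2) for s in learned])
```

In this sketch the retriever only receives signal through the second stage's ranking, which is the kind of coupling between retrieval and generation that the abstract describes. A real system would replace the per-item logits with a learned retriever over user history and the stub with actual LLM output.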

Recommendation & Information Retrieval | RLHF & Preference Learning | Tool Use & Agents

Citation Metrics

Citations: 0

Influential citations: 0

References: 75

Year: 2026

Venue: N/A
