The paper addresses hallucination in industrial advertising QA systems by proposing a reinforced co-adaptation framework for RAG. The framework combines GraphRAG, a graph-aware retrieval mechanism that uses entity-relation structure for multi-hop evidence selection, with evidence-constrained reinforcement learning based on Group Relative Policy Optimization (GRPO) and multi-dimensional rewards. Experiments on an internal dataset show gains in expert-judged accuracy, completeness, and safety and a 72% lower hallucination rate. A two-week A/B test shows a 92.7% reduction in URL hallucination, a 28.6% increase in like rate, and a 46.2% decrease in dislike rate.
Jointly optimizing retrieval and generation with graph-aware retrieval and reinforcement learning reduces hallucination in industrial RAG systems; in a two-week A/B test on a real-world advertising QA system, it cut URL hallucination by 92.7%.
Industrial advertising question answering (QA) is a high-stakes task in which hallucinated content, particularly fabricated URLs, can lead to financial loss, compliance violations, and legal risk. Although Retrieval-Augmented Generation (RAG) is widely adopted, deploying it in production remains challenging because industrial knowledge is inherently relational, frequently updated, and insufficiently aligned with generation objectives. We propose a reinforced co-adaptation framework that jointly optimizes retrieval and generation through two components: (1) Graph-aware Retrieval (GraphRAG), which models entity-relation structure over a high-citation knowledge subgraph for multi-hop, domain-specific evidence selection; and (2) evidence-constrained reinforcement learning via Group Relative Policy Optimization (GRPO) with multi-dimensional rewards covering faithfulness, style compliance, safety, and URL validity. Experiments on an internal advertising QA dataset show consistent gains across expert-judged dimensions including accuracy, completeness, and safety, while reducing the hallucination rate by 72%. A two-week online A/B test demonstrates a 28.6% increase in like rate, a 46.2% decrease in dislike rate, and a 92.7% reduction in URL hallucination. The system has been running in production for over half a year and has served millions of QA interactions.
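To make the second component more concrete, the following is a minimal sketch of how multi-dimensional rewards could feed GRPO's group-relative advantage. The abstract does not give the reward definitions or their weighting, so everything specific here is an assumption made for illustration: the function names (`url_validity`, `combine`, `group_relative_advantages`), the rule that a URL is valid only if it appears verbatim in the retrieved evidence, and the weighted-sum combination. Only the normalization step, which standardizes each sampled answer's reward against the other answers sampled for the same question, follows the usual GRPO formulation.

```python
import re
from statistics import mean, pstdev

# Hypothetical URL matcher; the paper's actual validity check is not specified.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def _urls(text: str) -> list[str]:
    """Extract URLs and drop trailing sentence punctuation."""
    return [u.rstrip(".,;") for u in URL_PATTERN.findall(text)]


def url_validity(answer: str, evidence: list[str]) -> float:
    """Assumed rule: 1.0 if every URL in the answer occurs in the evidence, else 0.0.

    An answer that cites no URL at all scores 1.0 under this rule.
    """
    allowed = {u for doc in evidence for u in _urls(doc)}
    return 1.0 if all(u in allowed for u in _urls(answer)) else 0.0


def combine(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed combination: weighted sum over reward dimensions."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of answers sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    evidence = ["Budget rules are documented at https://example.com/ads/policy today."]
    answers = [
        "See https://example.com/ads/policy.",   # URL grounded in the evidence
        "See https://example.com/ads/made-up.",  # fabricated URL
    ]
    # Illustrative weights; faithfulness, style, and safety scores are stubbed to 1.0.
    weights = {"faithfulness": 0.4, "style": 0.1, "safety": 0.2, "url": 0.3}
    rewards = [
        combine(
            {"faithfulness": 1.0, "style": 1.0, "safety": 1.0,
             "url": url_validity(a, evidence)},
            weights,
        )
        for a in answers
    ]
    print(group_relative_advantages(rewards))
```

In this sketch the answer with the fabricated URL receives a negative advantage relative to its group, so a policy-gradient update would push probability mass away from it. In a real system the stubbed dimensions would come from separate scorers, and the weights would need tuning.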